Sample GSM1537132
Status Public on Dec 11, 2014
Title single cell whole genome sequencing data from E14 embryonic stem cell 2 obtained with the MALBAC protocol
Sample type SRA
 
Source name Mouse Embryonic Stem Cell line
Organism Mus musculus
Characteristics cell line: E14
cell type: Mouse Embryonic Stem Cell line
Growth protocol E14 cells were cultured in DMEM (Gibco) supplemented with 15% FBS (Gibco), 2 mM GlutaMax (Gibco), 0.1 mM MEM nonessential amino acids, 0.1 mM β-mercaptoethanol (Sigma), 1% Pen/Strep (Gibco) and 1000 U/ml LIF (ESGRO) on gelatinized Petri dishes. SK-BR-3 cells were cultured in McCoy's 5a Medium Modified (ATCC) with 10% FBS and 1% Pen/Strep. Cells were grown at 37 °C and 5% CO2.
Extracted molecule genomic DNA
Extraction protocol MALBAC (Zong et al. Science 2012)
Trypsinized single cells were picked using a mouth pipette with a 30 μm glass capillary under a stereomicroscope. Picked cells were deposited in the center of the lid of a 0.2 ml PCR tube and snap frozen in liquid nitrogen.
 
Library strategy OTHER
Library source genomic
Library selection other
Instrument model Illumina HiSeq 2500
 
Description Coverage data for 50 kb bins
E14OMsc2
Data processing For bulk gDNA and MALBAC libraries, paired end sequencing reads were aligned to the genome release mm10 for mouse cells (E14) and to the genome release hg19 for human cells (SK-BR-3) using BWA with default parameters. For the single cell gDNA libraries processed by DR-Seq, paired end sequencing reads were aligned to a masked genome mm10 for mouse cells and to a masked genome hg19 for human cells using BWA with default parameters. The masked genomes mm10 and hg19 were created by replacing all the coding sequences within the genome with "N". This is because the fraction used to sequence gDNA contains sequences that could originate from the cDNA within coding regions. By masking the coding sequences within the genome, such ambiguous reads that might arise from either gDNA or cDNA are discarded computationally leaving only reads that arise from gDNA. This does not pose a problem for calling copy number variations since the coding region constitutes only approximately 2% of the genome.
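The masking step described above can be illustrated with a minimal sketch. It assumes coding regions are given as half-open (start, end) intervals on a single sequence; the function name `mask_intervals` and the toy sequence and coordinates are hypothetical and are not taken from the submitters' pipeline.

```python
def mask_intervals(sequence, intervals):
    """Return `sequence` with every half-open [start, end) interval replaced by 'N'."""
    masked = list(sequence)
    for start, end in intervals:
        # Clamp the interval to the sequence so out-of-range annotations are harmless.
        start = max(start, 0)
        end = min(end, len(masked))
        for i in range(start, end):
            masked[i] = "N"
    return "".join(masked)


# Toy example (hypothetical sequence and coding coordinates).
toy_chromosome = "ACGTACGTACGTACGTACGT"
toy_coding = [(4, 8), (12, 15)]

masked = mask_intervals(toy_chromosome, toy_coding)
masked_fraction = masked.count("N") / len(masked)
print(masked, masked_fraction)
```

In a real genome the masked fraction would be close to the approximately 2% quoted above, rather than the large fraction in this toy sequence.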
All PCR duplicates within mapped reads from the bulk, MALBAC or DR-Seq libraries are removed. As the first step towards quantifying the gDNA data, the genome is divided into bins. To account for the masking of the genome in the DR-Seq data, the start and end coordinates of each bin are chosen such that all bins have the same length after excluding the coding regions within each bin. Next, to further reduce amplification biases, we developed a coverage-based method to quantify the reads within bins. This coverage-based method significantly reduces bin-to-bin technical noise (see supplementary document). The reads are then corrected for GC bias. The corrected read distribution is then used to identify breakpoints using the circular binary segmentation (CBS) algorithm. Finally, the median read counts for each segment are used to call copy number variations in single cells.
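One way to picture the bin construction is the sketch below, which places bin boundaries so that every bin contains the same number of unmasked bases. It is only an illustration under stated assumptions: masked regions are sorted, non-overlapping, half-open intervals, and the names `equal_effective_bins` and `unmasked_length` are hypothetical. The coverage-based quantification, GC correction and CBS steps are not reproduced here.

```python
def unmasked_length(start, end, masked_intervals):
    """Number of bases in [start, end) that are not covered by a masked interval."""
    overlap = sum(
        max(0, min(end, m_end) - max(start, m_start))
        for m_start, m_end in masked_intervals
    )
    return (end - start) - overlap


def equal_effective_bins(chrom_length, masked_intervals, effective_size):
    """Split [0, chrom_length) into bins that each hold `effective_size` unmasked bases.

    `masked_intervals` must be sorted, non-overlapping, half-open (start, end) pairs.
    The final bin may hold fewer unmasked bases if the chromosome runs out.
    """
    if effective_size <= 0:
        raise ValueError("effective_size must be positive")
    bins = []
    pos = 0                      # current scan position
    bin_start = 0                # start of the bin being filled
    remaining = effective_size   # unmasked bases still needed for the current bin
    # A zero-length sentinel interval at the end handles the last unmasked stretch.
    for m_start, m_end in list(masked_intervals) + [(chrom_length, chrom_length)]:
        # Close as many bins as fit in the unmasked stretch [pos, m_start).
        while m_start - pos >= remaining:
            pos += remaining
            bins.append((bin_start, pos))
            bin_start = pos
            remaining = effective_size
        remaining -= m_start - pos   # leftover unmasked bases go to the open bin
        pos = m_end                  # skip over the masked region
    if remaining < effective_size:   # a partially filled bin is still open
        bins.append((bin_start, chrom_length))
    return bins


# Toy example (hypothetical coordinates): 100 bp chromosome, two masked regions.
toy_masked = [(10, 20), (50, 60)]
bins = equal_effective_bins(100, toy_masked, effective_size=30)
effective = [unmasked_length(s, e, toy_masked) for s, e in bins]
print(bins, effective)
```

Here the first two bins span 40 bp each on the genome but contain 30 unmasked bases, which is the property the paragraph above describes.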
For bulk mRNA and CEL-Seq libraries, paired end sequencing reads were aligned to the transcriptome using Burrows-Wheeler Aligner (BWA) with default parameters. For single cell mRNA processed using DR-Seq, the Ad-2 adapter sequence was trimmed computationally from the right mate, which was then aligned to the transcriptome using BWA with default parameters. For the E14 cells, we used the RefSeq gene models based on the mouse genome release mm10. For the SK-BR-3 cells, we used the RefSeq gene models based on the human genome release hg19. For bulk mRNA sequencing, both mates of each read were mapped to the transcriptome. For CEL-Seq and DR-Seq, the right mate of each read pair was mapped to the transcriptome and the ERCC spike-ins. The left mate was used to identify the cell from which the transcript came based on the cell-specific barcode. Reads mapping to more than one region were distributed uniformly.
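A possible reading of "distributed uniformly" is that a read aligning to n distinct targets contributes 1/n of a count to each. The sketch below shows that interpretation only; it is an assumption, not a statement of the submitters' exact procedure, and the name `distribute_uniformly` and the toy read and gene names are hypothetical.

```python
from collections import defaultdict


def distribute_uniformly(read_hits):
    """Split each read's single count equally among the distinct targets it maps to.

    `read_hits` maps a read identifier to the list of targets it aligned to.
    Reads without any hit contribute nothing.
    """
    counts = defaultdict(float)
    for hits in read_hits.values():
        targets = set(hits)
        if not targets:
            continue
        weight = 1.0 / len(targets)
        for target in targets:
            counts[target] += weight
    return dict(counts)


# Toy example (hypothetical reads and genes).
toy_hits = {
    "read1": ["GeneA"],
    "read2": ["GeneA", "GeneB"],
    "read3": ["GeneB", "GeneC"],
    "read4": [],
}
counts = distribute_uniformly(toy_hits)
print(counts)
```

The total across targets equals the number of mapped reads, so multi-mapping reads neither inflate nor reduce the overall count.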
For the bulk mRNA sequencing libraries, PCR duplicates were then removed to obtain the dataset used in all analyses. The left mate of the CEL-Seq libraries also contained a 4-bp random sequence, introduced during reverse transcription, to count unique cDNA molecules, as previously described. Length-based identifiers were determined for each read in the single-cell mRNA libraries processed by DR-Seq using the first coordinate of the right mate after trimming off adapter Ad-2. The length-based identifiers were used to minimize amplification biases and achieve resolution close to identifying unique cDNA molecules.
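Both the 4-bp random sequence and the length-based identifier serve the same purpose: reads from one cell and gene that share an identifier are treated as amplification copies of one molecule. A minimal sketch of that counting idea follows, assuming each read has already been reduced to a (cell barcode, gene, identifier) tuple. The function name `count_unique_molecules` and the toy values are hypothetical.

```python
from collections import defaultdict


def count_unique_molecules(reads):
    """Count distinct identifiers per (cell, gene) pair.

    `reads` is an iterable of (cell_barcode, gene, identifier) tuples, where the
    identifier is either a random barcode sequence or a length-based identifier
    such as the first aligned coordinate of the right mate.
    """
    seen = defaultdict(set)
    for cell, gene, identifier in reads:
        seen[(cell, gene)].add(identifier)
    return {key: len(identifiers) for key, identifiers in seen.items()}


# Toy example (hypothetical values): string identifiers mimic 4-bp random
# sequences, integer identifiers mimic length-based identifiers.
toy_reads = [
    ("cell1", "GeneA", "ACGT"),
    ("cell1", "GeneA", "ACGT"),   # duplicate of the read above
    ("cell1", "GeneA", "TTGA"),
    ("cell2", "GeneA", "ACGT"),   # same identifier, different cell
    ("cell1", "GeneB", 1042),
    ("cell1", "GeneB", 1042),     # duplicate
    ("cell1", "GeneB", 1077),
]
molecule_counts = count_unique_molecules(toy_reads)
print(molecule_counts)
```

Two distinct molecules that happen to share an identifier are merged by this scheme, which is consistent with the record describing the resolution as only "close to" identifying unique cDNA molecules.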
Genome_build: mm10 and hg19
Supplementary_files_format_and_content: details included in the 'processed_data_files_description.txt' file.
 
Submission date Nov 04, 2014
Last update date May 15, 2019
Contact name Lennart Kester
E-mail(s) l.kester@hubrecht.eu
Organization name Hubrecht Institute
Street address Uppsalalaan 8
City Utrecht
ZIP/Postal code 3584CT
Country Netherlands
 
Platform ID GPL17021
Series (1)
GSE62952 Integrated genome and transcriptome sequencing from the same cell
Relations
BioSample SAMN03160692
SRA SRX750535

Supplementary file Size Download File type/resource
GSM1537132_E14OMsc2.txt.gz 532.2 Kb (ftp)(http) TXT
Raw data are available in SRA
Processed data provided as supplementary file
