Sample GSM1212207
Status Public on Feb 04, 2014
Title Grimmond_hs_thymus
Sample type SRA
 
Source name Ambion
Organism Homo sapiens
Characteristics cell type: Ambion
tissue: human thymus
Extracted molecule total RNA
Extraction protocol Illumina library construction and sequencing. RNA-seq libraries were constructed using the strand-specific dUTP method, with minor modifications. Briefly, 3 µg of DNase-treated RNA was depleted of rRNA using Ribozero (Epicentre). Two batches of rRNA-depleted samples were combined, cleaned by RiboMinus concentration module (Invitrogen) and fragmented at 90°C for 3 min (NEB fragmentation buffer). First strand synthesis was followed by cleanup with RNAClean XP SPRI beads (Agencourt). Second strand synthesis incorporated dUTP, followed by sample cleanup with MinElute PCR purification Kit (Qiagen). Fragment ends were repaired, adenylated, then ligated to TruSeq barcoded adaptors and cleaned up with AMPure XP SPRI beads (Agencourt). The libraries were then amplified by PCR for 12 cycles and cleaned up with AMPure XP SPRI beads. Illumina sequencing (1 x 50-bp read length) was performed on a HiSeq 2000. 4SU libraries were prepared non-strand-specifically using standard Illumina RNA-Seq.
SOLiD RNA-Seq library construction and sequencing. Total RNA was depleted of ribosomal RNA by hybridization using RiboMinus (Invitrogen) and was heat-fragmented. rRNA-depleted, fragmented RNA was processed into SOLiD sequencing libraries using the Small RNA Expression Kit. Sequencing was performed on SOLiD with 35-bp read lengths (human tissues, mouse neurons).
 
Library strategy RNA-Seq
Library source transcriptomic
Library selection cDNA
Instrument model AB SOLiD System 3.0
 
Description Total RNA was depleted of rRNA by hybridization (using RiboMinus) and heat fragmented. Strand-specific RNA-Seq libraries were prepared using the Small RNA Expression (cat#4399434b) method on SOLiD (for human tissues and mouse neurons).
Data processing Base calling was performed using standard approaches.
Illumina raw reads in .fastq files were aligned using bwa 0.6.2. Target sequences included (1) 24 hg19 autosomal and sex chromosomes plus (2) a read-length-dependent splice library (~3M sequences, each 50-98bp). The latter comprised all possible minimal intragenic sequences of two or more exons (based on the RefSeq annotation) for which one or more exon-exon junctions could be crossed by reads of the required length. After bwa-indexing the targets with the bwtsw algorithm, reads were aligned with zero trimming, zero gaps, and allowing up to 5 mismatches. Reads for each end of the two samples with paired-end reads were aligned as single reads only to test their consistency; thereafter only the "1" ends were used. The sam/bam files output by bwa were indexed and sorted and served as input for further processing.
SOLiD raw reads were aligned using Corona with 0-3 cs mismatches for 35-bp reads. Uniquely mapped reads output to SOLiD-format "unique.csfasta.ma" files served as input for further processing. SOLiD quality scores are in the .qual files.
Aligned reads were minimally filtered for QC, removing any reads with uncalled bases or with unmapped or nonuniquely mapped sequences.
Employing "MAPtoFeatures" perl scripts, the loci of uniquely mapped reads were compared to the loci of all genic features (exons, introns, UTRs, CDSs, etc., and all junctions between them) of (1) all genes based on the RefSeq annotation and (2) all rRNA genes based on RepeatMasker, for the alignment target genome. Every overlap of an individual read with genic features, including exon-exon splice junctions, was counted towards the features' "nrds" while the number of bases overlapped was added to their "rdbp". For every genic feature the average Density (read coverage per bp) was calculated, equal to rdbp per feature length. Each sample's Densities were renormalized from total reads not in noncoding genes and not in rRNA to a standard total of 10M reads, and to a standard read length of 35bp. With these conventions, units of Density always equal RPKM times 0.35. Normalized Densities were treated as expression levels comparable among all samples.
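The normalization described above can be illustrated with a minimal Python sketch. It is not the MAPtoFeatures implementation: the function names and the example numbers are hypothetical, and the scaling by read length is one reading of "to a standard read length of 35bp". The check at the end assumes every read lies wholly within the feature, so that rdbp equals nrds times the read length; under that assumption the normalized Density works out to 0.35 times RPKM.

```python
import math

STD_TOTAL_READS = 10_000_000  # standard total of 10M reads (from the text)
STD_READ_LEN = 35             # standard read length in bp (from the text)


def normalized_density(rdbp, feature_len, total_reads, read_len):
    """Average coverage per bp, rescaled to 10M reads and 35-bp reads.

    total_reads is meant to be the sample's normalization total
    (reads not in noncoding genes and not in rRNA).
    """
    raw_density = rdbp / feature_len
    return raw_density * (STD_TOTAL_READS / total_reads) * (STD_READ_LEN / read_len)


def rpkm(nrds, feature_len, total_reads):
    """Conventional reads per kilobase per million reads."""
    return nrds * 1e9 / (feature_len * total_reads)


# Hypothetical feature: 200 reads of 50 bp, all wholly inside a 2000-bp exon,
# in a sample with 20M reads counted toward normalization.
nrds, read_len, feature_len, total_reads = 200, 50, 2000, 20_000_000
rdbp = nrds * read_len  # assumption: no read hangs over the feature edge

density = normalized_density(rdbp, feature_len, total_reads, read_len)
assert math.isclose(density, 0.35 * rpkm(nrds, feature_len, total_reads))
```

Reads that only partly overlap a feature add fewer bases to rdbp than a full read length, so for such features the relation to a count-based RPKM is only approximate.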
Genome_build: For the 3 mouse samples (Total1): mm9 (NCBI ref_C57BL-6J assembly 37, build 1, July 2007). For the other 26 samples (IsoG-CPT, LCL, Grimmond), all human: hg19 (NCBI GRCh37 assembly 37, build 1, Feb. 2009).
Supplementary_files_format_and_content: The "Genes-FEATURES-READS" .tsv files are spreadsheets output by MAPtoFeatures. Each row has data for one gene (with NCBI Entrez Gene IDs) or, in some cases, one rRNA locus (with artificial IDs). A 1-line header labels 14-16 columns containing genomic information, followed by blocks of columns labeled "num" (number of each feature type per gene), "bp" (total length of each feature type per gene), and in some cases "rdbp_avail" (total length of all reads that could uniformly cover all features of each type per gene), followed by data columns. Data in the remaining blocks are distinguished by reads that mapped S[ense] or A[ntisense] to each gene: "nrds" (number of overlapping reads wholly contained within each feature type per gene), "rdbp" (total number of bases of reads that overlap each feature type at all, including reads crossing feature junctions), "Density" (expression levels, equal to normalized coverage as described above). Some columns labeled "-ALL" give concatenated values for all individual features, listed 5' to 3', within each gene. Exon-exon junctions, labeled "SPL", are notated as <count>(<exons>) with "^" between adjoining exons; e.g., "2(5^6)" indicates 2 reads crossing the junction between exons 5 and 6, and "1(3^5^6)" indicates 1 read crossing from exon 3 to 5, through all of 5, and crossing to exon 6. Exon-intron junctions, labeled "JXN", are notated as, e.g., "1(8>8<)" for 1 read crossing from exon 8 into intron 8 or "3(8<9>)" for 3 reads crossing intron 8 and exon 9.
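The SPL and JXN notations can be read with a short parser. The following Python sketch handles a single token such as "2(5^6)" or "1(8>8<)"; the function names are hypothetical, and the reading of ">" as marking an exon and "<" as marking an intron is inferred from the two examples given here rather than from a format specification. How several tokens are separated within one spreadsheet cell is not stated in this record, so the sketch does not attempt to split cells.

```python
import re

# One token: <count>(<body>), e.g. "2(5^6)" or "1(8>8<)"
_TOKEN = re.compile(r"^(\d+)\(([^()]+)\)$")


def _split_token(token):
    match = _TOKEN.match(token.strip())
    if match is None:
        raise ValueError(f"unrecognized junction token: {token!r}")
    return int(match.group(1)), match.group(2)


def parse_spl(token):
    """Parse an exon-exon (SPL) token into (read_count, [exon numbers])."""
    count, body = _split_token(token)
    return count, [int(exon) for exon in body.split("^")]


def parse_jxn(token):
    """Parse an exon-intron (JXN) token into (read_count, [(kind, number)]).

    Assumption: a number followed by ">" is an exon and a number
    followed by "<" is an intron, as the examples in the text suggest.
    """
    count, body = _split_token(token)
    kinds = {">": "exon", "<": "intron"}
    features = [(kinds[sym], int(num)) for num, sym in re.findall(r"(\d+)([<>])", body)]
    if not features:
        raise ValueError(f"no features found in JXN token: {token!r}")
    return count, features


print(parse_spl("1(3^5^6)"))
print(parse_jxn("3(8<9>)"))
```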
Supplementary_files_format_and_content: The "cross_reads_and_features__LOG" .txt files document each MAPtoFeatures analysis, including total reads used for normalization, totals per feature across the genome, and other statistics.
Supplementary_files_format_and_content: The "BINS-READS-Densities" .tsv files contain tiled Densities per every individual intron in the annotated genome of interest. Each intron is divided into 100 bins of equal size (fractional bp); the contributions of reads that only partially overlap a bin are pro rated to it (a single read's average Density in a bin equals the fraction of the bin it covers). Average Densities per bin are renormalized as in MAPtoFeatures, but only Sense reads are included. About 20 columns of genic and locus information for the ~200k introns are followed by 100 columns listing each intron's 100 normalized Densities ordered from its 5'-most to its 3'-most bin. These Densities are used to quantify the slopes of the differential abundance of nascent transcripts across introns.
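The pro-rated binning described above can be sketched in Python as follows. This is an illustration only, not the code that produced the files: the function name, the 0-based half-open coordinates, and the `strand` argument are assumptions, and the renormalization to 10M reads and 35-bp reads is left out.

```python
def bin_densities(intron_start, intron_end, reads, n_bins=100, strand="+"):
    """Tile an intron into n_bins equal bins and accumulate read coverage.

    Coordinates are 0-based, half-open [start, end) (an assumption).
    Each read adds to a bin the fraction of that bin it covers.
    The result is ordered from the 5'-most to the 3'-most bin.
    """
    width = (intron_end - intron_start) / n_bins  # may be fractional bp
    densities = [0.0] * n_bins
    for read_start, read_end in reads:
        # Clip the read to the intron; skip reads that miss it entirely.
        lo = max(read_start, intron_start)
        hi = min(read_end, intron_end)
        if hi <= lo:
            continue
        first = min(n_bins - 1, int((lo - intron_start) // width))
        last = min(n_bins - 1, int((hi - intron_start) // width))
        for b in range(first, last + 1):
            bin_lo = intron_start + b * width
            bin_hi = bin_lo + width
            overlap = min(hi, bin_hi) - max(lo, bin_lo)
            if overlap > 0:
                densities[b] += overlap / width
    # For a minus-strand gene the 5' end is at the higher coordinate.
    return densities[::-1] if strand == "-" else densities


# Hypothetical 1000-bp intron (10-bp bins) with one 35-bp read at [5, 40).
d = bin_densities(0, 1000, [(5, 40)])
print(d[:5])
```

In this example the read covers half of the first bin and all of the next three, so the per-bin values sum to the read length divided by the bin width.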
Supplementary_files_format_and_content: The bigWig .bw files are in standard format for compressed variable-step wiggle tracks. Most samples listed here have two bigWig files, intended for separate tracks for POS and NEG strands. (The 6 strand-nonspecific LCL samples have only POS tracks, except that for the 2 samples with paired ends both strands are additionally shown for each end. See "Exceptions" below.)
Supplementary_files_format_and_content: A single .txt file contains example track lines for all the .bw files in this set.
Supplementary_files_format_and_content: Exceptions: LCL__BM1_4SU__1 & LCL__BM1_4SU__2 refer to companion sets of paired-end reads for a single sample. Similarly for LCL__BM2_4SU__1 & LCL__BM2_4SU__2. There are bigWig files for both ends for both these samples. Although these are strand-nonspecific samples, reads mapped to each strand are shown. There are no MAPtoFeatures spreadsheets for these individual data sets. The other 4 LCL samples (BM3_4SU and 3 riboMinus) have neither strand-specific nor paired-end reads, and so were analyzed with all reads assigned to the POS strand and flagged as mapping S[ense] to all gene features. The "1" ends of BM1_4SU and BM2_4SU were also analyzed disregarding strand and sense, yielding 6 sets of similarly generated bigWig tracks and MAPtoFeatures spreadsheets for the 6 LCL samples. The slopes of these data sets were not analyzed, so binned Density tables for them are not included here. Raw sequence data for the Total1 samples have been published elsewhere (Kim et al., Nature 2010), so .csfasta files for them are not included here.
 
Submission date Aug 19, 2013
Last update date May 15, 2019
Contact name David A Harmin
E-mail(s) David_Harmin@hms.harvard.edu
Organization name Harvard Medical School
Department Neurobiology
Lab M.E. Greenberg
Street address 220 Longwood Avenue
City Boston
State/province MA
ZIP/Postal code 02115
Country USA
 
Platform ID GPL9442
Series (1)
GSE48889 SnapShot-Seq: a method for extracting genome-wide, in vivo mRNA dynamics from a single total RNA sample
Relations
BioSample SAMN02319028
SRA SRX336870

Supplementary file Size Download File type/resource
GSM1212207_Grimmond_hs_thymus_BINS-READS-Densities_r35_N10M_ALL-GENES_INT.EVERY_u100.tsv.gz 13.0 Mb (ftp)(http) TSV
GSM1212207_Grimmond_hs_thymus_Genes-FEATURES-READS-r35_x500_N10M_ALL-GENES_EXN-ALL_INT-ALL_SPL_JXN_END_mm3.tsv.gz 8.3 Mb (ftp)(http) TSV
GSM1212207_Grimmond_hs_thymus_hg19-SPL+wigs_hg19_GNM+SPL_r35_mm3_span-20_NEG.bw 24.8 Mb (ftp)(http) BW
GSM1212207_Grimmond_hs_thymus_hg19-SPL+wigs_hg19_GNM+SPL_r35_mm3_span-20_POS.bw 25.8 Mb (ftp)(http) BW
GSM1212207_cross_reads_and_features_LOG_100416_134711_O_Grimmond_hs_thymus.txt.gz 21.5 Kb (ftp)(http) TXT
Raw data are available in SRA
Processed data provided as supplementary file
Processed data are available on Series record
