Series GSE50445
Status Public on May 21, 2014
Title IVT-seq reveals extreme bias in RNA-sequencing
Organisms Homo sapiens; Mus musculus; mixed libraries
Experiment type Expression profiling by high throughput sequencing
Summary Background:
RNA-seq is a powerful technique for identifying and quantifying transcription and splicing events, both known and novel. However, given its recent development and the proliferation of library construction methods, understanding the bias it introduces is incomplete but critical to realizing its value.

Results:
We present a method, in vitro transcription sequencing (IVT-seq), for identifying and assessing the technical biases in RNA-seq library generation and sequencing at scale. We created a pool of over 1,000 in vitro transcribed RNAs from a full-length human cDNA library and sequenced them with polyA and total RNA-seq, the most common protocols. Because each cDNA is full length, and we show in vitro transcription is incredibly processive, each base in each transcript should be equivalently represented. However, with common RNA-seq applications and platforms, we find 50% of transcripts have more than two-fold and 10% have more than 10-fold differences in within-transcript sequence coverage. We also find greater than 6% of transcripts have regions of dramatically unpredictable sequencing coverage between samples, confounding accurate determination of their expression. We use a combination of experimental and computational approaches to show rRNA depletion is responsible for the most significant variability in coverage, and several sequence determinants also strongly influence representation.

Conclusions:
These results show the utility of IVT-seq for promoting better understanding of bias introduced by RNA-seq. We find rRNA depletion is responsible for substantial, unappreciated biases in coverage introduced during library preparation. These biases suggest exon-level expression analysis may be inadvisable, and we recommend caution when interpreting RNA-seq results.
 
Overall design 5 rRNA-depleted samples with duplicates; 1 polyA-selected, 1 total RNA, and 1 plasmid library, all without replicates.
 
Contributor(s) Lahens NF, Kavakli IH, Zhang R, Hayer K, Black MB, Dueck H, Pizarro A, Kim J, Irizarry R, Thomas RS, Grant GR, Hogenesch JB
Citation(s) 24981968
Submission date Aug 29, 2013
Last update date May 15, 2019
Contact name Nicholas Lahens
Organization name University of Pennsylvania
Department ITMAT
Street address Smilow Center for Translational Research 10-110 3400 Civic Center Blvd, Bldg 421
City Philadelphia
State/province PA
ZIP/Postal code 19104
Country USA
 
Platforms (5)
GPL11154 Illumina HiSeq 2000 (Homo sapiens)
GPL13112 Illumina HiSeq 2000 (Mus musculus)
GPL15520 Illumina MiSeq (Homo sapiens)
Samples (13)
GSM1219398 IVT only replicate 1
GSM1219399 IVT only replicate 2
GSM1219400 1 IVT : 1 mouse replicate 1
Relations
BioProject PRJNA217498
SRA SRP029334

Download family Format
SOFT formatted family file(s) SOFT
MINiML formatted family file(s) MINiML
Series Matrix File(s) TXT

Supplementary file Size Download File type/resource
GSE50445_RAW.tar 182.8 Mb (http)(custom) TAR (of BW, TXT)
SRA Run Selector
Raw data are available in SRA
Processed data provided as supplementary file
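
The Summary reports the share of transcripts whose within-transcript coverage differs by more than two-fold or 10-fold. This record does not define that metric, so the sketch below is only an illustration under stated assumptions: it takes the fold difference to be the ratio of the highest to the lowest per-base coverage in a transcript, and it clamps low depths with a hypothetical `floor` parameter to avoid dividing by zero. The function names and the toy depths are made up for the example and are not taken from GSE50445.

```python
from typing import Sequence


def within_transcript_fold(coverage: Sequence[float], floor: float = 1.0) -> float:
    """Ratio of highest to lowest per-base coverage in one transcript.

    Assumption: fold difference = max depth / min depth.
    `floor` (hypothetical) clamps depths below it, so a zero-depth
    base cannot cause a division by zero.
    """
    if not coverage:
        raise ValueError("coverage must contain at least one position")
    highest = max(max(coverage), floor)
    lowest = max(min(coverage), floor)
    return highest / lowest


def fraction_above(folds: Sequence[float], threshold: float) -> float:
    """Fraction of transcripts whose fold difference exceeds `threshold`."""
    if not folds:
        raise ValueError("folds must contain at least one transcript")
    return sum(f > threshold for f in folds) / len(folds)


# Toy per-base depths for three made-up transcripts (not data from this series).
toy = {
    "tx_even": [100, 110, 95, 105],
    "tx_2fold": [100, 40, 90, 80],
    "tx_10fold": [200, 15, 180, 190],
}
folds = [within_transcript_fold(cov) for cov in toy.values()]

print(fraction_above(folds, 2.0))
print(fraction_above(folds, 10.0))
```

With real data, the per-base depths would come from the processed coverage files rather than a hand-written dictionary; how those files are parsed depends on their format and is left out here.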
