Series GSE97212
Status Public on Apr 01, 2017
Title The Project for High-Confidence Coding and Noncoding Transcriptome Maps
Platform organism Mus musculus
Sample organisms Homo sapiens; Mus musculus
Experiment type Expression profiling by high throughput sequencing
Third-party reanalysis
Summary The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly large transcriptomes encoded by eukaryotic genomes. However, transcriptome maps are still incomplete, partly because most were reconstructed from RNA-seq reads that lack orientation information (known as unstranded reads) and certain boundary information. Methods that expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly improve the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves original assemblies built from stranded and/or unstranded RNA-seq data by orienting unstranded reads using maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applied to large-scale transcriptomic data comprising 230 billion RNA-seq reads from ENCODE, the Human BodyMap Projects, The Cancer Genome Atlas, and GTEx, CAFE enabled us to predict the directions of about 220 billion unstranded reads, which led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalogue that includes thousands of novel lncRNAs. Our pipeline should help not only to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of non-coding genomes.

This SuperSeries is composed of the SubSeries listed below.
 
Overall design The directions of unstranded reads (from ENCODE, the Human BodyMap Projects, GTEx and TCGA, as well as from HeLa and mES cells) were predicted using k-order Markov chain models (kMC), generating reads with a predicted direction (RPDs), which were then used to assemble transcriptome maps (BIGTranscriptome). Those transcriptome maps were next used to quantify the RPDs.
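
To illustrate the general idea of orienting an unstranded read with a k-order Markov chain, the following is a minimal Python sketch. It is not the CAFE implementation: the training input (a list of sequences assumed to be in sense orientation), the pseudocount smoothing, and the function names `train_kmc`, `log_likelihood` and `predict_direction` are all hypothetical choices made for this example. The sketch scores a read and its reverse complement under the same model and keeps the orientation with the higher log-likelihood.

```python
import math
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def train_kmc(sequences, k, pseudocount=1.0):
    """Estimate P(next base | previous k bases) from sense-strand sequences.

    Assumption for this sketch: `sequences` are already known to be in
    sense orientation (for example, taken from stranded data).
    Returns a function log_prob(context, base).
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1.0

    def log_prob(context, base):
        ctx = counts.get(context, {})  # unseen contexts fall back to uniform
        total = sum(ctx.values()) + 4 * pseudocount
        return math.log((ctx.get(base, 0.0) + pseudocount) / total)

    return log_prob


def log_likelihood(seq, k, log_prob):
    """Sum the log transition probabilities along the sequence."""
    return sum(log_prob(seq[i - k:i], seq[i]) for i in range(k, len(seq)))


def predict_direction(read, k, log_prob):
    """Return ('+' or '-', log-likelihood difference forward minus reverse)."""
    fwd = log_likelihood(read, k, log_prob)
    rev = log_likelihood(reverse_complement(read), k, log_prob)
    return ("+" if fwd >= rev else "-"), fwd - rev


# Toy usage with a made-up training sequence.
model = train_kmc(["AC" * 10], k=2)
strand, margin = predict_direction("GTGTGTGT", 2, model)
print(strand, round(margin, 3))
```

In this toy case the read matches the training data only after reverse complementation, so the sketch assigns it to the minus strand. A real pipeline would train on far larger data and would need to handle ambiguous bases and near-zero margins, which are left out here.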

Refer to individual Series
Web link http://big.hanyang.ac.kr/CAFE
 
Citation(s) 28396519
Submission date Mar 29, 2017
Last update date May 15, 2019
Contact name Jin-Wu Nam
E-mail(s) jwnam@hanyang.ac.kr
Phone +82 2-2220-2428
Organization name Hanyang University
Department Department of Life Science
Street address Seongdong-Gu Hangdang-dong
City Seoul
ZIP/Postal code 133-791
Country South Korea
 
Platforms (1)
GPL13112 Illumina HiSeq 2000 (Mus musculus)
Samples (2)
GSM2254467 mESC_stranded_RNASeq
GSM2254468 mESC_unstranded_RNASeq
This SuperSeries is composed of the following SubSeries:
GSE84946 Co-assembly of stranded and unstranded RNA-seq data improves coding and noncoding transcriptome maps
GSE97211 High-confidence Coding and Noncoding Transcriptome Maps
Relations
BioProject PRJNA381216

Download family Format
SOFT formatted family file(s) SOFT
MINiML formatted family file(s) MINiML
Series Matrix File(s) TXT

Supplementary file Size Download File type/resource
GSE97212_RAW.tar 8.6 Mb (http)(custom) TAR (of GTF)
