GEO Accession viewer

Series GSE97212

Status

Public on Apr 01, 2017

Title

The Project for High-Confidence Coding and Noncoding Transcriptome Maps

Platform organism

Mus musculus

Sample organisms

Homo sapiens; Mus musculus

Experiment type

Expression profiling by high throughput sequencing
Third-party reanalysis

Summary

The advent of high-throughput RNA sequencing (RNA-seq) has led to the discovery of unprecedentedly large transcriptomes encoded by eukaryotic genomes. However, transcriptome maps are still incomplete, partly because they were mostly reconstructed from RNA-seq reads that lack orientation information (known as unstranded reads) and certain boundary information. Methods that expand the usability of unstranded RNA-seq data by predetermining the orientation of the reads and precisely determining the boundaries of assembled transcripts could significantly improve the quality of the resulting transcriptome maps. Here, we present a high-performing transcriptome assembly pipeline, called CAFE, that significantly improves on original assemblies built from stranded and/or unstranded RNA-seq data by orienting unstranded reads using maximum likelihood estimation and by integrating information about transcription start sites and cleavage and polyadenylation sites. Applied to large-scale transcriptomic data comprising 230 billion RNA-seq reads from the ENCODE and Human BodyMap Projects, The Cancer Genome Atlas, and GTEx, CAFE enabled us to predict the directions of about 220 billion unstranded reads. This led to the construction of more accurate transcriptome maps, comparable to the manually curated map, and a comprehensive lncRNA catalogue that includes thousands of novel lncRNAs. Our pipeline should help not only to build comprehensive, precise transcriptome maps from complex genomes but also to expand the universe of non-coding genomes.

This SuperSeries is composed of the SubSeries listed below.

Overall design

The directions of unstranded reads (from ENCODE, Human BodyMap Projects, GTEx and TCGA, as well as from HeLa and mES cells) were predicted using k-order Markov chain models (kMC), yielding reads with a predicted direction (RPDs), which were then used to assemble transcriptome maps (BIGTranscriptome). Those transcriptome maps were subsequently used for quantification of RPDs.

Refer to individual Series
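
The strand-prediction step described above can be illustrated with a small sketch. This is not the CAFE implementation: it is a minimal, assumed formulation in which a single k-order Markov chain is trained on sequences of known sense orientation, and an unstranded read is assigned the direction (as given, or reverse-complemented) with the higher log-likelihood under that model. The class name `KOrderMarkovChain`, the function `predict_direction`, the pseudocount smoothing, and the toy training sequence are all hypothetical choices made for illustration.

```python
import math
from collections import defaultdict

# Translation table used to complement DNA bases; other symbols (e.g. N) pass through.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


class KOrderMarkovChain:
    """Hypothetical k-order Markov chain over nucleotides (illustration only)."""

    def __init__(self, k, pseudocount=1.0):
        self.k = k
        self.pseudocount = pseudocount  # additive smoothing, an assumption
        # counts[context][next_base] = number of times next_base followed context
        self.counts = defaultdict(lambda: defaultdict(float))

    def train(self, sequences):
        """Count k-mer -> next-base transitions in sense-oriented sequences."""
        for seq in sequences:
            for i in range(self.k, len(seq)):
                context = seq[i - self.k:i]
                self.counts[context][seq[i]] += 1.0

    def log_likelihood(self, seq):
        """Sum of smoothed log transition probabilities along the sequence."""
        total = 0.0
        for i in range(self.k, len(seq)):
            context = seq[i - self.k:i]
            ctx_counts = self.counts.get(context, {})
            num = ctx_counts.get(seq[i], 0.0) + self.pseudocount
            den = sum(ctx_counts.values()) + 4 * self.pseudocount
            total += math.log(num / den)
        return total


def predict_direction(read, sense_model):
    """Pick the orientation with the higher likelihood under the sense model."""
    forward = sense_model.log_likelihood(read)
    reverse = sense_model.log_likelihood(reverse_complement(read))
    return "+" if forward >= reverse else "-"


# Toy usage: the training sequence stands in for reads of known orientation.
model = KOrderMarkovChain(k=2)
model.train(["AAGGAAGGAAGGAAGG"])
print(predict_direction("AAGGAAGG", model))  # read matches the sense pattern
print(predict_direction("CCTTCCTT", model))  # reverse complement of that read
```

In this toy setup the first read is called "+" and the second "-". A real model would be trained on stranded RNA-seq reads and would need a policy for ties and for ambiguous bases.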

Web link

http://big.hanyang.ac.kr/CAFE

Citation(s)

28396519

Submission date

Mar 29, 2017

Last update date

May 15, 2019

Contact name

Jin-Wu Nam

E-mail(s)

jwnam@hanyang.ac.kr

Phone

+82 2-2220-2428

Organization name

Hanyang University

Department

Department of Life Science

Street address

Seongdong-Gu Hangdang-dong

City

Seoul

ZIP/Postal code

133-791

Country

South Korea

Platforms (1)

GPL13112

Illumina HiSeq 2000 (Mus musculus)

Samples (2)

GSM2254467	mESC_stranded_RNASeq
GSM2254468	mESC_unstranded_RNASeq

This SuperSeries is composed of the following SubSeries:

GSE84946	Co-assembly of stranded and unstranded RNA-seq data improves coding and noncoding transcriptome maps
GSE97211	High-confidence Coding and Noncoding Transcriptome Maps

Relations

BioProject

PRJNA381216

Download family	Format
SOFT formatted family file(s)	SOFT
MINiML formatted family file(s)	MINiML
Series Matrix File(s)	TXT

Supplementary file	Size	Download	File type/resource
GSE97212_RAW.tar	8.6 Mb	(http)(custom)	TAR (of GTF)