GEO Accession viewer

Sample GSM1267200

Status

Public on Feb 18, 2015

Title

Hi-C, H1 Mesenchymal stem cells, replicate one

Sample type

SRA

Source name

H1 Mesenchymal stem cells

Organism

Homo sapiens

Characteristics

cell type: H1 Mesenchymal stem cells

Treatment protocol

None

Growth protocol

Growth and differentiation of H1 hESCs were performed as previously described (Xie et al., 2013, Cell 153, 1134-1148).

Extracted molecule

genomic DNA

Extraction protocol

Sequencing libraries were constructed as previously described (Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289-93 (2009)).

Library strategy

OTHER

Library source

genomic

Library selection

other

Instrument model

Illumina HiSeq 2500

Description

MS_R1_T1_read1.fastq.gz
MS_R1_T1_read2.fastq.gz
MS_R1_T2_read1.fastq.gz
MS_R1_T2_read2.fastq.gz
MS_R1_T3_read1.fastq.gz
MS_R1_T3_read2.fastq.gz

Data processing

Library strategy: Hi-C
fastq: Illumina's HiSeq Control Software
For Hi-C read alignment, we aligned Hi-C reads to the hg18 (human) genome. Any bases in the genome that were genotyped as SNPs in the H1 genome were masked to “N” to reduce reference-bias mapping artifacts. Hi-C reads were aligned iteratively as single-end reads using Novoalign. Specifically, for iterative alignment, we first aligned the entire sequencing read to either the mouse or human genome. Unmapped reads were then trimmed by 5 base pairs and realigned. This process was repeated until the read successfully aligned to the genome or until the trimmed read was less than 25 base pairs long. After iterative mapping was finished, read pairs were reconstructed from single reads using an in-house pipeline. Unmapped reads were filtered out and PCR duplicate reads were removed. Final alignment files were then processed using the GATK pipeline, specifically Indel Realignment and Variant Recalibration. A similar pipeline, without the iterative alignment step, was used to align the other high-throughput sequencing datasets.
Haplotypes were generated with the HapCUT algorithm from the final aligned BAM file after merging the two biological replicates. HapCUT has been described previously (Bansal and Bafna, Bioinformatics 24, i153-159, 2008).
Genome_build: hg18
Supplementary_files_format_and_content: Reads are listed in BED format, with one line for each sequencing read. The reads have been split by haplotype into the "A" and "B" (alternatively, "p1" and "p2") alleles according to the haplotype to which the bases within each sequencing read correspond. For paired-end Hi-C data, each line lists a single read, and pairing information can be obtained from the read names. The original fastq files for data other than the Hi-C and CTCF ChIP-seq are available in the GSE16256 dataset.
Supplementary_files_format_and_content: The processed haplotypes for the H1 genome ("H1_haps.vcf") are available in VCF format.
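
The iterative alignment loop described under Data processing (align the full read, trim unmapped reads by 5 base pairs, retry until the read aligns or falls below 25 base pairs) can be sketched as follows. This is an illustrative sketch only, not the submitters' pipeline: the `align` callable is a hypothetical stand-in for a call to an aligner such as Novoalign, the function and constant names are invented here, and trimming from the 3' end is an assumption, since the record does not say which end was trimmed.

```python
from typing import Callable, Optional, Tuple

TRIM_STEP = 5    # bases removed per round, as stated in the record
MIN_LENGTH = 25  # trimmed reads shorter than this are not retried


def iterative_align(
    read: str,
    align: Callable[[str], Optional[str]],
) -> Tuple[Optional[str], int]:
    """Try the full read first, then progressively shorter versions.

    `align` is a hypothetical callable that returns a hit description
    (any string) when the sequence maps, or None when it does not.
    Returns (hit, aligned_length), or (None, 0) if the read never maps.
    """
    current = read
    while True:
        hit = align(current)
        if hit is not None:
            return hit, len(current)
        # Assumption: bases are trimmed from the 3' end of the read.
        current = current[:-TRIM_STEP]
        if len(current) < MIN_LENGTH:
            return None, 0


# Toy aligner for demonstration: it "maps" only sequences of 40 bases or fewer.
def toy_align(seq: str) -> Optional[str]:
    return "chr1:100" if len(seq) <= 40 else None


result = iterative_align("A" * 50, toy_align)
```

With the toy aligner, a 50-base read is tried at 50, 45 and then 40 bases, at which point it maps. In a real pipeline the per-read callable would be replaced by batch alignment of all unmapped reads in each round.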

Submission date

Nov 18, 2013

Last update date

May 15, 2019

Contact name

Jesse R Dixon

E-mail(s)

jedixon@salk.edu

Organization name

Salk Institute for Biological Studies

Lab

PBL-D