|
Status |
Public on Nov 03, 2013 |
Title |
Hi-C, GM12878 Lymphoblastoid cells, replicate one |
Sample type |
SRA |
|
|
Source name |
GM12878 Lymphoblastoid cells
|
Organism |
Homo sapiens |
Characteristics |
cell line: GM12878 Lymphoblastoid cell line
|
Biomaterial provider |
Coriell; http://ccr.coriell.org/Sections/Search/Search.aspx?PgId=165&q=GM12878
|
Treatment protocol |
None
|
Growth protocol |
GM12878 cells (Coriell) were cultured in suspension in 85% RPMI media supplemented with 15% fetal bovine serum and 1X penicillin/streptomycin. See the samples section.
|
Extracted molecule |
genomic DNA |
Extraction protocol |
Hi-C experiments were conducted using HindIII, and sequencing libraries were constructed, according to a previous publication (Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289-93 (2009)).
|
|
|
Library strategy |
OTHER |
Library source |
genomic |
Library selection |
other |
Instrument model |
Illumina HiSeq 2500 |
|
|
Description |
GM12878_lcp.vcf GM12878_depristoeal.vcf GM12878_seed.haps
|
Data processing |
fastq: Illumina's HiSeq Control Software.
For Hi-C read alignment, we aligned Hi-C reads to the mm9 (mouse) or the hg18 (human) genome. In each case, we masked any bases in the genome that were genotyped as SNPs in either Mus musculus castaneus or S129/SvJae (for mouse) or GM12878 (for human). These bases were masked to "N" in order to reduce reference-bias mapping artifacts. Hi-C reads were aligned iteratively as single-end reads using Novoalign. Specifically, for iterative alignment, we first aligned the entire sequencing read to either the mouse or human genome. Unmapped reads were then trimmed by 5 base pairs and realigned. This process was repeated until the read successfully aligned to the genome or until the trimmed read was less than 25 base pairs long. After iterative mapping was finished, read pairs were reconstructed from single reads using an in-house pipeline. Unmapped reads were filtered out and PCR duplicate reads were removed. Final alignment files were then processed using the GATK pipeline, specifically Indel Realignment and Variant Recalibration.
Haplotypes were generated with the HapCUT algorithm from the final aligned BAM file, obtained after merging the two biological replicates. The details of HapCUT are described previously (Bansal and Bafna, Bioinformatics 24, i153-159, 2008).
Genome_build: mm9
Genome_build: hg18
Supplementary_files_format_and_content: The castx129_variants.vcf and GM12878_depristoeal.vcf files are VCF format files of the variants used as input to the haplotyping algorithm. Both files are derived from publicly available datasets: the castx129_variants.vcf file is derived from data downloaded from the ENA (ERP000042) and the SRA (SRX037820), and the GM12878_depristoeal.vcf file was downloaded from the 1000 Genomes Project.
Supplementary_files_format_and_content: The F123.haps and GM12878_seed.haps files are modified BED format files. In these files, the first column is the chromosome and the second column is the location of the variant. The third and fourth columns are the phased variants in the "A" and "B" haplotypes. The choice of "A" and "B" is arbitrary: the "A" haplotype from one chromosome is not necessarily derived from the same parent as the "A" haplotype from a different chromosome.
Supplementary_files_format_and_content: The GM12878_lcp.vcf file is a VCF format file generated after local conditional phasing of variants in the seed haplotype.
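The iterative alignment loop described in the data processing notes (align the full read; if it does not map, trim 5 bp and retry; stop once it maps or drops below 25 bp) can be sketched in Python as follows. This is a minimal illustration, not the submitters' pipeline: `iterative_align` and `make_exact_matcher` are hypothetical names, the exact-substring matcher is only a toy stand-in for Novoalign, and trimming from the 3' end is an assumption because the record does not say which end was trimmed.

```python
from typing import Callable, Optional, Tuple

# Hypothetical aligner interface: takes a sequence and returns a 0-based
# reference position, or None if the sequence does not map.
Aligner = Callable[[str], Optional[int]]


def iterative_align(
    read: str,
    align: Aligner,
    trim_step: int = 5,
    min_length: int = 25,
) -> Optional[Tuple[int, int]]:
    """Align a read, trimming trim_step bases per round until it maps or gets too short.

    Returns (reference_position, aligned_length), or None if the read never maps.
    """
    if trim_step <= 0:
        raise ValueError("trim_step must be positive")
    seq = read
    # Keep trying while the (possibly trimmed) read is at least min_length bases long.
    while len(seq) >= min_length:
        pos = align(seq)
        if pos is not None:
            return pos, len(seq)
        # Assumption: trim from the 3' end; the record does not specify the end.
        seq = seq[:-trim_step]
    return None


def make_exact_matcher(reference: str) -> Aligner:
    """Toy stand-in for a real aligner such as Novoalign: exact substring search."""
    def align(seq: str) -> Optional[int]:
        pos = reference.find(seq)
        return pos if pos >= 0 else None
    return align
```

Returning the aligned length together with the position makes it easy to see how many trimming rounds a read needed. In a Hi-C read that runs across a ligation junction, the untrimmed read often fails to map, and trimming recovers the portion on one side of the junction.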
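The .haps column layout described above (chromosome, variant location, "A" allele, "B" allele) can be read with a short Python sketch like the one below. It assumes whitespace- or tab-delimited columns and tolerates blank or `#`-prefixed header lines; neither detail is stated in the record. `PhasedVariant`, `parse_haps` and `swap_labels` are hypothetical names. Allele columns are kept as plain strings because the record does not say whether they hold bases or 0/1 codes, and the coordinate base of the position column is not stated.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class PhasedVariant:
    chrom: str  # column 1: chromosome
    pos: int    # column 2: variant location (coordinate base not stated in the record)
    hap_a: str  # column 3: allele on the "A" haplotype
    hap_b: str  # column 4: allele on the "B" haplotype


def parse_haps(lines: Iterable[str]) -> Dict[str, List[PhasedVariant]]:
    """Group phased variants by chromosome from .haps-style lines."""
    by_chrom: Dict[str, List[PhasedVariant]] = {}
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and '#'-prefixed header lines may be present.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            raise ValueError(f"expected at least 4 columns, got {len(fields)}: {line!r}")
        chrom, pos, hap_a, hap_b = fields[0], int(fields[1]), fields[2], fields[3]
        by_chrom.setdefault(chrom, []).append(PhasedVariant(chrom, pos, hap_a, hap_b))
    return by_chrom


def swap_labels(variants: List[PhasedVariant]) -> List[PhasedVariant]:
    """Exchange A and B on one chromosome. Because the labels are arbitrary per
    chromosome, the result describes the same phasing."""
    return [PhasedVariant(v.chrom, v.pos, v.hap_b, v.hap_a) for v in variants]
```

Since a file object iterates over its lines, `parse_haps(open("GM12878_seed.haps"))` would group the seed haplotype by chromosome under the same assumptions. Keeping each chromosome separate reflects the note that "A" on one chromosome is not tied to "A" on another; relating haplotypes across chromosomes would need parental information that these files do not contain.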
|
|
|
Submission date |
Jul 08, 2013 |
Last update date |
Feb 22, 2021 |
Contact name |
Jesse R Dixon |
E-mail(s) |
jedixon@salk.edu
|
Organization name |
Salk Institute for Biological Studies
|
Lab |
PBL-D
|
Street address |
10010 N. Torrey Pines Rd.
|
City |
La Jolla |
State/province |
CA |
ZIP/Postal code |
92037 |
Country |
USA |
|
|
Platform ID |
GPL16791 |
Series (1) |
GSE48592 |
Whole-genome Haplotype Reconstruction using Proximity-ligation and Shotgun Sequencing |
|
Relations |
Reanalyzed by |
GSE85977 |
Reanalyzed by |
GSE87112 |
Reanalyzed by |
GSE115407 |
Reanalyzed by |
GSE128678 |
Reanalyzed by |
GSE167200 |
BioSample |
SAMN02228119 |
SRA |
SRX318776 |