Sample GSM1537300
Status Public on Jan 19, 2015
Title WCE-SKBR3_ChIP-seq
Sample type SRA
 
Source name Human Breast Cancer Cells
Organism Homo sapiens
Characteristics gender: Female
cell line: SKBR3
cell type: HER2+
antibody: None
Growth protocol Human Mammary Epithelial Cells (HMEC, Invitrogen) were grown in serum-free medium (HuMEC, Invitrogen). WI-38, ZR-75-1, SK-BR-3 and MDA-MB-436 cells were grown in Dulbecco’s modified Eagle’s medium (DMEM; Invitrogen) containing 10% fetal bovine serum (FBS).
Extracted molecule genomic DNA
Extraction protocol Phenol/chloroform extraction was used for gDNA isolation.
Standard Illumina protocols are followed.
ChIP-seq: Bioo Scientific NEXTflex kit is used (Catalog: 514120, 48 reactions)
ChIP-seq: Libraries were prepared according to the standard Illumina protocol and sequenced with the HiSeq2500 system. Single-end sequencing was carried out to obtain around 25-150 million (M) 100 bp reads per sample.
 
Library strategy ChIP-Seq
Library source genomic
Library selection ChIP
Instrument model Illumina HiSeq 2500
 
Data processing Supplementary_files_format_and_content: Each sample contains three bedgraph files.
Supplementary_files_format_and_content: 1. ChrX full bedgraph display
Supplementary_files_format_and_content: 2. ChrX allele1 specific display of bedgraph only on SNP positions
Supplementary_files_format_and_content: 3. ChrX allele2 specific display of bedgraph only on SNP positions
Supplementary_files_format_and_content: Final SNP list: List of final SNPs derived from ChIP-seq, mRNA-seq and Exome-seq used for allele specific analysis
Datasets were subjected to two types of quality control. FASTQC-0.10.1 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) was used to assess the quality of sequencing and potential adapter or cross contaminants. Average sequencing quality (phred score) per base was above 30 (Q ≥ 30) for all datasets. In addition, aligned datasets were subjected to NGS-QC (Mendoza-Parra et al., 2013; www.ngs-qc.org) to assess the robustness of enrichment. The majority of data sets were of «triple A» quality; no data set was below «triple B». For exome-seq and ChIP-seq, alignment was performed using BWA-MEM-0.7.7 (Li and Durbin, 2009) with default parameters, which simultaneously checks for both global and local alignment of reads. Alignment was followed by three sets of filters to prevent bias in the analysis: 1) duplicate reads (PCR clonal reads) were filtered out using Picard tools-1.86 (http://picard.sourceforge.net); 2) reads with mapping quality less than 10 were filtered out using Bamtools-2.2.3 (https://github.com/pezmaster31/bamtools); and 3) reads with more than one reported alignment were filtered out using in-house scripts. Further analysis was carried out on the processed alignment files, which contained around 10-100M reads after these filters.
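The three post-alignment filters can be illustrated with a minimal Python sketch. It is not the submitters' pipeline (which used Picard, Bamtools and in-house scripts); the record layout and the field names `is_duplicate`, `mapq` and `n_alignments` are hypothetical and stand in for information that would normally be read from a BAM file.

```python
def passes_filters(read, min_mapq=10):
    """Return True if a read survives the three filters described above.

    `read` is a hypothetical dict with keys:
      is_duplicate  - True if the read is flagged as a PCR duplicate
      mapq          - mapping quality of the alignment
      n_alignments  - number of alignments reported for the read
    """
    if read["is_duplicate"]:        # filter 1: duplicate (PCR clonal) reads
        return False
    if read["mapq"] < min_mapq:     # filter 2: mapping quality below 10
        return False
    if read["n_alignments"] > 1:    # filter 3: more than one reported alignment
        return False
    return True


def filter_reads(reads, min_mapq=10):
    """Keep only the reads that pass all three filters."""
    return [r for r in reads if passes_filters(r, min_mapq)]


example_reads = [
    {"name": "r1", "is_duplicate": False, "mapq": 60, "n_alignments": 1},
    {"name": "r2", "is_duplicate": True, "mapq": 60, "n_alignments": 1},
    {"name": "r3", "is_duplicate": False, "mapq": 5, "n_alignments": 1},
    {"name": "r4", "is_duplicate": False, "mapq": 30, "n_alignments": 2},
]
kept = [r["name"] for r in filter_reads(example_reads)]
```

In this toy input only `r1` passes; each of the other reads fails exactly one filter.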
To prepare the allele information for each cell line for downstream allele-specific analysis, SNP analysis was carried out along with SNP6 data. To identify novel variation (apart from known SNPs from SNP6 data), SNP analysis was carried out on each of the three data types (ChIP-seq, Exome-seq and RNA-seq) individually. ChIP-seq data of different marks (H3K4me3, H3K27me3, RNA Pol II and Input) were merged for each cell line to increase the depth and confidence for variation calling. Variation calling was performed for each cell line separately for ChIP-seq and Exome-seq following the best practice GATK-2.6.5 pipeline by filtering reads with mapping quality ≥ 1 (Van der Auwera GA et al., 2013; http://www.broadinstitute.org/gatk/guide/best-practices?bpm=DNAseq). Variation calling for RNA-seq was carried out following the methods of Piskol et al. (2013) to avoid artifacts specific to RNA-seq data. A final list of allele information was generated by combining the SNP information from the different data sets for each cell line. To increase the allele-specific sensitivity of the alignment, reads were additionally realigned in an allele-specific manner following the method of Satya et al. (2012). Read counts for each allele and SNP position were extracted for each mark using in-house scripts. SNP positions with at least three reads from both alleles were considered heterozygous positions.
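The heterozygosity rule in the last sentence (at least three reads from both alleles) can be sketched as follows. This is an illustration only, not the in-house script; the input mapping of position to `(allele1_reads, allele2_reads)` and the example positions are made up, and the rule is read here as "at least three reads on each allele".

```python
def heterozygous_positions(allele_counts, min_reads=3):
    """Return SNP positions supported by >= min_reads reads on each allele.

    `allele_counts` is a hypothetical dict mapping a SNP position to a
    tuple (allele1_reads, allele2_reads).
    """
    return [
        pos
        for pos, (a1, a2) in sorted(allele_counts.items())
        if a1 >= min_reads and a2 >= min_reads
    ]


# Illustrative counts only; these are not values from the dataset.
counts = {
    1000: (12, 9),   # both alleles covered: heterozygous
    2000: (15, 2),   # allele 2 has fewer than three reads
    3000: (3, 3),    # exactly at the threshold: heterozygous
    4000: (0, 20),   # only one allele observed
}
het = heterozygous_positions(counts)
```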
Peak calling was performed using HOMER (Heinz et al., 2010; http://homer.salk.edu/homer/index.html) with default parameters. For H3K27me3 and RNA Pol II, the 'style' parameter was set to 'histone' because of the broad patterns of these marks, whereas for H3K4me3, which generally gives sharp peaks, the parameter 'factor' was chosen. Genomic context annotation of the identified peaks was carried out using the HOMER annotation module, restricted to basic annotation by excluding references other than coding genes and non-coding RNA.
A gene-based analysis of annotation integration was carried out using in-house scripts to integrate all annotation from peak and variation calling (informative SNP counts, read depths, homo/heterozygous SNP count and weighted allelic imbalance). To include annotations from regulatory regions, 1 kb sequences upstream of the TSS and downstream of the TES were considered. For each gene, the average Allelic Imbalance (AI) was calculated as a weighted arithmetic mean in which each SNP's AI was weighted by its read depth.
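The depth-weighted average described above corresponds to sum(AI_i x depth_i) / sum(depth_i) over the SNPs of a gene. A minimal Python sketch follows; the record does not state how the per-SNP AI is defined, so the function takes AI values as given, and the example numbers are illustrative only.

```python
def weighted_allelic_imbalance(snps):
    """Depth-weighted arithmetic mean of per-SNP allelic imbalance.

    `snps` is a list of (ai, depth) pairs for one gene, where `ai` is the
    allelic imbalance of a SNP (definition not specified in the record)
    and `depth` is its read depth. Returns None if total depth is zero.
    """
    total_depth = sum(depth for _, depth in snps)
    if total_depth == 0:
        return None
    return sum(ai * depth for ai, depth in snps) / total_depth


# Two hypothetical SNPs: the deeper SNP dominates the gene-level value.
gene_ai = weighted_allelic_imbalance([(0.2, 10), (0.8, 30)])
```

Here the result is (0.2 x 10 + 0.8 x 30) / 40 = 0.65, closer to the AI of the SNP with the higher depth than the unweighted mean of 0.5.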
To allow comparisons across cell lines, ChIP-seq data were normalized using an in-house tool called 'Epimetheus', which is based on quantile normalization (manuscript under preparation). Read Count Intensity (RCI) was calculated in 100 bp bins across chromosomes, and these intensities were then normalized using quantile normalization from the limma package. The impact of normalization was assessed using MA plots before and after normalization. Normalized RCI profiles were constructed for specified genomic features; these are illustrated in Figure 6. For TSS-centered plots and heatmaps, a separate TSS-based normalization was carried out with a 20 bp bin size to obtain higher resolution.
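Quantile normalization of binned read counts can be sketched in plain Python as below. This is a generic textbook version, not the Epimetheus tool or limma's implementation (limma handles ties and missing values differently); the sample vectors are made-up RCI values for three bins in two samples.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length samples (one list per sample).

    Each value is replaced by the mean, across samples, of the values that
    share its rank. Ties are broken by position, which is a simplification.
    """
    n = len(columns[0])
    if any(len(c) != n for c in columns):
        raise ValueError("all samples must have the same number of bins")

    # For each sample, the bin indices ordered from lowest to highest value.
    orders = [sorted(range(n), key=lambda i, c=c: c[i]) for c in columns]

    # Mean of the r-th smallest value across all samples.
    rank_means = [
        sum(c[o[r]] for c, o in zip(columns, orders)) / len(columns)
        for r in range(n)
    ]

    # Write the rank means back to the original bin positions.
    normalized = []
    for o in orders:
        out = [0.0] * n
        for rank, idx in enumerate(o):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical RCI values for three bins in two samples.
sample_a = [5.0, 2.0, 3.0]
sample_b = [4.0, 1.0, 6.0]
norm_a, norm_b = quantile_normalize([sample_a, sample_b])
```

After normalization both samples share the same set of values (1.5, 3.5, 5.5), while each bin keeps its rank within its own sample.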
Genome_build: hg19
 
Submission date Nov 04, 2014
Last update date May 15, 2019
Contact name Ronan Chaligné
E-mail(s) Ronan.Chaligne@curie.fr
Organization name Institut Curie
Lab Edith Heard's lab
Street address 26, Rue d'Ulm
City Paris
ZIP/Postal code 75248
Country France
 
Platform ID GPL16791
Series (2)
GSE62907 The inactive X chromosome is epigenetically unstable and transcriptionally labile in breast cancer
GSE62966 Analysis of allele specific expression and its chromatin state to identify genes that are escaping X chromosome inactivation
Relations
BioSample SAMN03160882
SRA SRX750675

Supplementary file Size File type/resource
GSM1537300_SKBR3-Input.chrX.full.bedgraph.gz 4.1 Mb BEDGRAPH
Raw data are available in SRA
Processed data provided as supplementary file
Processed data are available on Series record
