Understanding the difference between RNA, ATAC, and ChIP-sequencing

RNA, ATAC, and ChIP sequencing are NGS techniques that have made it easier for researchers to investigate gene expression and epigenetic regulation at both targeted and genome-wide scales. Together, they have accelerated studies of cell transcriptomics, epigenetic modifications, and transcription factor binding, and their popularity has grown in recent years. Each method has its own distinct uses and analysis steps, which are covered below.

RNA-Sequencing

Of the three techniques, RNA-seq is the most frequently used. The total RNA complement of a given sample is obtained and sequenced using a next generation sequencing (NGS) platform. The main goal of RNA-seq is to analyze the cellular transcriptome, which includes RNA gene transcripts and their isoforms.

In RNA-seq, RNA transcripts are first reverse transcribed into complementary DNA (cDNA), and adapters are then attached to each end of the cDNA fragments. The cDNA is then sequenced as single-end or paired-end reads.

How To Read RNA-Seq Data

RNA-seq analysis begins with quality control, followed by a read trimming step. Reads that pass the filtering criteria are then aligned to a reference genome or, in some cases, assembled de novo into transcripts. Many bioinformatics tools can then be used to quantify and study gene expression levels. Differential expression analysis aims to identify genes whose expression changes significantly (either up-regulated or down-regulated) between two groups of samples, such as treatment and control samples.
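The quantification and comparison steps can be sketched in a few lines of Python. This is a minimal illustration rather than a full differential expression method: it assumes one sample per condition, scales counts to counts per million (CPM), and reports a log2 fold change with a pseudocount. The gene names and counts are hypothetical, and a real analysis would use replicates and a statistical model to assess significance.

```python
import math


def counts_per_million(counts):
    """Scale the raw read counts of one sample to counts per million (CPM)."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def log2_fold_change(control, treatment, pseudocount=1.0):
    """Return per-gene log2(treatment / control) computed on CPM values.

    The pseudocount avoids division by zero for genes with no reads.
    """
    cpm_control = counts_per_million(control)
    cpm_treatment = counts_per_million(treatment)
    return {
        gene: math.log2(
            (cpm_treatment[gene] + pseudocount) / (cpm_control[gene] + pseudocount)
        )
        for gene in control
    }


# Hypothetical read counts per gene (illustration only).
control = {"geneA": 100, "geneB": 400, "geneC": 500}
treatment = {"geneA": 400, "geneB": 100, "geneC": 500}

lfc = log2_fold_change(control, treatment)
for gene, value in sorted(lfc.items()):
    direction = "up" if value > 0 else "down" if value < 0 else "unchanged"
    print(f"{gene}\t{value:+.2f}\t{direction}")
```

A positive value indicates higher expression in the treatment sample and a negative value indicates lower expression. The fold change alone says nothing about statistical significance.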

Additionally, pathway analysis can summarize expression changes between conditions at a higher level. RNA-seq data can also be used to build co-expression networks, which group genes that behave in a similar way under certain conditions.
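One simple way to picture a co-expression network is to connect pairs of genes whose expression profiles are strongly correlated across samples. The sketch below assumes Pearson correlation and an arbitrary threshold of 0.9. Both are illustrative choices, and the expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (non-zero variance).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene1, gene2, r) for every gene pair with r >= threshold."""
    genes = sorted(expression)
    edges = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(expression[g1], expression[g2])
            if r >= threshold:
                edges.append((g1, g2, r))
    return edges


# Hypothetical expression values across four samples (illustration only).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}

edges = coexpression_edges(expression)
for g1, g2, r in edges:
    print(f"{g1} - {g2}: r = {r:.2f}")
```

Here geneA and geneB rise together across the samples and are linked, while geneC follows a different pattern and stays unconnected.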

ChIP-Seq

ChIP-seq combines chromatin immunoprecipitation with high-throughput sequencing. It is a widely used next generation sequencing method for identifying the DNA binding sites of transcription factors and other proteins. ChIP-seq enables the mapping of transcription factors, DNA binding proteins, and histone modifications on a genome-wide scale.

Sample preparation begins with formaldehyde fixation of chromatin, which covalently crosslinks DNA-binding proteins to DNA. The cells are then lysed and the chromatin is fragmented into smaller pieces, typically by sonication or enzymatic digestion. The DNA-protein complexes of interest are isolated by immunoprecipitation with protein-specific antibodies, after which the crosslinks are reversed and the recovered DNA is purified and sequenced.

How to Read ChIP-Seq Data

Understanding how to read ChIP-seq data requires making sense of each step of the analysis process. First, the quality of the sequenced reads is assessed to identify possible sequencing artifacts. The quality-controlled reads are then mapped to a reference genome. After alignment, various quality metrics are examined to detect problems such as poor fragment size selection and insufficient sequencing depth. In addition, spike-in analysis can be performed to normalize for the total amount of chromatin in each sample. Further analysis involves the identification of gene ontology terms and motifs. ChIP-seq data analysis can also be integrated with chromatin conformation, genetic variation, and DNA methylation analysis (Ryuichiro and Katsuhiko, 2016).
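Spike-in normalization can be illustrated with a small calculation. The sketch below assumes that the same amount of exogenous chromatin was added to every sample, so the number of reads mapping to the spike-in genome can serve as a per-sample yardstick. Scaling each sample relative to the one with the fewest spike-in reads is one possible convention, not the only one. The sample names and read counts are hypothetical.

```python
def spike_in_scale_factors(spike_in_reads):
    """Return a scale factor per sample from its spike-in read count.

    Assumption: each sample received the same amount of spike-in chromatin,
    so a sample with more spike-in reads is scaled down proportionally.
    The sample with the fewest spike-in reads gets a factor of 1.0.
    """
    reference = min(spike_in_reads.values())
    return {sample: reference / n for sample, n in spike_in_reads.items()}


def scale_signal(signal, factor):
    """Apply a scale factor to per-bin coverage values."""
    return [value * factor for value in signal]


# Hypothetical numbers of reads aligned to the spike-in genome.
spike_in_reads = {"control": 200_000, "treated": 400_000}
factors = spike_in_scale_factors(spike_in_reads)

# Hypothetical per-bin coverage for the treated sample.
treated_signal = [10, 20, 40]
normalized = scale_signal(treated_signal, factors["treated"])
print(factors)
print(normalized)
```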

Peak calling is another common step in reading ChIP-seq data. It predicts the regions of the genome where the immunoprecipitated protein is bound by finding regions with a significantly enriched number of mapped reads (Bailey et al., 2013).
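The idea behind peak calling can be sketched as counting reads in fixed windows and asking whether each count is surprisingly high. The example below is a deliberately simplified illustration: it assumes a single short chromosome, non-overlapping windows, and a Poisson background whose rate is the genome-wide average. Dedicated peak callers are more elaborate, for instance by modeling local background and using control samples. All positions and parameters are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_peaks(read_starts, genome_length, window=100, alpha=1e-3):
    """Return (start, end, count, p_value) for windows enriched in reads."""
    n_windows = math.ceil(genome_length / window)
    counts = [0] * n_windows
    for pos in read_starts:
        counts[pos // window] += 1
    lam = len(read_starts) / n_windows  # expected reads per window
    peaks = []
    for i, count in enumerate(counts):
        p_value = poisson_sf(count, lam)
        if p_value < alpha:
            end = min((i + 1) * window, genome_length)
            peaks.append((i * window, end, count, p_value))
    return peaks


# Hypothetical read start positions: one read in most windows,
# plus a pile-up of 20 reads between positions 300 and 319.
background = [50, 150, 250, 450, 550, 650, 750, 850, 950]
pileup = list(range(300, 320))
peaks = call_peaks(background + pileup, genome_length=1000)
for start, end, count, p_value in peaks:
    print(f"{start}-{end}\treads={count}\tp={p_value:.2e}")
```

Only the window containing the pile-up passes the significance threshold. The windows holding a single background read do not.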

ChIP-seq is helpful for investigating epigenetic modifications on a genome-wide scale. However, successful analysis of these modifications requires some prior knowledge of their role and effects in the experimental system. Knowing how to read ChIP-seq data correctly helps avoid false positives and false negatives. In addition, ATAC-seq, discussed below, is a useful complement that can give researchers preliminary knowledge about changes in chromatin accessibility across the genome.

ATAC-Seq

ATAC-seq (assay for transposase-accessible chromatin using sequencing) is based on next generation sequencing (NGS) library construction using a hyperactive Tn5 transposase (Shashikant and Ettensohn, 2019). It was first described by Buenrostro et al. (2013) as a sensitive method for integrative epigenome analysis. ATAC-seq captures open chromatin sites using a two-step method and reveals the relationships between open chromatin sites, nucleosomes, DNA binding proteins, and regulatory regions.

For ATAC-seq, sequencing adapters are loaded onto the transposase, which simultaneously fragments the chromatin and integrates the adapters into open chromatin regions. The resulting library is then sequenced on an NGS platform. Because the adapters are inserted into open chromatin, the sequenced fragments come from accessible regions of the genome, which can be analyzed using various NGS analysis approaches.

How to Read ATAC-Seq Data

ATAC-seq analysis starts with read quality control, followed by read mapping. Duplicate and unaligned reads are removed, and the remaining reads from paired-end sequencing are used for peak calling, in a way that is similar to identifying transcription factor (TF) binding sites in ChIP-seq data. This step counts the number of reads in a given region and calculates the significance of that count. Next, the fragment density is calculated to identify the frequency of transposition events across the genome, followed by a normalization step that accounts for differences in the number of aligned reads. Lastly, peaks are compared between samples by grouping overlapping peaks into active regions.
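The final step, grouping overlapping peaks from different samples into shared regions, is essentially an interval-merging problem. The sketch below assumes peaks on a single chromosome given as (start, end) pairs and treats overlapping or touching peaks as one region. The sample names and coordinates are hypothetical.

```python
def merge_peaks(peaks_by_sample):
    """Group overlapping peaks from all samples into merged regions.

    Returns a list of dicts with the region start, end, and the set of
    samples that contributed at least one peak to the region.
    """
    intervals = sorted(
        (start, end, sample)
        for sample, peaks in peaks_by_sample.items()
        for start, end in peaks
    )
    regions = []
    for start, end, sample in intervals:
        if regions and start <= regions[-1]["end"]:
            # Overlaps (or touches) the current region: extend it.
            regions[-1]["end"] = max(regions[-1]["end"], end)
            regions[-1]["samples"].add(sample)
        else:
            regions.append({"start": start, "end": end, "samples": {sample}})
    return regions


# Hypothetical peak coordinates on one chromosome.
peaks_by_sample = {
    "sample1": [(100, 200), (500, 600)],
    "sample2": [(150, 250), (800, 900)],
}

regions = merge_peaks(peaks_by_sample)
for region in regions:
    samples = ",".join(sorted(region["samples"]))
    print(f"{region['start']}-{region['end']}\t{samples}")
```

Regions supported by both samples, such as the first one here, are shared accessible sites, while regions found in only one sample are candidates for differential accessibility.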

Conclusion

Next generation sequencing has enabled scientists to uncover many hidden mechanisms of biology and disease at an unprecedented pace. The NGS techniques discussed above are used to study many aspects of the genome, including gene regulation, gene expression, and histone modifications. Ultimately, scientists may use a combination of NGS techniques within their research projects.

References

Bailey, T., Krajewski, P., Ladunga, I., Lefebvre, C., Li, Q., Liu, T., ... & Zhang, J. (2013). Practical guidelines for the comprehensive analysis of ChIP-seq data. PLoS Computational Biology, 9(11), e1003326.

Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y., & Greenleaf, W. J. (2013). Transposition of native chromatin for multimodal regulatory analysis and personal epigenomics. Nature Methods, 10(12), 1213.

Ryuichiro, N., & Katsuhiko, S. (2016). Recent advances in ChIP-seq analysis: from quality management to whole-genome annotation. Briefings in Bioinformatics, 18(2), 279-290.

Shashikant, T., & Ettensohn, C. A. (2019). Genome-wide analysis of chromatin accessibility using ATAC-seq. Methods in Cell Biology, 151, 219.

Amit U Sinha, PhD (Machine Learning and Genomics) is the founder and CEO of Basepair, an online NGS analysis platform. Amit is an expert in genomics and bioinformatics, with over a decade of experience in the field. Prior to founding Basepair, Amit worked as an investigator at Memorial Sloan Kettering Cancer Center. Additionally, he has held research faculty positions at the Dana Farber Cancer Institute and Harvard Medical School. Amit's work focuses on leveraging technology to improve healthcare research by enabling scientists to make sense of big data quickly and accurately.