Introduction: What is DNA methylation?

DNA methylation is one of the most common and most studied epigenetic processes: it alters gene expression without changing the DNA sequence. In the mammalian genome, methylation occurs at the fifth carbon of cytosine (Figure 1), is mechanistically well understood, and takes place mainly at cytosines followed by guanine (CpG dinucleotides). Regions rich in CpG dinucleotides are termed CpG islands. These islands are frequent in promoters, the DNA regions near transcription start sites that initiate transcription of the associated gene, and they are especially prevalent in the promoters of housekeeping genes, which are required for basic cellular functions. Hypermethylation of CpG islands (an increase in methylation relative to normal) silences gene expression; conversely, hypomethylation can promote gene expression. The methylation status of DNA varies in both space and time, changing with age, tissue type, and environmental exposures. DNA methylation analysis can therefore give researchers valuable insight into gene regulation and development.

Figure 1. DNA methylation. Addition of a methyl group to carbon 5 of cytosine in cytosine-phosphate-guanine (CpG) and other sequence contexts can inhibit the binding of transcription factors to promoters.
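CpG islands are usually located computationally rather than by eye. The Python sketch below checks a single sequence window against commonly cited Gardiner-Garden and Frommer-style criteria (at least 200 bp, GC content above 50%, and an observed/expected CpG ratio above 0.6). These thresholds and the helper names `cpg_stats` and `is_cpg_island_like` are assumptions for illustration, not part of this article; genome-scale tools scan many overlapping windows and merge the hits.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for one DNA window."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    # "CG" occurrences cannot overlap, so str.count gives the CpG dinucleotide count
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpGs if C and G were placed independently: (C count * G count) / length
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cpg_island_like(seq: str, min_len: int = 200,
                       min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Check one window against Gardiner-Garden and Frommer-style thresholds (assumed)."""
    if len(seq) < min_len:
        return False
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction > min_gc and obs_exp > min_obs_exp


print(is_cpg_island_like("CG" * 100))  # CpG-dense 200 bp window
print(is_cpg_island_like("AT" * 100))  # AT-rich 200 bp window
```

The observed/expected ratio matters because a GC-rich region can still be depleted of CpG dinucleotides; GC content alone would not distinguish the two cases.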

DNA methylation in cancer

Aberrant DNA methylation is implicated in many disease processes, including cancer, obesity, and addiction. Studies have shown that methylation changes occur very early in cancer development and may contribute to tumorigenesis as well as disease progression (1). DNA hypermethylation may directly silence tumor suppressor genes and other genes involved in processes important to cancer development. For example, the SEPT9 gene encodes the septin-9 protein, which is involved in cytokinesis during cell division and is also thought to act as a tumor suppressor. Hypermethylation may also inactivate tumor-associated genes indirectly, by silencing important transcription factors or genes involved in DNA repair.

DNA hypomethylation, and the resulting increase in gene expression, may drive tumorigenesis by activating growth-promoting genes such as R-Ras in gastric cancer. Hypomethylation also occurs in other genomic elements, such as repetitive regions and transposons, where it may increase genomic instability and chromosomal rearrangements that contribute to cancer (Table 1).

Cancer type                  Demethylated repeat sequences
Ovarian carcinoma            L1, Alu, Sat2
Hepatocellular carcinoma     L1
Breast cancer                SatR-1
Chronic myeloid leukemia     L1
Renal cell carcinoma         L1, HERV-K

Table 1. DNA hypomethylation of repetitive sequences and associated cancer types. Adapted from Ref 1.

With appropriate assay design and validation, DNA methylation may be used as a biomarker to support clinical decisions in various cancers (2). For example, the U.S. Food and Drug Administration (FDA) has approved DNA methylation assays for the screening of colorectal cancer (3,4).

DNA methylation has several advantages as a biomarker in healthcare applications (2,5). First, DNA methylation alterations occur more frequently in cancer than genetic mutations, which can make methylation-based assays more sensitive. Second, cancer-associated DNA hypermethylation occurs predominantly at CpG islands in promoter regions, allowing assays to focus on small, specific genomic regions of interest. Third, at any single CpG on a given DNA molecule, methylation is binary (the cytosine is either methylated or not), which supports reliable measurement in a biomarker assay. Finally, DNA methylation patterns are chemically stable, so analysis can be performed on a variety of samples, including heterogeneous or degraded ones. Fresh-frozen and formalin-fixed paraffin-embedded (FFPE) tissues stored long term can be analyzed for methylation biomarkers, and analysis of peripheral blood samples enables noninvasive cancer testing.


Studying DNA methylation with NGS

Because it is dynamic, the methylome (the genome-wide pattern of DNA methylation) is much more variable and inherently more complex than the genome. The advent of next-generation sequencing (NGS), which enables massively parallel analysis of the methylome, has given researchers a comprehensive, unbiased, and quantitative view of the methylation landscape at single-base resolution. NGS rapidly sequences millions of DNA molecules, and each sequence can be counted and quantified individually, whereas earlier methylation analysis approaches relied on relative measurements between samples. The high sensitivity and specificity of NGS also enable researchers to identify novel epigenetic biomarkers.
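Because each molecule is counted, the methylation level at a CpG site can be expressed as the fraction of reads that show it methylated. The minimal Python sketch below assumes per-site read counts have already been extracted by an upstream tool; the `CpGCounts` record, the `methylation_level` helper, and the 10-read minimum depth are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CpGCounts:
    """Read counts at one CpG site (hypothetical record layout)."""
    methylated: int    # reads in which the cytosine appears methylated
    unmethylated: int  # reads in which the cytosine appears unmethylated


def methylation_level(counts: CpGCounts, min_depth: int = 10) -> Optional[float]:
    """Fraction of reads methylated at a site, or None if depth is too low."""
    depth = counts.methylated + counts.unmethylated
    if depth < min_depth:
        return None  # too few molecules counted for a reliable estimate
    return counts.methylated / depth


# Hypothetical counts for three CpG sites
sites = {
    "site_a": CpGCounts(methylated=18, unmethylated=2),
    "site_b": CpGCounts(methylated=3, unmethylated=27),
    "site_c": CpGCounts(methylated=1, unmethylated=2),
}
for name, counts in sites.items():
    print(name, methylation_level(counts))
```

Each molecule contributes a binary call, so the site-level value is a fraction between 0 and 1; the depth filter reflects that a fraction estimated from very few reads is unreliable.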

A variety of methods have evolved to capture genome-wide DNA methylation information. These include bisulfite conversion, digestion with methylation-sensitive restriction enzymes, and purification of methylated DNA using antibodies or 5-methylcytosine binding proteins (Table 2). As the name implies, affinity enrichment methods use antibodies or proteins that bind methylated DNA to isolate the desired genetic material. Unmethylated DNA is washed away, and the enriched portion is amplified and prepared for analysis. Although cost-effective, affinity enrichment methods are biased toward hypermethylated regions, and they do not reveal which individual cytosines are methylated.

Summary
  • Affinity enrichment: Antibodies or methylated-CpG binding proteins capture (pull down) and enrich methylated genomic regions for analysis
  • Restriction enzyme: Methylation-sensitive restriction enzymes cleave at their recognition sites depending on methylation status, leaving either methylated or unmethylated sequences intact for analysis
  • Bisulfite conversion: Bisulfite treatment converts unmodified cytosine to uracil, whereas methylated cytosine is protected and unchanged
Resolution
  • Affinity enrichment: ~250 bp
  • Restriction enzyme: Single-base
  • Bisulfite conversion: Single-base
CpGs covered
  • Affinity enrichment: ~23 million
  • Restriction enzyme: ~2 million
  • Bisulfite conversion: >28 million
Pros
  • Affinity enrichment: Cost-effective; no mutations introduced
  • Restriction enzyme: High sensitivity at lower cost
  • Bisulfite conversion: Evaluates every CpG site
Cons
  • Affinity enrichment: Bias toward hypermethylated regions; cannot determine absolute methylation levels
  • Restriction enzyme: Regions without an enzyme recognition site are not covered
  • Bisulfite conversion: Higher cost; higher DNA input unless a targeted approach is taken; DNA degradation after treatment

Table 2. Comparison of genome-wide approaches for DNA methylation profiling. Adapted from Ref 6.

Certain restriction enzymes are sensitive to DNA methylation. HpaII, for example, does not cleave its recognition site (CCGG) when the internal cytosine is methylated, whereas its isoschizomer MspI cleaves the same site regardless of that methylation. Conversely, methylation-dependent enzymes such as McrBC cleave only methylated DNA. By comparing digests from such enzymes, methylated or unmethylated sequences can be isolated for analysis, and the methylation status of individual CpG sites within recognition sequences can be determined at single-base resolution. Although simple to perform, restriction enzyme methods limit analysis to genomic regions that contain the appropriate recognition sites.
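The logic of comparing a methylation-sensitive enzyme (HpaII) with its methylation-insensitive isoschizomer (MspI) can be sketched in a few lines of Python. This is an idealized model, not a protocol: it assumes complete digestion, considers only methylation of the internal cytosine of CCGG on the top strand, and uses hypothetical helper names (`ccgg_sites`, `infer_site_methylation`).

```python
def ccgg_sites(seq: str) -> list[int]:
    """0-based start positions of every CCGG recognition site in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]


def infer_site_methylation(seq: str, methylated_cpg_c: set[int]) -> dict[int, bool]:
    """Idealized HpaII/MspI comparison; True means the site's CpG is methylated.

    methylated_cpg_c holds 0-based positions of methylated cytosines (top strand).
    """
    calls = {}
    for start in ccgg_sites(seq):
        internal_c = start + 1  # the C of the CpG inside C-CG-G
        mspi_cuts = True        # MspI cuts whether or not that C is methylated
        hpaii_cuts = internal_c not in methylated_cpg_c  # HpaII is blocked by internal 5mC
        # Cut by MspI but not by HpaII -> methylated; cut by both -> unmethylated
        calls[start] = mspi_cuts and not hpaii_cuts
    return calls


print(infer_site_methylation("AACCGGTTCCGGAA", methylated_cpg_c={3}))
```

In a real experiment the methylated positions are what is being measured, so the two digests are compared through fragment sizes or sequencing read counts; the sketch only shows the decision rule applied at each site.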

Bisulfite treatment is widely considered the gold standard for genome-wide methylation profiling. Treating DNA with bisulfite (HSO3-) converts unmethylated cytosines to uracil, which is read as thymine after PCR amplification, whereas 5-methylcytosines remain unchanged. These specific changes can be analyzed to provide single-nucleotide resolution across the genome, revealing differences in methylation between complementary strands and between alleles. With appropriate bioinformatics tools, the throughput and accuracy of NGS can be leveraged to analyze and quantify this information across the methylome.
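The idealized Python sketch below models what a bisulfite sequencing read looks like and how methylation is called by comparing that read with the reference. It assumes complete conversion, a read already aligned to the top strand without gaps, and no C-to-T single-nucleotide variants (real pipelines must account for these, because they look identical to unmethylated cytosines). The function names are hypothetical.

```python
def bisulfite_convert(seq: str, methylated_positions: set[int]) -> str:
    """Idealized bisulfite + PCR: unmethylated C is read as T, 5mC stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq.upper())
    )


def call_methylation(reference: str, read: str) -> dict[int, bool]:
    """Call each reference C as methylated (True) or unmethylated (False)."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base != "C":
            continue  # only cytosines carry methylation information here
        if read_base == "C":
            calls[i] = True   # protected from conversion -> methylated
        elif read_base == "T":
            calls[i] = False  # converted to U, read as T -> unmethylated
        # any other base is treated as a sequencing error and left uncalled
    return calls


reference = "ACGTCGAC"
read = bisulfite_convert(reference, methylated_positions={1})  # only the first CpG is methylated
print(read, call_methylation(reference, read))
```

Aggregating such per-read calls across many reads at each position gives the site-level methylation fractions that NGS-based methylation analysis reports.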

Although bisulfite sequencing can be the most informative method, it can be prohibitive in cost and sample input when studying the entire human genome. A targeted approach that focuses only on genomic regions of interest can alleviate these concerns and, as noted above, may be an advantage for biomarker development in cancer research and for improving clinical decision-making. Other benefits of targeted methylation sequencing include higher sequencing coverage and simpler data analysis, because only specific regions of the genome are examined. Targeting also makes it possible to sequence and analyze more samples simultaneously.
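Why targeting raises coverage follows from simple arithmetic: mean depth equals the number of on-target bases sequenced divided by the size of the region. The Python sketch below compares the whole genome with a targeted panel. The run size (400 million 150-bp reads), the 2 Mb panel, the approximate 3.1 Gb human genome size, and the `on_target_fraction` parameter are illustrative assumptions, not figures from this article, and the model ignores PCR duplicates and uneven coverage.

```python
import math

HUMAN_GENOME_BP = 3_100_000_000  # approximate haploid human genome size (assumption)


def mean_coverage(num_reads: int, read_length: int, target_size_bp: int,
                  on_target_fraction: float = 1.0) -> float:
    """Expected mean depth: on-target bases sequenced divided by region size."""
    return num_reads * read_length * on_target_fraction / target_size_bp


def samples_per_run(run_reads: int, read_length: int, target_size_bp: int,
                    desired_coverage: float, on_target_fraction: float = 1.0) -> int:
    """Number of samples that can share a run while each reaches desired_coverage."""
    usable_bases = run_reads * read_length * on_target_fraction
    bases_per_sample = desired_coverage * target_size_bp
    return math.floor(usable_bases / bases_per_sample)


# Hypothetical run: 400 million reads of 150 bp
run_reads, read_len = 400_000_000, 150
panel_bp = 2_000_000  # hypothetical 2 Mb targeted methylation panel

print(f"Whole genome: {mean_coverage(run_reads, read_len, HUMAN_GENOME_BP):.1f}x")
print(f"Targeted panel: {mean_coverage(run_reads, read_len, panel_bp):.0f}x")
print("Samples at 800x on the panel:", samples_per_run(run_reads, read_len, panel_bp, 800))
```

With these hypothetical numbers, one sample averages roughly 19x across the whole genome, whereas the same run could cover the 2 Mb panel at 800x for 37 samples.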

To learn more about targeted sequencing approaches and why coverage is important, explore the NGS basics learning center articles.

References

  1. Watanabe Y, Maekawa M. Adv Clin Chem 52:145 (2010)
  2. Bock C, et al. Nat Biotechnol 34:726 (2016)
  3. Kadiyska T, Nossikoff A. World J Gastroenterol 21:10057 (2015)
  4. U.S. Food and Drug Administration. Premarket Approval (PMA) P130001. https://www.accessdata.fda.gov/scripts/cdrh/cfdocs/cfPMA/pma.cfm?id=P130001
  5. Heyn H, Esteller M. Nat Rev Genet 13:679 (2012)
  6. Barros-Silva D, et al. Genes 9:429 (2018)
