Next-Generation Sequencing Illumina Workflow: 4 Key Steps

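Next-generation sequencing (NGS) is a high-throughput sequencing method that makes it possible to sequence large, complex genomes, including the human genome, in a single day. Illumina NGS systems deliver high-throughput data output through massively parallel sequencing of nucleic acid samples. The workflow includes isolation of the nucleic acids of interest, fragmentation and preparation of samples for sequencing (library preparation), the sequencing reaction, and bioinformatic processing and analysis of the sequencing data. This section gives an overview of each key step of the workflow, as shown in Figure 1, along with the main points to consider.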


Figure 1. Next-generation sequencing workflow using Illumina systems.

1. Nucleic acid isolation for NGS

Nucleic acid isolation is a crucial first step in the NGS workflow, whether you are sequencing genomic DNA (gDNA), total RNA, or specific RNA types. It is important to select an isolation method or kit that enables efficient lysis of the source cells or tissue. This, in turn, helps you obtain the yield, purity, and quality needed for subsequent library preparation steps.

The following considerations for nucleic acid isolation help ensure success in the downstream steps of NGS:

  • Yield: Nanograms (ng) to micrograms (µg) of DNA or RNA are usually required for library preparation. Therefore, obtaining the maximum possible amount of high-quality nucleic acids, even from limited or archived sources such as cell-free DNA (cfDNA) and formalin-fixed, paraffin-embedded (FFPE) samples, is critical for NGS success.
  • Purity: Isolated DNA or RNA should be free of compounds that can inhibit enzymes during library preparation. Typical inhibitors include reagents carried over from nucleic acid isolation (e.g., phenol, ethanol) and contaminants from the biological sample itself (e.g., heparin, humic acid). The chosen isolation method should therefore remove these contaminants or minimize their carryover.
  • Quality: The integrity and quality of isolated nucleic acids are also important for NGS success. For example, when working with gDNA, most of the isolated DNA should be intact and of high molecular weight. When working with RNA, degradation should be minimized during sample preparation, and the isolated RNA should be heterogeneous and representative of the RNA populations in the original sample. When using FFPE samples, in which DNA and RNA are already fragmented, select isolation methods or kits designed to obtain sufficient yield and quality of nucleic acids for sequencing.

Yield, purity, and quality of isolated nucleic acids should be assessed before proceeding to NGS library preparation. The following are methods commonly used for examination of these attributes:

  • UV spectrophotometric assays measure A260, A260:A280 ratio, and A260:A230 ratio to help assess sample purity and yield
  • Fluorometric assays help quantify specific types of nucleic acids (e.g., ssDNA, dsDNA, small RNA)
  • Gel-based or microfluidic electrophoresis helps determine fragment size, distribution, and quantity
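
The arithmetic behind UV spectrophotometric readings can be sketched as follows. This is a minimal illustration, assuming the widely used conversion factors (an A260 of 1.0 in a 1 cm path corresponds to about 50 ng/µL dsDNA, 33 ng/µL ssDNA, or 40 ng/µL RNA) and the commonly cited reference ratios (A260:A280 near 1.8 for DNA and 2.0 for RNA; A260:A230 near 2.0 to 2.2). The function name `assess_sample` and the flag cutoffs are hypothetical choices for illustration, not values from a specific instrument or kit.

```python
# Commonly used conversion factors: concentration (ng/µL) that gives A260 = 1.0
# in a 1 cm path length.
NG_PER_UL_PER_A260 = {"dsDNA": 50.0, "ssDNA": 33.0, "RNA": 40.0}

def assess_sample(a260: float, a280: float, a230: float,
                  kind: str = "dsDNA", dilution: float = 1.0) -> dict:
    """Estimate yield and purity from UV absorbance readings."""
    concentration = a260 * NG_PER_UL_PER_A260[kind] * dilution  # ng/µL
    r280 = a260 / a280
    r230 = a260 / a230
    # Typical reference values: A260:A280 ~1.8 for DNA and ~2.0 for RNA;
    # A260:A230 ~2.0-2.2 for clean samples.
    expected_r280 = 2.0 if kind == "RNA" else 1.8
    return {
        "concentration_ng_per_ul": concentration,
        "a260_a280": round(r280, 2),
        "a260_a230": round(r230, 2),
        # Hypothetical tolerances chosen for illustration only.
        "low_a260_a280": r280 < expected_r280 - 0.1,
        "low_a260_a230": r230 < 1.8,
    }

clean = assess_sample(a260=0.5, a280=0.27, a230=0.25)
contaminated = assess_sample(a260=0.5, a280=0.35, a230=0.40)
```

A low A260:A280 ratio is often read as a sign of protein carryover, and a low A260:A230 ratio as a sign of compounds such as guanidine salts or phenol. Because absorbance cannot distinguish DNA from RNA or free nucleotides, fluorometric assays are used alongside it when a specific nucleic acid type must be quantified.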

To assess RNA integrity, the RNA integrity number (RIN), calculated by an algorithm from microfluidic electrophoresis data [1], and the integrity and quality (IQ) score, based on fluorometry [2], provide quantitative measures of the intact RNA population in the isolated sample.


When nucleic acid quantities are low (e.g., when single cells are the source), isolated DNA and RNA may be amplified with polymerases suited to whole genome amplification (WGA) and whole transcriptome amplification (WTA), respectively, to increase the amount of starting template before NGS library preparation. WGA and WTA can yield more sequencing reads, better coverage, improved sensitivity, and better variant detection from limited sample amounts. Phi29 DNA polymerase is commonly used for WGA because of its high processivity, reduced bias, high fidelity, and ability to synthesize DNA isothermally at a low temperature [3].


2. Library preparation for NGS

After isolation and purification, nucleic acids are prepared so that they can be processed and read by the sequencers. These prepared, ready-to-sequence samples are commonly known as “libraries” because they represent a collection of molecules that are sequenceable. The library preparation procedure may vary depending on the methods and reagents used, but the general steps in library preparation for Illumina systems are as follows:

  • Nucleic acid fragmentation: A nucleic acid sample is fragmented into small pieces so that they can be sequenced in a massively parallel fashion. The optimal range of fragment sizes depends on the sequencers and sequencing applications.
  • Adapter ligation: Adapters are oligonucleotides containing sequences complementary to the priming oligos on the sequencing chip. Adapters (commonly known as P5 and P7) are ligated to the ends of the nucleic acid fragments to enable sequencing. Because adapter sequences are specific to the sequencing platform, Illumina adapters are not interchangeable with those of other platforms, such as Ion Torrent.
  • Library quantitation: A prepared sequencing library is a pool of DNA fragments with adapters attached to their ends. Libraries must be quantified (and normalized as needed) so that an optimal concentration of molecules is loaded onto the sequencer. This quality control step helps ensure consistent data output and quality, as well as efficient use of the sequencing chips. Fluorometry and real-time PCR are common methods for library quantification.
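
Fluorometry reports a mass concentration, while loading is usually specified as a molar concentration, so libraries are typically converted and then diluted. The sketch below assumes the common approximation of 660 g/mol per base pair of dsDNA and a known average fragment length (including adapters). The function names and the example numbers (2 ng/µL, 400 bp, a 2 nM target in 20 µL) are hypothetical, not recommended loading values for any instrument.

```python
AVG_MASS_PER_BP = 660.0  # g/mol per base pair of dsDNA (common approximation)

def library_molarity_nM(conc_ng_per_ul: float, avg_fragment_bp: float) -> float:
    """Convert a mass concentration (ng/µL) into a molar concentration (nM).

    1 ng/µL = 1e-3 g/L; dividing by molar mass gives mol/L; x1e9 gives nM.
    """
    return conc_ng_per_ul * 1e6 / (AVG_MASS_PER_BP * avg_fragment_bp)

def dilution_volumes(stock_nM: float, target_nM: float, final_ul: float) -> tuple:
    """Solve C1*V1 = C2*V2; return (library volume, diluent volume) in µL."""
    if target_nM > stock_nM:
        raise ValueError("stock is too dilute for the requested target")
    library_ul = target_nM * final_ul / stock_nM
    return library_ul, final_ul - library_ul

stock = library_molarity_nM(2.0, 400)                 # about 7.58 nM
lib_ul, diluent_ul = dilution_volumes(stock, 2.0, 20.0)  # about 5.28 µL + 14.72 µL
```

Real-time PCR quantification sidesteps the length assumption by measuring only molecules that carry amplifiable adapters, which is one reason it is used alongside fluorometry.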

Figure 2. Workflow of NGS library preparation for Illumina systems.

Note that additional steps may be performed in different library preparation workflows. For instance, size selection or purification of fragments of desired sizes is a common step to enhance sample quality. Library amplification by PCR is usually performed after adapter ligation, especially when working with a low quantity of starting material. When RNA is used, cDNA synthesis is part of the library preparation workflow. Target enrichment may be performed when sequencing is needed only for a defined set of genes or genomic regions (rather than the whole genome or transcriptome).

Regardless of the steps involved, the final prepared NGS libraries should consist of DNA fragments of desired lengths with adapters at both ends. When relying on a commercial kit for NGS library preparation, look for reagents or protocols that can offer a simple procedure with less hands-on time while still ensuring high-quality libraries with good yield.
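
As a toy illustration of what a finished library looks like, the sketch below fragments a random sequence at random positions, keeps only fragments in a size window, and flanks each insert with adapter labels. It is a simplified model under stated assumptions: cut positions are uniformly random, the 100 to 300 bp window is a hypothetical choice, and `[P5]` and `[P7]` are placeholder labels, not real adapter sequences.

```python
import random

def fragment(sequence: str, n_cuts: int, rng: random.Random) -> list:
    """Cut a sequence at n_cuts distinct random positions (mimics random shearing)."""
    cuts = sorted(rng.sample(range(1, len(sequence)), n_cuts))
    bounds = [0] + cuts + [len(sequence)]
    return [sequence[start:end] for start, end in zip(bounds, bounds[1:])]

def size_select(fragments: list, min_len: int, max_len: int) -> list:
    """Keep fragments whose length falls inside the chosen window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]

def add_adapters(insert: str, left: str = "[P5]", right: str = "[P7]") -> str:
    """Attach placeholder adapter labels to both ends of an insert."""
    return left + insert + right

rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(5000))
fragments = fragment(genome, n_cuts=30, rng=rng)
selected = size_select(fragments, 100, 300)
library = [add_adapters(f) for f in selected]
```

Every library molecule ends up as insert plus adapters at both ends, which is the property the paragraph above describes; real protocols also add steps such as end repair and, often, PCR.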


3. Clonal amplification and the sequencing reaction

a. Clonal amplification

Prior to the sequencing reactions, DNA libraries undergo clonal amplification. In this process, DNA fragments of the libraries are amplified so that fluorescent signals of single-base incorporation in the subsequent sequencing reaction are strong enough to be detected by the sequencers.

The Illumina platform uses solid-phase amplification, in which each fragment in the library first anneals via its adapters to the primers on the sequencing chip (known as the flow cell). Through a series of amplification reactions known as bridge amplification [4] (Figure 3A), each fragment gives rise to a clonal cluster of identical copies (Figure 3B); therefore, every cluster represents one original library molecule. Note that clonal amplification on a patterned flow cell, which has a predefined array of binding sites, uses a different method called exclusion amplification (ExAmp) chemistry. In ExAmp, a DNA fragment is amplified immediately after it binds to a primer on the patterned flow cell, which excludes other DNA fragments from the same site and prevents a polyclonal cluster from forming [5].
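
Why excluding other fragments matters can be illustrated with a simple loading model. This sketch assumes, purely for illustration, that templates land on binding sites independently, so the number per site follows a Poisson distribution with mean λ. Under that assumption the monoclonal fraction is λe^(-λ), which never exceeds 1/e (about 37%). The function names are hypothetical, and this is not a model of Illumina's actual loading chemistry.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a site receives exactly k templates."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def site_occupancy(lam: float) -> dict:
    """Fractions of empty, monoclonal, and polyclonal sites under random loading."""
    empty = poisson_pmf(0, lam)
    monoclonal = poisson_pmf(1, lam)
    return {"empty": empty, "monoclonal": monoclonal,
            "polyclonal": 1.0 - empty - monoclonal}

# Scan loading densities to find the best achievable monoclonal fraction.
grid = [i / 10 for i in range(1, 31)]
best_lam = max(grid, key=lambda lam: site_occupancy(lam)["monoclonal"])
```

In this model, raising λ to fill more sites also raises the polyclonal fraction. Amplifying the first template to bind before others can seed the same site is how the text above describes ExAmp avoiding that trade-off.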

This process of clonal amplification should not be confused with library amplification, which is carried out to increase library input before loading onto a flow cell.


Figure 3. Clonal amplification steps. (A) Bridge amplification. (1) The complementary strand of a DNA fragment in the library is synthesized from the flow cell’s priming oligo. (2) After removal of the original strand, the complementary strand folds over and anneals with the other type of flow cell oligo. A double-stranded bridge is formed after synthesis of its complementary strand. (3) The double-stranded bridge is denatured, forming two single strands attached to the flow cell. (4) The process of bridge amplification repeats, and (5) more clones of double-stranded bridges are formed. (B) Cluster generation. The double-stranded clonal bridges are denatured (only one strand is shown here for simplicity), the reverse strands are removed, and the forward strands remain as clusters for sequencing.
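
The steps in Figure 3 can be sketched as an idealized strand counter. It assumes, for illustration only, 100% efficiency: in every bridge cycle each surface-bound strand is copied into its complement, so the strand count doubles. Real amplification is less efficient, and the function names are hypothetical.

```python
def bridge_amplify(cycles: int) -> dict:
    """Idealized bridge amplification starting from one surface-bound strand."""
    # After step (1) and removal of the original library strand,
    # a single surface-bound copy remains.
    strands = {"forward": 1, "reverse": 0}
    for _ in range(cycles):
        # Each strand folds over, anneals to the other oligo type,
        # and is copied into its complementary strand.
        strands = {
            "forward": strands["forward"] + strands["reverse"],
            "reverse": strands["reverse"] + strands["forward"],
        }
    return strands

def generate_cluster(strands: dict) -> int:
    """Cluster generation: reverse strands are removed, forward strands remain."""
    return strands["forward"]

after_ten = bridge_amplify(10)
cluster_size = generate_cluster(after_ten)
```

Because every copy descends from the single starting strand, all molecules in the cluster share its sequence, which is what makes the cluster clonal.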

b. Sequencing reaction

The step after clonal amplification is sequencing by synthesis (SBS), in which nucleotides incorporated by a DNA polymerase into the complementary DNA strand of the clonal clusters are detected one base at a time.

Illumina sequencing technology uses fluorescent dye-labeled dNTPs with a reversible terminator to capture fluorescent signals in each cycle, a process called cyclic reversible termination [6] (Figure 4). In each cycle, the DNA polymerase incorporates a single fluorescent dNTP, complementary to the template base, into each growing strand; the reversible terminator prevents further extension, and unincorporated dNTPs are washed away. Images of the clusters are then captured, and the emission wavelength and fluorescence intensity of the incorporated nucleotide are measured to identify the base incorporated in each cluster during that cycle. After imaging, the fluorescent dye and the terminator are cleaved off, and the next cycle of incorporation, imaging, and deprotection begins. Because each cycle reads one base, the process is repeated for n cycles to achieve a read length of n bases.
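
A minimal sketch of four-channel base calling for one cluster: in each cycle, the channel with the strongest signal determines the base, and n cycles yield an n-base read. The channel order `ACGT` and the intensity values are hypothetical, and real base callers do considerably more (for example, correcting for dye crosstalk and phasing).

```python
BASES = "ACGT"  # hypothetical channel order: one dye per base

def call_base(intensities: list) -> str:
    """Return the base whose channel is brightest in this cycle."""
    brightest = max(range(len(BASES)), key=lambda i: intensities[i])
    return BASES[brightest]

def call_read(cycles: list) -> str:
    """One incorporation per cycle, so n cycles give an n-base read."""
    return "".join(call_base(cycle) for cycle in cycles)

# Hypothetical intensities for one cluster over four cycles (A, C, G, T channels).
cluster_signal = [
    [0.90, 0.10, 0.05, 0.10],
    [0.10, 0.05, 0.80, 0.20],
    [0.05, 0.70, 0.10, 0.10],
    [0.10, 0.10, 0.10, 0.95],
]
read = call_read(cluster_signal)  # "AGCT"
```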


Figure 4. Sequencing by cyclic reversible termination. For simplicity, sequencing primers are not shown. Note that some Illumina systems use two-channel or one-channel SBS chemistry instead of the four-channel chemistry (four fluorophore colors) illustrated in this figure [7].
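
Two-channel chemistry identifies four bases from two images by combining signals. The sketch below assumes the encoding widely described for two-channel Illumina instruments such as NextSeq 500 and NovaSeq 6000: C appears in the red channel only, T in the green channel only, A in both, and G in neither. The fixed threshold is a hypothetical simplification of real signal classification.

```python
# Assumed two-channel encoding: (red_on, green_on) -> base
TWO_CHANNEL = {
    (True, False): "C",   # red only
    (False, True): "T",   # green only
    (True, True): "A",    # both channels
    (False, False): "G",  # no signal (dark)
}

def call_two_channel(red: float, green: float, threshold: float = 0.5) -> str:
    """Classify each channel as on/off, then look up the base."""
    return TWO_CHANNEL[(red >= threshold, green >= threshold)]

read = "".join(call_two_channel(r, g) for r, g in
               [(0.9, 0.8), (0.9, 0.1), (0.1, 0.05), (0.05, 0.9)])  # "ACGT"
```

One consequence of this design is that a cluster whose signal fades looks the same as a G, so reads from failing clusters can show runs of G.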


4. NGS data analysis using bioinformatics

The final step in the NGS workflow is processing, analysis, and interpretation of the sequencing data generated. Bioinformatic tools are used to convert raw sequencing data into meaningful results. Because NGS generates gigabases of raw data, access to enough computing power to process and analyze such massive amounts of data is one of the main bottlenecks of the workflow.
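
A quick calculation shows where the gigabases come from and how they relate to coverage. This sketch uses the standard estimate of mean coverage as reads × read length ÷ genome size; the run size (400 million reads of 150 bases) and the 3 Gb genome size are assumed round numbers, not specifications of any instrument.

```python
import math

def run_output_gb(reads: int, read_length: int) -> float:
    """Total sequenced bases, in gigabases."""
    return reads * read_length / 1e9

def mean_coverage(reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each genome position is sequenced: C = N * L / G."""
    return reads * read_length / genome_size

def reads_for_coverage(target: float, read_length: int, genome_size: int) -> int:
    """Reads needed to reach a target mean coverage (rounded up)."""
    return math.ceil(target * genome_size / read_length)

output = run_output_gb(400_000_000, 150)                       # 60.0 Gb
coverage = mean_coverage(400_000_000, 150, 3_000_000_000)      # 20.0x
needed = reads_for_coverage(30, 150, 3_000_000_000)            # 600,000,000 reads
```

Mean coverage is only an average; actual coverage varies across the genome, which is why deeper sequencing is often planned than the average alone suggests.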

This step of the NGS workflow can be roughly categorized into three stages as shown in Table 1. Applications and goals of NGS experiments often dictate how the data are processed and analyzed, as well as which bioinformatic tools are used.

Table 1. Key steps and processes in sequencing data analysis.

Stage: Undertaking
Processing: Cleanup of sequencing data
  • Base calling
  • Determination of read numbers and lengths
  • Application of necessary filters (e.g., clusters passing filter)
  • Trimming of adapter sequences
  • Demultiplexing of samples
Analysis: Investigation of sequence relevance, variance, distinctiveness, novelty, etc.
  • Mapping or alignment to a reference sequence
  • Visualization of mapped sequences
  • Removal of duplicate mapped reads (e.g., PCR artifacts)
  • Alignment and assembly of contiguous sequences (de novo sequencing)
  • Determination of strand specificity (strandedness)
  • Genome annotation
  • Detection of sequence/nucleotide variants
  • Uncovering of new transcripts
  • Determination of gene counts
Interpretation: Prediction of gene functions and biological relevance
  • Seeking insights into sequenced genes
  • Finding correlations and implications of identified genes
  • Analysis of biological pathways
  • Identification of biomarkers, drug targets, etc.
  • Discovery of new genes, transcripts, splice variants, etc.
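
Several processing tasks in Table 1 can be sketched in a few functions: decoding base quality scores, trimming adapter sequences, and demultiplexing samples by index. This is a minimal illustration under stated assumptions: qualities use the Phred+33 encoding common in FASTQ files (Q = -10 log10 of the error probability), the adapter is the commonly used prefix `AGATCGGAAGAGC` found at the start of standard Illumina TruSeq adapters, and index matching allows one mismatch. All function names are hypothetical and not from any specific tool.

```python
ADAPTER_PREFIX = "AGATCGGAAGAGC"  # commonly used Illumina adapter prefix (assumption)

def phred_scores(quality: str, offset: int = 33) -> list:
    """Decode FASTQ quality characters into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]

def error_probability(q: int) -> float:
    """Q = -10 * log10(p), so p = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def trim_adapter(read: str, adapter: str = ADAPTER_PREFIX, min_overlap: int = 5) -> str:
    """Cut at a full adapter match, or at a partial adapter match at the 3' end."""
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    for k in range(len(adapter) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    return read

def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def demultiplex(reads: list, sample_indexes: dict, max_mismatches: int = 1) -> dict:
    """Assign (index, sequence) pairs to the single sample whose index matches."""
    bins = {name: [] for name in sample_indexes}
    bins["undetermined"] = []
    for index, seq in reads:
        hits = [name for name, idx in sample_indexes.items()
                if len(idx) == len(index) and hamming(idx, index) <= max_mismatches]
        # Ambiguous or unmatched indexes are not assigned to any sample.
        bins[hits[0] if len(hits) == 1 else "undetermined"].append(seq)
    return bins

samples = {"s1": "ACGTAC", "s2": "TTGCAA"}
bins = demultiplex([("ACGTAC", "R1"), ("ACGTAA", "R2"), ("GGGGGG", "R3")], samples)
```

Production tools handle more cases (for example, mismatches within the adapter and paired-end reads), but the underlying ideas are the same: quality scores express error probabilities, trimming removes adapter read-through, and demultiplexing relies on each sample's index being sufficiently distinct from the others.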

In conclusion, NGS is a powerful technique that generates massive amounts of data that can lead to new biological insights. Although its workflow involves a number of processes and considerations, understanding the basic principles of the key steps can help you plan NGS experiments, obtain high-quality data, and achieve meaningful results.


References


For Research Use Only. Not for use in diagnostic procedures.