Next-Generation Sequencing Illumina Workflow: 4 Key Steps

Next-generation sequencing (NGS) is a high-throughput sequencing method that enables sequencing of large and complex genomes (e.g., the human genome) in a single day. In Illumina NGS systems, high-throughput data generation is made possible by massively parallel sequencing of nucleic acid samples. The workflow includes isolation of the desired nucleic acids; fragmentation of the isolated nucleic acids and preparation of samples for the sequencer (library preparation); the sequencing reactions; and bioinformatic processing and analysis of the sequencing data. This section gives an overview of each main step of the workflow and its key considerations, as illustrated in Figure 1.


Figure 1. Next-generation sequencing workflow using Illumina systems.

1. Nucleic acid isolation for NGS

Nucleic acid isolation is a crucial first step in the NGS workflow, regardless of whether you are sequencing genomic DNA (gDNA), total RNA, or specific RNA types. Select an isolation method or kit that enables proper lysis of the cells or tissue; this, in turn, helps you obtain the yield, purity, and quality needed for the subsequent library preparation steps.

The following considerations for nucleic acid isolation help ensure success in downstream steps:

  • Yield: Nanograms (ng) to micrograms (µg) of DNA or RNA are usually required for library preparation. Therefore, obtaining the maximum possible amount of high-quality nucleic acids, even from limited or archived sources such as cell-free DNA (cfDNA) and formalin-fixed, paraffin-embedded (FFPE) samples, is critical for NGS success.
  • Purity: Isolated DNA or RNA should be free of compounds that can inhibit enzymes during library preparation. These inhibitors are typically reagents carried over from nucleic acid isolation (e.g., phenol, ethanol) or contaminants from the biological sample (e.g., heparin, humic acid). The chosen isolation method should therefore remove these contaminants or minimize their carryover.
  • Quality: The integrity and quality of isolated nucleic acids are another important aspect of NGS success. For example, when working with gDNA, the majority of the isolated DNA should be of high molecular weight and intact. When working with RNA, degradation should be minimized throughout sample preparation. Isolated RNA should also be heterogeneous and representative of the RNA populations in the original sample. When using FFPE samples, in which DNA and RNA are already fragmented, select isolation methods or kits suited to such samples so that you obtain sufficient yield and quality of nucleic acids for sequencing.

Yield, purity, and quality of isolated nucleic acids should be assessed before proceeding to NGS library preparation. The following are methods commonly used for examination of these attributes:

  • UV spectrophotometric assays measure A260 and the A260:A280 and A260:A230 ratios to help assess sample yield and purity
  • Fluorometric assays help quantify specific types of nucleic acids (e.g., ssDNA, dsDNA, small RNA)
  • Gel-based or microfluidic electrophoresis helps determine fragment size, distribution, and quantity
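As a minimal sketch of how UV readings are typically turned into numbers, the Python below converts A260 to concentration using the conventional approximations (an A260 of 1.0 in a 1 cm path corresponds to about 50 ng/µL dsDNA, 40 ng/µL RNA, or 33 ng/µL ssDNA) and computes the two purity ratios. The target ratios (about 1.8 for DNA and 2.0 for RNA at A260:A280, and roughly 1.8 or higher at A260:A230), the tolerance, and the function names are rule-of-thumb assumptions for illustration, not kit-specific specifications.

```python
# Conventional A260-to-concentration factors (ng/uL per 1.0 A260, 1 cm path).
CONVERSION_FACTORS = {"dsDNA": 50.0, "ssDNA": 33.0, "RNA": 40.0}


def assess_uv(a260: float, a280: float, a230: float,
              kind: str = "dsDNA", dilution: float = 1.0) -> dict:
    """Estimate concentration (ng/uL) and purity ratios from UV absorbance."""
    if kind not in CONVERSION_FACTORS:
        raise ValueError(f"unknown nucleic acid type: {kind}")
    if a280 <= 0 or a230 <= 0:
        raise ValueError("A280 and A230 must be positive")
    return {
        "conc_ng_per_ul": a260 * CONVERSION_FACTORS[kind] * dilution,
        "a260_a280": a260 / a280,
        "a260_a230": a260 / a230,
    }


def purity_flags(result: dict, kind: str = "dsDNA", tolerance: float = 0.1) -> list:
    """Return warnings for ratios outside assumed rule-of-thumb ranges."""
    flags = []
    expected_280 = 2.0 if kind == "RNA" else 1.8  # ~1.8 pure DNA, ~2.0 pure RNA
    if abs(result["a260_a280"] - expected_280) > tolerance:
        flags.append("A260:A280 outside expected range (low values can indicate protein or phenol)")
    if result["a260_a230"] < 1.8:
        flags.append("low A260:A230: possible salt, phenol, or carbohydrate carryover")
    return flags


reading = assess_uv(a260=0.5, a280=0.27, a230=0.24)
print(reading["conc_ng_per_ul"], purity_flags(reading))
```

The ratios only flag likely contaminants; they do not quantify them, which is why fluorometric and electrophoretic checks are used alongside.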

For RNA integrity, the RNA integrity number (RIN), obtained from microfluidic electrophoresis data using an algorithm [1], and the integrity and quality (IQ) score, based on fluorometry [2], provide quantitative measures of the intact RNA population in an isolated sample.

When nucleic acid quantities are low (e.g., when single cells are the source), isolated DNA and RNA can be amplified before NGS library preparation using polymerases suited to whole genome amplification (WGA) and whole transcriptome amplification (WTA), respectively, to increase the amount of starting template. WGA and WTA can help obtain more sequencing reads, better coverage, improved sensitivity, and better variant detection from limited sample amounts. Phi29 DNA polymerase is commonly used for WGA because of its high processivity, reduced bias, high fidelity, and ability to synthesize DNA isothermally at a low temperature [3].
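The gap between single-cell input and typical library input can be made concrete with simple arithmetic. The sketch below assumes an average of about 660 g/mol per base pair of dsDNA, a haploid human genome of about 3.1 Gb, and about 6.6 pg of DNA per diploid human cell; these are approximations, and the 100 ng input is an illustrative figure, not a kit requirement.

```python
AVOGADRO = 6.022e23   # molecules per mole
BP_MASS = 660.0       # approximate g/mol per base pair of dsDNA


def genome_copies(mass_ng: float, genome_size_bp: float) -> float:
    """Approximate number of haploid genome copies in a given mass of dsDNA."""
    grams_per_copy = genome_size_bp * BP_MASS / AVOGADRO
    return mass_ng * 1e-9 / grams_per_copy


HUMAN_HAPLOID_BP = 3.1e9  # assumed approximate haploid human genome size

single_cell = genome_copies(0.0066, HUMAN_HAPLOID_BP)   # ~6.6 pg per diploid cell (assumed)
typical_input = genome_copies(100.0, HUMAN_HAPLOID_BP)  # 100 ng input (illustrative)
print(f"single cell: {single_cell:.1f} copies; 100 ng: {typical_input:,.0f} copies")
```

Under these assumptions a single cell contains roughly two genome copies, compared with roughly 30,000 in 100 ng, which illustrates why WGA or WTA is often needed before library preparation from single cells.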


2. Library preparation for NGS

After isolation and purification, nucleic acids are prepared so that they can be processed and read by the sequencers. These prepared, ready-to-sequence samples are commonly known as “libraries” because they represent a collection of molecules that are sequenceable. The library preparation procedure may vary depending on the methods and reagents used, but the general steps in library preparation for Illumina systems are as follows:

  • Nucleic acid fragmentation: The nucleic acid sample is fragmented into small pieces so that they can be sequenced in a massively parallel fashion. The optimal fragment size range depends on the sequencer and the sequencing application.
  • Adapter ligation: Adapters are oligonucleotides containing sequences complementary to the priming oligos on the sequencing chips. Adapters (commonly known as P5 and P7) are ligated to the ends of the nucleic acid fragments to enable sequencing. Because adapter sequences are platform-specific, Illumina adapters are not interchangeable with, for example, Ion Torrent adapters.
  • Library quantitation: After preparation, a sequencing library is a pool of DNA fragments with adapters attached to their ends. Prepared libraries must be quantified (and normalized as needed) so that an optimal concentration of molecules is loaded onto the sequencer. This quality-control step helps ensure consistent data output and quality, as well as efficient use of the sequencing chips. Fluorometry and real-time PCR are common methods for library quantitation.
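Because loading concentrations are often specified as a molarity while fluorometric assays report mass concentration, quantitation usually involves a unit conversion followed by a dilution. A minimal sketch, assuming dsDNA at about 660 g/mol per base pair and a mean fragment length taken from electrophoresis; the 4 nM target and the function names are hypothetical placeholders, since the correct loading concentration depends on the instrument and kit.

```python
BP_MASS = 660.0  # approximate g/mol per base pair of dsDNA


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA library concentration (ng/uL) to molarity (nM)."""
    # ng/uL equals mg/L; dividing by molar mass (g/mol) and scaling gives nM.
    return conc_ng_per_ul * 1e6 / (BP_MASS * mean_fragment_bp)


def dilution_volumes(stock_nm: float, target_nm: float, final_ul: float) -> tuple:
    """Volumes of library and diluent needed to reach target_nm (C1V1 = C2V2)."""
    if target_nm > stock_nm:
        raise ValueError("library is less concentrated than the target")
    library_ul = target_nm * final_ul / stock_nm
    return library_ul, final_ul - library_ul


stock = library_molarity_nm(conc_ng_per_ul=2.0, mean_fragment_bp=400)
lib_ul, diluent_ul = dilution_volumes(stock, target_nm=4.0, final_ul=20.0)
print(f"{stock:.2f} nM; mix {lib_ul:.2f} uL library + {diluent_ul:.2f} uL diluent")
```

The mean fragment length matters as much as the mass reading: the same 2 ng/µL corresponds to twice the molarity if the fragments are half as long.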

Figure 2. Workflow of NGS library preparation for Illumina systems.

Note that additional steps may be performed in different library preparation workflows. For instance, size selection (purification of fragments of the desired sizes) is a common step to enhance sample quality. Library amplification by PCR is usually performed after adapter ligation, especially when working with a low quantity of starting material. When starting from RNA, cDNA synthesis is part of the library preparation workflow. Target enrichment may be performed when only a defined set of genes or genomic regions needs to be sequenced (rather than the whole genome or transcriptome).

Regardless of the steps involved, the final NGS libraries should consist of DNA fragments of the desired lengths with adapters at both ends. When relying on a commercial kit for NGS library preparation, look for reagents or protocols that offer a simple procedure with minimal hands-on time while still producing high-quality libraries with good yield.
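To make the end product concrete, the toy Python model below fragments a random sequence, keeps fragments within a size window (size selection), and flanks each insert with adapter placeholders. It is a simplification under stated assumptions: cut positions are random with exponentially distributed spacing, a single strand of a single genome copy is fragmented, end repair and ligation chemistry are ignored, and `[P5]`/`[P7]` are labels, not real adapter sequences.

```python
import random

P5_LABEL = "[P5]"  # placeholder label, not a real adapter sequence
P7_LABEL = "[P7]"  # placeholder label, not a real adapter sequence


def fragment(sequence: str, mean_bp: int, rng: random.Random) -> list:
    """Cut a sequence at random points; spacing between cuts is exponential (toy model)."""
    pieces, start = [], 0
    while start < len(sequence):
        size = max(1, int(rng.expovariate(1.0 / mean_bp)))
        pieces.append(sequence[start:start + size])
        start += size
    return pieces


def size_select(pieces: list, min_bp: int, max_bp: int) -> list:
    """Keep only fragments within the desired size window."""
    return [p for p in pieces if min_bp <= len(p) <= max_bp]


def add_adapters(pieces: list) -> list:
    """Flank each insert with adapter placeholders at both ends."""
    return [P5_LABEL + p + P7_LABEL for p in pieces]


rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(100_000))
library = add_adapters(size_select(fragment(genome, 300, rng), 200, 400))
print(len(library), "library molecules with inserts of 200-400 bp")
```

Fragments outside the window are discarded in this model, which mirrors the trade-off of size selection: a tighter size distribution at the cost of some yield.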


3. Clonal amplification and the sequencing reaction

a. Clonal amplification

Prior to the sequencing reactions, DNA libraries undergo clonal amplification. In this process, DNA fragments of the libraries are amplified so that fluorescent signals of single-base incorporation in the subsequent sequencing reaction are strong enough to be detected by the sequencers.

The Illumina platform uses solid-phase amplification, in which each library fragment first anneals via its adapters to primers on the sequencing chip (known as the flow cell). Through a series of amplification reactions known as bridge amplification [4] (Figure 3A), each fragment gives rise to a cluster of identical molecules, called a clonal cluster (Figure 3B); every cluster therefore represents one primary library molecule. Note that clonal amplification on a patterned flow cell with predefined arrays uses a different method called exclusion amplification (ExAmp) chemistry. ExAmp involves rapid amplification of a DNA fragment immediately after it binds to a primer on the patterned flow cell, which excludes other DNA fragments and prevents formation of a polyclonal cluster [5].

This process of clonal amplification should not be confused with library amplification, which is carried out to increase library input before loading onto a flow cell.
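A simplified Poisson model shows why seeding must be controlled so that each cluster comes from a single library molecule. The sketch assumes templates land on discrete sites purely at random with a mean of λ templates per site; it describes random seeding only and does not model ExAmp, which, as described above, is designed to exclude additional fragments once one begins amplifying.

```python
import math


def site_occupancy(mean_per_site: float) -> dict:
    """Fractions of sites with 0, 1, or >=2 templates under random (Poisson) seeding."""
    empty = math.exp(-mean_per_site)
    monoclonal = mean_per_site * math.exp(-mean_per_site)
    return {"empty": empty, "monoclonal": monoclonal,
            "polyclonal": 1.0 - empty - monoclonal}


for lam in (0.1, 0.5, 1.0, 2.0):
    occ = site_occupancy(lam)
    print(f"lambda={lam}: " + ", ".join(f"{k}={v:.3f}" for k, v in occ.items()))
```

Under pure random seeding, the monoclonal fraction peaks at 1/e (about 37%) when λ = 1; loading more templates mainly increases polyclonal sites, and loading fewer leaves most sites empty.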


Figure 3. Clonal amplification steps. (A) Bridge amplification. (1) The complementary strand of a DNA fragment in the library is synthesized from the flow cell’s priming oligo. (2) After removal of the original strand, the complementary strand folds over and anneals with the other type of flow cell oligo. A double-stranded bridge is formed after synthesis of its complementary strand. (3) The double-stranded bridge is denatured, forming two single strands attached to the flow cell. (4) The process of bridge amplification repeats, and (5) more clones of double-stranded bridges are formed. (B) Cluster generation. The double-stranded clonal bridges are denatured (only one strand is shown here for simplicity), the reverse strands are removed, and the forward strands remain as clusters for sequencing.

b. Sequencing reaction

The step after clonal amplification is sequencing by synthesis (SBS), in which nucleotides incorporated by a DNA polymerase into the complementary DNA strand of the clonal clusters are detected one base at a time.

The Illumina sequencing technology uses fluorescent dye-labeled dNTPs with a reversible terminator to capture a fluorescent signal in each cycle, a process called cyclic reversible termination [6] (Figure 4). In each cycle, the DNA polymerase incorporates only one of the four fluorescent dNTPs into each growing strand, based on complementarity, and unincorporated dNTPs are then washed away. The clusters are imaged after each incorporation step, and the emission wavelength and fluorescence intensity are measured to identify the base incorporated in each cluster during that cycle. After imaging, the fluorescent dye and the terminator are cleaved off, and the next cycle of synthesis, imaging, and deprotection begins. Because one base is read per cycle, the process is repeated for n cycles to obtain a read length of n bases.
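The base-identification step can be sketched as picking the brightest of the four channels in each cycle. This is a deliberately simplified model, assuming one background-corrected intensity per dye per cycle for a single cluster; real base callers also correct for cross-talk between dyes and for phasing, which are omitted here. The purity score (brightest over brightest plus second-brightest) is an illustrative confidence measure, and the function name is hypothetical.

```python
BASES = "ACGT"  # channel order assumed for the intensity tuples


def call_bases(cycles: list) -> tuple:
    """Call one base per cycle from (A, C, G, T) intensities; return read and purities."""
    read, purities = [], []
    for intensities in cycles:
        if len(intensities) != 4:
            raise ValueError("expected four channel intensities per cycle")
        ranked = sorted(range(4), key=lambda i: intensities[i], reverse=True)
        brightest, second = intensities[ranked[0]], intensities[ranked[1]]
        read.append(BASES[ranked[0]])
        total = brightest + second
        purities.append(brightest / total if total > 0 else 0.0)
    return "".join(read), purities


# Four cycles of made-up intensities for one cluster: n cycles give an n-base read.
cycles = [(900, 50, 40, 30), (20, 30, 800, 60), (10, 700, 20, 15), (30, 20, 10, 650)]
read, purities = call_bases(cycles)
print(read, [round(p, 2) for p in purities])
```

A low purity in a given cycle means two dyes were similarly bright, which is the situation in which a call is least trustworthy.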


Figure 4. Sequencing by cyclic reversible termination. For simplicity, sequencing primers are not shown. Note that some Illumina systems use two-channel or one-channel SBS chemistry instead of the four-channel chemistry (four fluorophore colors) illustrated in this figure [7].
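In two-channel chemistry, four bases are distinguished with only two images per cycle by combining signals. A minimal decoding sketch, assuming the commonly documented Illumina two-channel scheme (A appears in both channels, C in red only, T in green only, G in neither) and a single fixed intensity threshold; actual instruments use more sophisticated intensity clustering, so treat the threshold as a hypothetical parameter.

```python
def decode_two_channel(red: float, green: float, threshold: float) -> str:
    """Map a (red, green) intensity pair to a base under the assumed scheme."""
    red_on, green_on = red >= threshold, green >= threshold
    if red_on and green_on:
        return "A"  # signal in both channels
    if red_on:
        return "C"  # red only
    if green_on:
        return "T"  # green only
    return "G"      # no signal (dark base)


signals = [(820, 790), (850, 40), (35, 760), (25, 30)]
print("".join(decode_two_channel(r, g, threshold=300) for r, g in signals))
```

One consequence of this scheme is that G is a "dark" base, so a cluster that has stopped producing signal can be miscalled as a run of Gs.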


4. NGS data analysis using bioinformatics

The final step in the NGS workflow is processing, analysis, and interpretation of the generated sequencing data. Bioinformatic tools are used to convert raw sequencing data into meaningful results. Because NGS generates gigabases of raw data, access to enough computing power to process and analyze such massive amounts of data is one of the bottlenecks of the workflow.
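The scale of the data can be estimated with simple arithmetic. A rough sketch, assuming hypothetical run figures (the number of clusters passing filter, paired-end reads, and the read length below are illustrative, not specifications of any instrument): total yield is clusters × reads per cluster × read length, and dividing by genome size gives an idealized mean coverage that ignores duplicates and unmapped reads.

```python
def run_yield_gb(clusters_pf: float, read_length: int, paired_end: bool = True) -> float:
    """Total sequenced bases in gigabases."""
    reads_per_cluster = 2 if paired_end else 1
    return clusters_pf * reads_per_cluster * read_length / 1e9


def mean_coverage(total_gb: float, genome_size_bp: float) -> float:
    """Average number of times each genome position is sequenced (idealized)."""
    return total_gb * 1e9 / genome_size_bp


yield_gb = run_yield_gb(clusters_pf=400e6, read_length=150)  # hypothetical run
print(f"{yield_gb:.0f} Gb, ~{mean_coverage(yield_gb, 3.1e9):.1f}x of a 3.1 Gb genome")
```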

This step of the NGS workflow can be roughly categorized into three stages as shown in Table 1. Applications and goals of NGS experiments often dictate how the data are processed and analyzed, as well as which bioinformatic tools are used.

Table 1. Key steps and processes in sequencing data analysis.

Stage | Undertaking
Processing: Cleanup of sequencing data
  • Base calling
  • Determination of read numbers and lengths
  • Application of necessary filters (e.g., clusters passing filters)
  • Trimming of adapter sequences
  • Demultiplexing of samples
Analysis: Investigation of sequence relevance, variance, distinctiveness, novelty, etc.
  • Mapping or alignment to a reference sequence
  • Visualization of mapped sequences
  • Removal of duplicate mapped reads (e.g., PCR artifacts)
  • Alignment and assembly of contiguous sequences (de novo sequencing)
  • Determination of strand specificity (strandedness)
  • Genome annotation
  • Detection of sequence/nucleotide variants
  • Uncovering of new transcripts
  • Determination of gene counts
Interpretation: Prediction of gene functions and biological relevance
  • Seeking insights into sequenced genes
  • Finding correlations and implications of the identified genes
  • Analysis of biological pathways
  • Identification of biomarkers, drug targets, etc.
  • Discovery of new genes, transcripts, splice variants, etc.
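Several of the processing steps in Table 1 can be sketched in a few lines of Python: decoding FASTQ quality characters (Phred+33 encoding, where Q = -10 log10 of the error probability), trimming adapter read-through from the 3' end, and demultiplexing reads by index with a mismatch tolerance. This is an illustration under assumptions, not a replacement for dedicated tools: the adapter string is the `AGATCGGAAGAGC` prefix that trimming tools commonly use for Illumina adapters, the sample index sequences are hypothetical, and exact or prefix matching stands in for the error-tolerant alignment that real trimmers perform.

```python
ADAPTER = "AGATCGGAAGAGC"  # assumed Illumina adapter prefix used by common trimmers


def phred_scores(quality: str) -> list:
    """Decode a Phred+33 FASTQ quality string into integer Q scores."""
    return [ord(ch) - 33 for ch in quality]


def error_probability(q: int) -> float:
    """Q = -10 * log10(p), so p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def trim_adapter(seq: str, qual: str, adapter: str = ADAPTER, min_overlap: int = 3) -> tuple:
    """Remove a full adapter match, or a partial adapter at the 3' end of the read."""
    hit = seq.find(adapter)
    if hit != -1:
        return seq[:hit], qual[:hit]
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k], qual[:-k]
    return seq, qual


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, sample_indexes: dict, max_mismatches: int = 1):
    """Return the unique sample whose index matches within max_mismatches, else None."""
    hits = [name for name, idx in sample_indexes.items()
            if len(idx) == len(index_read) and hamming(idx, index_read) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


SAMPLES = {"sample_1": "ACGTAC", "sample_2": "TGCATG"}  # hypothetical indexes
seq, qual = trim_adapter("ACGTACGTAGATCGGAAGAGCTTT", "I" * 24)
print(seq, phred_scores(qual)[:3], assign_sample("ACGTAA", SAMPLES))
```

Requiring a unique match within the mismatch tolerance is one simple way to avoid assigning a read to two samples; reads that match no index, or more than one, are left unassigned.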

In conclusion, NGS is a powerful technique that generates massive amounts of data that can lead to new biological insights. Although its workflow involves a number of processes and considerations, understanding the basic principles of the key steps can help you plan NGS experiments, obtain high-quality data, and achieve meaningful results.


References


For Research Use Only. Not for use in diagnostic procedures.