Endogenous control genes are essential in relative gene expression experiments. Therefore, choosing the most appropriate control gene is crucial.

What are endogenous controls, and why are they necessary?

Quantitative PCR is the method of choice for studying how a change in the conditions under which a gene is expressed—such as the addition of a treatment—affects the amount of mRNA it produces. This is usually quoted in terms of ‘fold change’, e.g. if the treated sample produces twice as much mRNA as the untreated sample, the result is a ‘fold change’ of 2.

The quantitative differences in mRNA detected in a qPCR assay do not depend on gene activity alone; they also depend on experimental conditions, particularly the initial amount of cDNA. To get a valid result, you would need to start with exactly the same amount of cDNA in the treated and untreated samples, and this is difficult to achieve.

Fortunately, this problem has a solution. If you also measure a second gene, known to be unaffected by the treatment, in each sample, any difference detected for that gene will be the result of differences in starting cDNA concentration. Compare the patterns of gene expression between the second gene and the gene of interest to work out the ‘true’ fold change. This second gene can be termed an endogenous control but is also known as a housekeeping gene, a normalizer, a reference gene, or an internal control gene.

Let’s illustrate this with an example. Suppose you test one gene under two conditions and end up with Ct values of 28.5 in the ‘treated’ sample and 27.5 in the ‘untreated’ sample. This gives a measured difference of 1 between these values (delta Ct). A higher Ct means less starting template, so if you knew that the amount of cDNA in each sample was exactly the same, you could calculate the fold change as 2^(-delta Ct), and 2^(-1) = 0.5. You could then conclude that the expression level in the treated sample was half that in the untreated sample: a two-fold difference.

But you still can’t tell whether this is a ‘true’ fold change or simply the result of differences in sample input, and this is where the endogenous control comes in. You select a control gene that is expressed consistently across all samples in your study, measure its expression level under each condition, and come up with Ct values of 19.5 and 18.5 for the treated and untreated samples, respectively. Here, the delta Ct value for the control would also be 1. So, the control, which has stable expression, has given you the same delta Ct as your gene of interest. This implies that the measured two-fold difference in expression levels is caused by a two-fold difference in the initial amount of cDNA in the samples, and is not treatment-related at all.

In relative gene expression, therefore, expression level changes are measured as the difference between delta Ct for the tested gene and delta Ct for the endogenous control: delta delta Ct.

In the previous example:
delta delta Ct = (28.5-27.5) – (19.5-18.5) = 0

You can conclude from this that the treatment has made no difference to the level of gene expression.
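The worked example above can be reproduced in a few lines of Python. This is only a minimal sketch: the function name `delta_delta_ct` is hypothetical, and the conversion to fold change uses the conventional 2^(-delta delta Ct) form, which assumes the amount of product doubles with every cycle (roughly 100% amplification efficiency).

```python
def delta_delta_ct(target_treated, target_untreated,
                   control_treated, control_untreated):
    """Return (delta delta Ct, fold change) for one target gene and one control.

    Hypothetical helper for illustration; all arguments are Ct values.
    """
    delta_ct_target = target_treated - target_untreated
    delta_ct_control = control_treated - control_untreated
    dd_ct = delta_ct_target - delta_ct_control
    # A higher Ct means less starting template, hence the negative exponent.
    # Assumption: product doubles each cycle (about 100% efficiency).
    fold_change = 2 ** (-dd_ct)
    return dd_ct, fold_change


# Ct values from the example in the text.
dd_ct, fold = delta_delta_ct(28.5, 27.5, 19.5, 18.5)
print(dd_ct, fold)  # 0.0 1.0
```

A fold change of 1 means the normalized expression level is the same in both samples.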

So how do you choose an appropriate endogenous control gene?

The example above assumes that the endogenous control gene is expressed at a consistent level in all studied conditions. If it is not, any change in control gene expression between the treated and untreated samples will show up in that gene’s delta Ct value and will distort the calculated delta delta Ct. For reliable results, you need to select the correct control.

An endogenous control gene must have stable expression in all samples tested, i.e. the control should not change its expression between treatments, time points or other test conditions. In practice, zero variation is very rare and endogenous control genes are allowed small differences in Ct values of up to 0.5 Ct. Differences at the top end of this range will introduce imprecisions. From our equation, a difference of 0.5 Ct will equate to a fold change of 2^0.5 or 1.41. But if we tried a control gene with a difference of 2 Ct between samples, this would equate to a four-fold change in expression levels, making the gene useless as a control.
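To make the arithmetic concrete, the short sketch below converts a Ct difference into the apparent fold change it implies and checks it against the 0.5 Ct tolerance mentioned above. The function names and the `max_shift` parameter are illustrative assumptions, and the conversion again assumes a doubling of product per cycle.

```python
def ct_shift_to_fold(ct_difference):
    """Apparent fold change implied by a Ct difference (doubling per cycle)."""
    return 2 ** abs(ct_difference)


def is_acceptable_control(ct_difference, max_shift=0.5):
    """True if the control's Ct difference is within the tolerated range."""
    return abs(ct_difference) <= max_shift


for shift in (0.5, 2.0):
    print(shift, round(ct_shift_to_fold(shift), 2), is_acceptable_control(shift))
# 0.5 1.41 True
# 2.0 4.0 False
```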

Choosing and validating an endogenous control

It is impossible to predict exactly how any gene will behave under a given range of conditions. The best way of selecting the most appropriate control gene for a relative qPCR experiment is to select some candidate genes and determine their expression levels across the range of experimental conditions and treatments. The genes most stably expressed across these conditions will be the most appropriate controls.

It is best practice to evaluate several candidate genes, as the ideal control for each experiment will depend on many variables, including the cell or tissue types involved and the range of conditions to be tested. Certain housekeeping genes that encode proteins required for basic cellular function are typically expressed at constitutive levels in a range of cell types and conditions, including disease states. Although these housekeeping genes can be good candidates for endogenous controls, and are worth considering, the expression of some classical housekeeping genes, like beta-actin (β-Actin) and glyceraldehyde 3-phosphate dehydrogenase (GAPDH), varies considerably between tissue types [1]. It is essential to test housekeeping genes for variability in expression before using them as endogenous controls in gene expression studies.

Genes that code for ribosomal RNA (rRNA) molecules, rather than proteins, are also stably expressed in almost all cell types and can serve as endogenous control candidates.

Thermo Fisher Scientific supplies TaqMan gene expression assays for human and other eukaryotic rRNA and housekeeping genes for use as endogenous controls. If you are working with human samples, your first port of call should probably be the TaqMan endogenous control plate. This standard 96-well plate includes triplicates of 32 stably expressed human genes known to be good control candidates; you are likely to find a control among these that is appropriate for your applications. For a wider variety of assays involving other species, go to taqmancontrols to select ‘Gene Expression’, ‘Controls’ and your species of interest (or ‘All’), and then click 'Search'.

Once you have selected your candidate control genes, test each one for stable expression under your study conditions. You should ensure the methodology you use is exactly the same in each case. We recommend following these steps:

  1. Select experimental conditions that are representative of your study, e.g. a specific range of cell types, treatments or time points.
  2. Purify the RNA from all your samples across different test conditions using the same method.
  3. Quantify the RNA and use the same amount and method for cDNA synthesis.
  4. Test each candidate control gene using the same volume of cDNA from every sample across the different experimental conditions, in at least triplicate qPCR reactions.
  5. Assess the variability in measured Ct values for each control gene under your chosen conditions by calculating the standard deviation (SD) of those values.

The ideal control gene exhibits stable expression with the least variation in Ct values. This is determined by measuring the SD of the replicate Ct values. The best candidates will be those genes with the lowest SD across all tested conditions.
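The comparison in step 5 can be sketched as follows. The gene names and Ct values are invented purely for illustration, and the sketch assumes that all replicate Ct values from all conditions are pooled per gene before the sample SD is calculated; your own analysis software may organize the data differently.

```python
from statistics import mean, stdev

# Hypothetical Ct values: all replicates from all conditions, pooled per gene.
candidate_cts = {
    "GENE_A": [18.4, 18.6, 18.5, 18.7, 18.5, 18.6],
    "GENE_B": [21.0, 21.2, 21.1, 22.9, 23.1, 23.0],
    "GENE_C": [15.2, 15.1, 15.3, 15.6, 15.7, 15.5],
}


def rank_by_sd(cts_by_gene):
    """Return (gene, mean Ct, SD) tuples sorted from most to least stable."""
    stats = [(gene, mean(cts), stdev(cts)) for gene, cts in cts_by_gene.items()]
    return sorted(stats, key=lambda row: row[2])


ranking = rank_by_sd(candidate_cts)
for gene, mean_ct, sd in ranking:
    print(f"{gene}: mean Ct {mean_ct:.2f}, SD {sd:.2f}")
```

With these made-up numbers, GENE_A has the lowest SD and would be the preferred candidate, while GENE_B shifts by about 2 Ct between conditions and would be rejected. The mean Ct is reported alongside the SD because it indicates each candidate's overall expression level.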

If your assay reveals several candidate control genes with low variability, choose a control gene with an expression level (Ct value) roughly similar to that of your test genes. A large difference in expression level between the test and control genes can lead to poor results in relative gene expression analysis by qPCR.

More than one control?

It is possible that no single endogenous gene will fit your requirements; in this case, use two or more genes in parallel for best results. This approach has been well documented in the literature. One example is a study by Schmid et al. of gene expression in renal biopsies from patients with different kidney diseases [2]. The researchers noted that regulation of housekeeping genes in this tissue made any single one of these genes unreliable as a control and suggested that relating expression to 18S rRNA and cyclophilin A in parallel would yield more reliable results. Interestingly, there are few published studies of gene expression in kidney tissues that used either of these genes as a control.

Multiple controls are also widely used in studies of gene expression in cancer. Expression profiling helps classify tumors into subtypes defined by gene expression patterns, which are often better predictors of prognosis and treatment response than the site or morphology of the tumor. Lossos et al. published an optimization of qPCR parameters for differential diagnosis of non-Hodgkin’s lymphomas in which two optimum controls were selected from a panel of 11 housekeeping genes [3]. A later study by Ayakannu et al. on endometrial carcinomas [4] selected three different control genes from a similar but expanded gene panel.
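One common way to combine several controls, shown here only as a sketch and not necessarily the method used in the studies cited above, is to normalize the target gene to the mean Ct of the control genes in each sample. Averaging Ct values is equivalent to taking the geometric mean of the controls' relative quantities. The function names and Ct values are hypothetical, and delta Ct is taken here within each sample (target minus controls), which gives the same delta delta Ct as the ordering used earlier.

```python
from statistics import mean


def normalized_delta_ct(target_ct, control_cts):
    """Delta Ct of the target relative to the mean Ct of several controls."""
    return target_ct - mean(control_cts)


def fold_change_multi(target_treated, target_untreated,
                      controls_treated, controls_untreated):
    """Fold change (treated vs untreated) normalized to multiple controls."""
    dd_ct = (normalized_delta_ct(target_treated, controls_treated)
             - normalized_delta_ct(target_untreated, controls_untreated))
    # Assumption: product doubles each cycle (about 100% efficiency).
    return 2 ** (-dd_ct)


# Hypothetical Ct values for one target gene and two control genes.
fold = fold_change_multi(25.0, 27.0, [12.0, 20.0], [12.0, 20.0])
print(fold)  # 4.0
```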

It is clear from even these few examples that there is no ‘one size fits all’ solution to choosing a control. Unless you can find a reliable report in the literature of the exact study you are planning, it is best to cast your net widely and test a large panel of candidates. For human studies, the TaqMan™ Array Human Endogenous Control Panel is an excellent place to start.


References
  1. Radonic A, Thulke S, Mackay IM et al. (2004) Guideline to reference gene selection for quantitative real-time PCR. Biochem Biophys Res Commun 313(4):856–862. doi: 10.1016/j.bbrc.2003.11.177.
  2. Schmid H, Cohen CD, Henger A et al. (2003) Validation of endogenous controls for gene expression analysis in microdissected human renal biopsies. Kidney Int 64(1):356–360. doi: 10.1046/j.1523-1755.2003.00074.x.
  3. Lossos IS, Czerwinski DK, Wechser MA et al. (2003) Optimization of quantitative real-time RT-PCR parameters for the study of lymphoid malignancies. Leukemia 17(4):789–795. doi: 10.1038/sj.leu.2402880.
  4. Ayakannu T, Taylor AH, Willets JM et al. (2015) Validation of endogenous control reference genes for normalizing gene expression studies in endometrial carcinoma. Mol Hum Reprod 21(9):723–735. doi: 10.1093/molehr/gav033.

For Research Use Only. Not for use in diagnostic procedures.