“GC content is the proportion of guanine (G) and cytosine (C), two of the nitrogenous bases, in a DNA or RNA sequence or in a primer. It is usually expressed as a percentage (%).”
The human genome is built from four types of nitrogenous bases: adenine (A), guanine (G), cytosine (C) and thymine (T). Adenine and guanine are purines; cytosine and thymine are pyrimidines. A unit made of a base, a sugar and a phosphate group is known as a ‘nucleotide’.
DNA consists of long chains of nucleotides (polynucleotide chains), and each base on one strand pairs with its complementary base on the other strand through hydrogen bonds: A forms two hydrogen bonds with T, and G forms three hydrogen bonds with C.
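The pairing rules above can be sketched in a few lines of Python. This is an illustrative example, not a tool from this article: it builds the complementary strand and counts the hydrogen bonds in a short double-stranded stretch, assuming ideal Watson-Crick pairing (2 bonds per A-T pair, 3 per G-C pair) and a DNA sequence made only of A, T, G and C.

```python
# Illustrative sketch: assumes ideal Watson-Crick pairing and only A/T/G/C input.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hydrogen bonds per base pair: 2 for A-T, 3 for G-C
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complement_strand(seq: str) -> str:
    """Return the complementary strand written 5'->3' (the reverse complement)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def hydrogen_bonds(seq: str) -> int:
    """Count hydrogen bonds holding seq to its complement (one entry per base pair)."""
    return sum(H_BONDS[base] for base in seq.upper())


if __name__ == "__main__":
    seq = "ATGCGC"
    print(complement_strand(seq))  # GCGCAT
    print(hydrogen_bonds(seq))     # 2 + 2 + 3 + 3 + 3 + 3 = 16
```

Two sequences of the same length can differ noticeably in bond count: "ATATAT" gives 12 bonds, while "GCGCGC" gives 18.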
Our genome is huge and holds many mysteries; GC-rich regions are one of them. In this article, I will explain the importance of G and C bases in our genome, as well as in techniques such as PCR and sequencing.
Importance of GC-rich regions in our genome:
The genome can broadly be divided into genes and non-coding sequences (often called “junk” DNA). GC-rich regions span both coding and non-coding regions and are therefore important for protein formation as well as gene expression.
Large stretches of the human genome, known as “isochores”, have a relatively uniform GC content, and the GC-rich isochores contain some vital genes. Cytologically, GC-rich and AT-rich regions stain differently: with standard Giemsa (G-)banding, the AT-rich regions form the dark bands and the GC-rich regions the light bands.
GC content varies from one organism to another. In our comparison of different organisms, humans have a GC content of 39.7%. Importantly, GC-rich DNA is present not only in heterochromatin but also in euchromatin, so it is an important part of genes too.
Regions with an unusually high frequency of CpG dinucleotides are called “CpG islands”; they are often found near gene promoters and also have a definite role in the development of disease. GC-rich repeats are implicated in disease too: Huntington’s disease is the classic example of a change in the number of CAG repeats in the HTT gene.
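The article does not give a numerical definition of a CpG island. A commonly used rule of thumb, from Gardiner-Garden and Frommer (1987), is a stretch of at least 200 bp with a GC content above 50% and an observed/expected CpG ratio above 0.6. The sketch below applies those thresholds to a single fragment; the function names are hypothetical, and real CpG-island finders scan whole genomes with sliding windows rather than testing one fragment.

```python
# Illustrative sketch: thresholds follow the Gardiner-Garden & Frommer rule of thumb;
# function names are hypothetical, not from any library.


def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (number of CpG * length) / (number of C * number of G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count finds every CpG dinucleotide
    cpg = seq.count("CG")
    return cpg * len(seq) / (c * g)


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """True if one fragment passes all three CpG-island thresholds."""
    return (len(seq) >= min_len
            and gc_fraction(seq) > min_gc
            and cpg_obs_exp(seq) > min_ratio)


if __name__ == "__main__":
    print(looks_like_cpg_island("CG" * 100))        # True
    print(looks_like_cpg_island("GGGGCCCC" * 25))   # False: GC-rich but CpG-poor
```

The second example shows why GC content alone is not enough: the fragment is 100% G and C, yet its observed/expected CpG ratio is only 0.48.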
Structurally, such GC-rich repeat sequences can lie in gene exons, introns, the 5’ UTR, the 3’ UTR and other non-coding sequences, as the examples below show.
Studies suggest that GC-rich regions matter more for DNA bendability than for thermostability (Vinogradov, 2003).
The more repeats there are, the more severe the disease. Here are a few examples of triplet repeat expansion disorders, with the repeated sequence, the inheritance pattern and the location of the repeat.
| Disorder | Repeat | Inheritance | Location of repeat |
|---|---|---|---|
| Huntington’s disease | CAG | Autosomal dominant | Gene exon |
| Myotonic dystrophy | CTG | Autosomal dominant | 3’ UTR |
| Fragile X syndrome | CGG | X-linked | 5’ UTR |
| Friedreich ataxia | AAG | Autosomal recessive | Gene intron |
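Because the severity of these disorders tracks the number of repeats, a basic computational step is counting how many copies of a repeat unit occur back to back. The sketch below is a simplified illustration on a made-up sequence: it finds the longest uninterrupted run of a given triplet. It is not a diagnostic method and applies no clinical repeat-number thresholds, which the table does not give.

```python
# Illustrative sketch on hypothetical sequences; not a diagnostic tool.


def longest_repeat_run(seq: str, unit: str = "CAG") -> int:
    """Return the largest number of back-to-back copies of `unit` in `seq`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    seq, unit = seq.upper(), unit.upper()
    k = len(unit)
    best = 0
    for start in range(len(seq) - k + 1):
        count, pos = 0, start
        # Walk forward one repeat unit at a time while the unit keeps matching
        while seq[pos:pos + k] == unit:
            count += 1
            pos += k
        best = max(best, count)
    return best


if __name__ == "__main__":
    # Hypothetical fragment: 5 CAG repeats, a spacer, then 3 more
    fragment = "ATG" + "CAG" * 5 + "TT" + "CAG" * 3
    print(longest_repeat_run(fragment))          # 5
    print(longest_repeat_run("GAA" * 4, "GAA"))  # 4
```

The spacer "TT" interrupts the run, so only the longer uninterrupted stretch of five copies is reported.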
Studies also suggest that in plants, especially in monocots, GC-rich regions are linked to genome functionality and species ecology.
GC-rich regions also help to stabilize DNA: sequences with a higher GC content have higher thermal stability (a higher melting temperature), while sequences with a lower GC content are less thermostable.
This is often explained by the extra hydrogen bond in each G-C pair, but research indicates that hydrogen bonding is not the main source of the difference; stacking interactions between neighbouring base pairs contribute more.
It was also once hypothesized that higher thermal stability helps organisms survive at higher temperatures, but later studies did not support this hypothesis.
What are isochores?
Isochores are large DNA regions, more than 300 kb long, with a fairly uniform GC content; GC-rich isochores tend to be rich in genes. These regions are believed to have arisen relatively late in evolution and to have made the genome more heterogeneous.
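One simple way to look for isochore-like stretches is to compute GC content in consecutive windows along a sequence and see where it stays level. The sketch below is only an illustration: with real chromosome sequence one would use windows on the order of the 300 kb scale mentioned above, whereas the example uses a tiny toy sequence. Fixed windows are our simplifying assumption; published isochore maps use more elaborate segmentation methods.

```python
from typing import List, Optional


def gc_windows(seq: str, window: int, step: Optional[int] = None) -> List[float]:
    """GC fraction in successive windows of `window` bases.

    `step` defaults to `window` (non-overlapping windows); a smaller step
    gives overlapping windows and a smoother profile. Illustrative sketch only.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    step = window if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append((chunk.count("G") + chunk.count("C")) / window)
    return profile


if __name__ == "__main__":
    toy = "GGGGCCCC" + "AAAATTTT"     # GC-rich half, then AT-rich half
    print(gc_windows(toy, 8))         # [1.0, 0.0]
    print(gc_windows("GCAT", 2, 1))   # [1.0, 0.5, 0.0]
```

For a real genome, something like `gc_windows(chromosome, 300_000)` (where `chromosome` is a hypothetical string holding the sequence) would give one value per 300 kb block; long runs of similar values hint at an isochore.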
In summary, the GC-rich regions have a definite role in gene regulation, gene expression, genome functionality and disease development.
Moreover, GC-rich regions are of pivotal importance in techniques such as PCR and DNA sequencing. Compared with AT-rich templates, GC-rich templates can decrease the specificity and efficiency of PCR and DNA sequencing, so additional optimization is advised.
Role of GC-rich regions in PCR and DNA sequencing:
It is partly true that GC-rich regions are thermostable and therefore take more energy to separate. As a result, GC-rich templates have higher melting and annealing temperatures.
They are classed as “hard-to-amplify templates” because they need a higher annealing temperature and additional optimization steps.
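To see why GC-rich sequences need higher temperatures, a quick estimate for short oligonucleotides is the Wallace rule: Tm ≈ 2 °C × (number of A + T) + 4 °C × (number of G + C). The article does not name a formula, so this choice is our assumption; the rule is only a rough guide for short primers, and primer-design software uses more accurate nearest-neighbour models.

```python
# Illustrative sketch of the Wallace rule; a rough estimate for short oligos only.


def wallace_tm(primer: str) -> int:
    """Approximate melting temperature (°C) of a short DNA oligo."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    # Each A or T adds about 2 °C and each G or C about 4 °C
    return 2 * at + 4 * gc


if __name__ == "__main__":
    print(wallace_tm("ATATATATAT"))  # 20 (all A/T)
    print(wallace_tm("GCGCGCGCGC"))  # 40 (all G/C)
```

Two 10-mers of equal length differ by 20 °C in estimated Tm purely because of their GC content, which is why GC-rich templates and primers need higher annealing temperatures.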
In PCR, GC-rich templates increase the chance of non-specific binding and, consequently, of false-positive results. Care must therefore be taken when selecting the template DNA and designing primers.
The usual explanation is the three hydrogen bonds between G and C: more hydrogen bonds raise the energy, and hence the temperature, needed to separate dsDNA.
A high proportion of G and C also makes it harder for primers to amplify the target DNA specifically. When designing primers, it is advisable to keep the GC content between 40% and 60%, ideally around 45%.
Primers with a GC content above 60% need a higher annealing temperature and are more prone to non-specific binding, which increases the chance of false results. Note that amplifying GC-rich regions requires a specially optimized PCR protocol.
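The 40-60% guideline above is easy to check automatically. The sketch below is hypothetical: the "low", "ok" and "high" labels are our own, and a real primer check would also consider melting temperature, hairpins and primer dimers.

```python
from typing import Tuple


def gc_percent(seq: str) -> float:
    """GC content of seq as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def check_primer_gc(primer: str, low: float = 40.0,
                    high: float = 60.0) -> Tuple[float, str]:
    """Return (GC %, verdict) against the 40-60% guideline (labels are hypothetical)."""
    gc = gc_percent(primer)
    if gc < low:
        verdict = "low"
    elif gc > high:
        verdict = "high"  # likely needs a higher annealing temperature / optimization
    else:
        verdict = "ok"
    return gc, verdict


if __name__ == "__main__":
    for primer in ("ATGCATGCAT", "GGGGCCCCAA", "ATATATATGC"):
        print(primer, check_primer_gc(primer))
```

The three example primers fall at 40% ("ok", the boundary is inclusive), 80% ("high") and 20% ("low").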
GC content also matters in DNA sequencing. According to a linear relationship between GC content and buoyant density, a higher GC content raises the buoyant density of dsDNA, which may lead to false results.
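The article does not spell out the linear relation. Assuming it refers to the classic relationship for DNA in a caesium chloride gradient reported by Schildkraut, Marmur and Doty (1962), ρ = 1.660 + 0.098 × (GC mole fraction) g/cm³, a minimal sketch of it looks like this:

```python
# Illustrative sketch: assumes the Schildkraut-Marmur-Doty relation for dsDNA in CsCl.


def buoyant_density(gc_fraction: float) -> float:
    """Buoyant density (g/cm^3) of dsDNA from its GC mole fraction (0 to 1)."""
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must be between 0 and 1")
    return 1.660 + 0.098 * gc_fraction


if __name__ == "__main__":
    for gc in (0.4, 0.5, 0.6):
        print(f"GC {gc:.0%}: {buoyant_density(gc):.3f} g/cm^3")
```

Under this relation, every 10-percentage-point rise in GC content adds about 0.0098 g/cm³ to the buoyant density; a 50% GC molecule sits at about 1.709 g/cm³.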
Calculating GC content
How a primer or template sequence behaves during PCR and DNA sequencing depends on the number of G and C nucleotides it contains, so we need to calculate its GC content.
Here is the equation:
- $\text{GC content (\%)} = \dfrac{G + C}{A + T + G + C} \times 100$
If you wish to calculate the ratio of AT/GC, you can use this equation:
- $\text{AT/GC ratio} = \dfrac{A + T}{G + C}$
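The two formulas above translate directly into code. This is a minimal sketch, assuming the input is a plain DNA or RNA string: it treats U as T so RNA sequences work, and it ignores ambiguous symbols such as N (an assumption on our part; some tools count them in the denominator instead).

```python
from collections import Counter


def _base_counts(seq: str) -> Counter:
    """Count bases, treating RNA uracil (U) as thymine (T)."""
    return Counter(seq.upper().replace("U", "T"))


def gc_content(seq: str) -> float:
    """(G + C) / (A + T + G + C) * 100; other symbols such as N are ignored."""
    n = _base_counts(seq)
    total = n["A"] + n["T"] + n["G"] + n["C"]
    if total == 0:
        raise ValueError("sequence contains no A, T/U, G or C")
    return 100 * (n["G"] + n["C"]) / total


def at_gc_ratio(seq: str) -> float:
    """(A + T) / (G + C)."""
    n = _base_counts(seq)
    gc = n["G"] + n["C"]
    if gc == 0:
        raise ValueError("sequence contains no G or C")
    return (n["A"] + n["T"]) / gc


if __name__ == "__main__":
    primer = "ATGCGCTTAG"
    print(f"GC content: {gc_content(primer):.1f}%")   # 50.0%
    print(f"AT/GC ratio: {at_gc_ratio(primer):.2f}")  # 1.00
```

The example primer has 5 G/C and 5 A/T bases, so its GC content is 50% and its AT/GC ratio is 1.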
Several easy-to-use online tools can also calculate GC content directly.
Applications of GC-rich regions:
Our genome holds a wide variety of sequences, and each type carries different information and has different uses; for example, short tandem repeats are useful for DNA fingerprinting.
- GC-rich sequences make up a large part of gene structure, so they are useful in gene mapping.
- Analysing CpG islands and other GC-rich regions helps in studying, identifying and characterizing genetic disorders.
- In cytogenetic studies, GC-rich and AT-rich regions stain differently, forming distinct banding patterns that facilitate copy number variation studies.
- As discussed above, GC-rich sequences are also important in designing PCR primers, PCR assays and DNA sequencing experiments.
Our genome is huge! Many of its regions and sequences are still unexplored, and we do not even know their functions. GC-rich sequences are among the less-studied ones, so their functions remain unclear.
However, they are clearly important in amplification and primer design.
Vinogradov AE. DNA helix: the importance of being GC-rich. Nucleic Acids Res. 2003;31(7):1838-1844. doi:10.1093/nar/gkg296