A single nucleotide polymorphism, abbreviated as SNP and pronounced "snip", is a variation of a single nucleotide at a specific location in the genome. Most commonly one base is substituted for another; single-base additions and deletions are sometimes counted as well.
The human genome is made up of DNA, a long polynucleotide chain built from four nitrogenous bases: adenine, thymine, cytosine and guanine.
The major portion of the human genome is non-coding; the coding segments encode different proteins for different functions.
Any alteration in the DNA that changes the genotype is a mutation. A mutation may be a substitution, addition, deletion, duplication, inversion or translocation in a DNA sequence.
An SNP is one kind of genetic alteration, in which a single nucleotide in the DNA sequence is changed, most often replaced by a different nucleotide.
Single nucleotide alterations arise in a genome through errors in DNA replication and through DNA damage, which adverse environmental conditions, stress and an unhealthy lifestyle can promote.
If a specific single nucleotide variant, at a specific location, occurs in more than 1% of a population, it is characterised as a true SNP.
In the present article, we will briefly introduce the SNP and its prevalence in our genome.
- What is SNP?
- How do different alleles originate?
- Naming the SNP
- Applications of SNP
- SNP databases
- SNP genotyping
Now, let us start with some basics,
What is SNP (single nucleotide polymorphism)?
In simple language, a single nucleotide change in a DNA sequence is called an SNP.
Nucleotide = base + sugar + phosphate.
The majority of SNPs occur in the non-coding regions of the genome, but some SNPs, in coding as well as non-coding sequences, are responsible for inherited genetic disorders.
Sickle cell anaemia is the classic example: it is caused by a single nucleotide substitution in the beta-globin gene. Some forms of thalassemia and cystic fibrosis are also caused by single nucleotide changes.
Based on their location, SNPs can be broadly categorised into coding-region SNPs and non-coding-region SNPs. Some SNPs, such as certain SNPs in the beta-globin gene, also occur in the intervening regions (introns).
Coding-region SNPs are further categorised into synonymous SNPs and non-synonymous SNPs.
Non-synonymous SNPs change the amino acid sequence of the encoded protein, while synonymous SNPs do not.
A non-synonymous SNP can be a missense mutation (one amino acid is replaced by another) or a nonsense mutation (a premature stop codon is created).
It is often assumed that SNPs in the non-coding regions do not have any specific effect on the phenotype. But this is not true.
Gene expression, gene splicing and transcriptional regulation are governed by non-coding DNA.
Therefore, SNPs in the non-coding regions can be pathogenic too. Interestingly, SNPs are more frequent in the non-coding regions than in the coding regions.
To date, scientists have found more than 100 million SNPs across populations, some of which are associated with different diseases.
On average, an SNP occurs about once in every 1000 nucleotides, so approximately 4 to 5 million SNPs may be present in an individual's genome.
As discussed in the earlier segment, most SNPs are present in the non-coding regions or in the regions between genes, so most of them have no direct role in disease development and do not directly affect one's health.
Different alleles arise due to SNPs: one SNP gives rise to at least two alleles of one particular gene.
“The alternative forms of a gene are called alleles.”
How do different alleles originate?
“A new variation, a new allele.”
Suppose that, in a particular DNA sequence, nucleotide A is present at position 5 in 82% of the population. At the same position, nucleotide C is present instead of A in 9% of the population, and G is present instead of A in 8% of the population.
This indicates that three different alleles of the same gene are present in the same population.
The three different alleles occur due to the SNP (named c.5A>C/G). Therefore, we can say that single nucleotide polymorphism is responsible for the origination of new variations.
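The allele counting described above, together with the 1% rule, can be sketched in a few lines of Python. This is only an illustration: the function names `allele_frequencies` and `is_polymorphic` and the sample of 99 chromosomes are assumptions made for this example, not part of any standard tool.

```python
from collections import Counter

def allele_frequencies(observed_bases):
    """Return the fraction of each base observed at one genomic position."""
    counts = Counter(observed_bases)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}

def is_polymorphic(freqs, threshold=0.01):
    """True if at least two alleles each exceed the frequency threshold."""
    return sum(1 for f in freqs.values() if f > threshold) >= 2

# Hypothetical sample: the base seen at position 5 on 99 chromosomes
sample = ["A"] * 82 + ["C"] * 9 + ["G"] * 8
freqs = allele_frequencies(sample)
for base, freq in sorted(freqs.items()):
    print(base, round(freq, 3))
print("SNP by the 1% rule:", is_polymorphic(freqs))
```

A position where the second allele is seen only once in a thousand chromosomes would not pass the same check.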
Naming the SNPs:
Different databases use different nomenclature methods for SNPs.
For example, consider the SNP c.122A>T.
Here "c." denotes the coding DNA sequence: at nucleotide position 122 of the coding sequence, the nucleotide A (adenine) is replaced with T (thymine).
This indicates that the wild-type allele contains A at position 122 while the variant allele contains T in place of A.
Other databases use identifiers such as rs53576, a reference SNP (rs) number of the kind assigned by dbSNP.
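To make the two naming styles concrete, here is a minimal Python sketch that splits a name into its parts. It handles only simple substitutions written like c.122A>T and identifiers written like rs53576, and it treats the number in the first style as a position in the coding DNA sequence. The function name `parse_snp_name` is an assumption for this example; real variant names can be far more complex than these two patterns.

```python
import re

# Simple coding-DNA substitution, e.g. "c.122A>T"
CODING_SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")
# Reference SNP identifier, e.g. "rs53576"
RS_IDENTIFIER = re.compile(r"^rs(\d+)$")

def parse_snp_name(name):
    """Split an SNP name into its parts; raise ValueError if unrecognised."""
    match = CODING_SUBSTITUTION.match(name)
    if match:
        return {
            "style": "coding",
            "position": int(match.group(1)),
            "ref": match.group(2),
            "alt": match.group(3),
        }
    match = RS_IDENTIFIER.match(name)
    if match:
        return {"style": "rs", "number": int(match.group(1))}
    raise ValueError(f"unrecognised SNP name: {name!r}")

print(parse_snp_name("c.122A>T"))
print(parse_snp_name("rs53576"))
```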
Applications of SNPs:
SNPs are among the best genetic markers for genomic and genetic research.
They are used in linkage disequilibrium studies and in estimating disease probability.
As genetic markers, SNPs are used to determine whether a disease or trait is associated with particular genetic variations.
They are also used in haplotype mapping.
Some SNPs are directly responsible for a genetic disease; such diseases can be diagnosed by analysing the SNP.
In modern genetics, the use of SNPs for personalized genetic analysis has increased tremendously.
They are used in genome-wide association studies to identify the association of SNPs with disease.
SNP databases:
Some of the important SNP databases are listed here.
The dbSNP database of NCBI is one of the best databases for searching single-base variations.
It contains information on both common variations that occur in the genome and clinical mutations or variations.
Furthermore, it links to related publications and also includes other small deletions and insertions.
- Link: dbSNP
(Paste rs1815739 into the search box of the link given above and see what data it shows.)
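The same lookup can also be scripted. The sketch below builds a request to the NCBI E-utilities `esummary` endpoint for the `snp` database, passing the rs number without its "rs" prefix. The helper names `dbsnp_summary_url` and `fetch_dbsnp_summary` are assumptions for this example, and the layout of the returned JSON is not assumed here, so inspect the response before relying on any field.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"

def dbsnp_summary_url(rs_id):
    """Build an E-utilities summary URL for an identifier such as 'rs1815739'."""
    if not rs_id.startswith("rs") or not rs_id[2:].isdigit():
        raise ValueError(f"not an rs identifier: {rs_id!r}")
    query = urlencode({"db": "snp", "id": rs_id[2:], "retmode": "json"})
    return f"{EUTILS_ESUMMARY}?{query}"

def fetch_dbsnp_summary(rs_id):
    """Download and decode the summary; needs network access."""
    with urlopen(dbsnp_summary_url(rs_id)) as response:
        return json.load(response)

print(dbsnp_summary_url("rs1815739"))
```

Only the URL is printed here; calling `fetch_dbsnp_summary("rs1815739")` performs the actual request.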
Kaviar (Known VARiants) is another SNP database; it contains 162 million SNP or SNV sites from the genome.
However, the Kaviar database does not contain SNP information related to cancer.
- Link: Kaviar
SNPedia is another, unofficial SNP database, built as a wiki.
- Link: SNPedia
SNP genotyping:
SNPs are among the most important genetic markers used by researchers to study variation in the human genome because they have so many applications.
Several methods that can be used for SNP genotyping are listed here:
- PCR-RFLP based genotyping
- SSCP (single-strand conformation polymorphism)
- Restriction fragment length polymorphism
- DNA sequencing
- DNA microarray
- SNP chip analysis
- Capillary electrophoresis
- Single base extension
Non-genomic tool-based methods:
- Denaturing HPLC
- Mass spectrometry
- Hybridization assay
- Electrochemical analysis
Now, let's discuss how some of these methods are used for SNP analysis:
Methods for detection of known SNPs:
PCR-RFLP is one of the traditional methods of SNP detection. Restriction fragment length polymorphism (RFLP) is a restriction enzyme-based polymorphism detection method. However, only known SNPs can be detected by RFLP, because the SNP has to create or abolish a known restriction site.
The method is a combination of PCR amplification and restriction digestion. The amplified fragment of interest is digested with a specific restriction endonuclease.
The endonuclease cleaves the DNA at its specific recognition site. If the sequence contains the restriction site, it will be cleaved; otherwise it remains uncut. An SNP that creates or destroys the site therefore changes the fragment pattern.
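The cut or uncut logic can be simulated on plain strings. The sketch below is a simplified illustration: it searches one strand only, which is adequate for a palindromic site such as the EcoRI site GAATTC (cut after the first G). The function name `digest` and both amplicon sequences are made up for this example.

```python
def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting `sequence` at every `site`.

    `cut_offset` is how many bases into the site the enzyme cuts,
    e.g. 1 for EcoRI, which cuts G^AATTC.
    """
    fragments = []
    start = 0
    position = sequence.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(cut - start)
        start = cut
        position = sequence.find(site, position + 1)
    fragments.append(len(sequence) - start)
    return fragments

# Hypothetical 18-base amplicons; the variant has A>G inside the EcoRI site
wild_type = "ATGCCGAATTCGGTTACA"
variant = "ATGCCGAGTTCGGTTACA"

print("wild type:", digest(wild_type, "GAATTC", 1))
print("variant:  ", digest(variant, "GAATTC", 1))
```

The wild-type amplicon gives two fragments, while the variant stays as one uncut fragment; on a gel this is the difference between two bands and one.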
DNA microarray/ SNP microarray:
A microarray is used for the detection of known mutations, SNPs or copy number variations.
Millions of SNPs, as well as copy number variations, can be detected using an SNP array.
Briefly, SNP-specific oligos are immobilized on a glass slide and the sample DNA is hybridized to them.
If an oligo finds its complementary sequence in the sample, it hybridizes; otherwise the unbound sample DNA is removed by washing.
A scanner reads the pattern of hybridization, and the SNP results are reported based on that pattern.
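The hybridization step can be pictured with a toy model in which a probe "binds" only when its exact reverse complement is present in the sample strand. Real arrays measure signal intensities and tolerate partial matches, so this is a deliberate simplification; the flanking sequences and the function names are assumptions for this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]

def hybridizes(probe, sample_strand):
    """Perfect-match model: the probe binds if its reverse complement is present."""
    return reverse_complement(probe) in sample_strand

def call_alleles(probes, sample_strand):
    """Return the alleles whose probe hybridizes to the sample."""
    return sorted(allele for allele, probe in probes.items()
                  if hybridizes(probe, sample_strand))

# Hypothetical flanking sequence around the SNP position
left, right = "GACCG", "GGTCA"
# One allele-specific probe per allele, complementary to the target strand
probes = {allele: reverse_complement(left + allele + right) for allele in "AT"}

sample = "TT" + left + "T" + right + "CC"  # this sample carries the T allele
print(call_alleles(probes, sample))
```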
Method for unknown SNP detection:
DNA sequencing is one of the best methods for detecting any type of unknown mutation present in a gene, including SNPs.
Whole-genome sequencing extends this to the entire genome.
Any new SNP that has originated in a population can be identified using DNA sequencing.
With techniques such as next-generation sequencing, one can even identify many SNPs at once and compare them across samples in a single run.
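Once a sample has been sequenced, finding SNPs comes down to comparing it with a reference. The sketch below assumes the two sequences are already aligned and of equal length, which real pipelines achieve with an alignment step that is not shown here; `find_snps` is a name chosen for this example.

```python
def find_snps(reference, sample):
    """List (position, reference_base, sample_base) for every mismatch.

    Positions are 1-based. Both sequences must already be aligned
    and have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]

# Hypothetical aligned reads
print(find_snps("ATGCATGCA", "ATGTATGCA"))
```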
The genome-wide association study (GWAS), one of the emerging methods, is based entirely on SNP analysis. Millions of SNPs are scanned across a population. The common SNPs associated with a disease are identified and tagged for personalized genomic analysis and prediction of disease status.
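For a single SNP, the association step can be illustrated with a chi-square test on allele counts in cases and controls. This is a minimal sketch with made-up counts, not how any particular GWAS package works: real studies repeat such a test for every SNP and apply a much stricter significance threshold to account for the millions of tests.

```python
def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Chi-square statistic (1 degree of freedom) for a 2x2 allele count table.

    Assumes every row and column total is non-zero.
    """
    observed = [[case_alt, case_ref], [control_alt, control_ref]]
    row_totals = [sum(row) for row in observed]
    column_totals = [case_alt + control_alt, case_ref + control_ref]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * column_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic

# Hypothetical counts: the variant allele is seen on 60 of 100 case
# chromosomes but only 40 of 100 control chromosomes
print(allelic_chi_square(60, 40, 40, 60))
```

A larger statistic means the allele frequencies differ more between cases and controls than chance alone would suggest.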