“A genome is an organism’s complete set of DNA, which contains all the information the organism needs to grow and develop.”


Or

“Our genome is made up of DNA that carries all the information needed for survival. The entire set of DNA present in a cell is called a genome.”

More precisely, 

A haploid set of the DNA present in a cell is called a genome. 

Our genome is made up of DNA, a basic building block of life composed of nitrogenous bases, phosphate and sugar.

The DNA is packaged into chromosomes, which are complexes of DNA and proteins (mainly histones). Human somatic cells contain 23 pairs of chromosomes: 22 pairs of autosomes and one pair of sex chromosomes.

The human genome (one haploid set) contains about 3.2 billion base pairs. However, genome size varies from species to species.

In this article, we will cover the definition, structure and function of the genome and how it is studied.

Key topics: 

  • What is a genome? 
    • Definition 
    • Structure 
      • Coding sequences of a genome 
      • Non-coding sequences of a genome
      • Euchromatin and heterochromatin regions
    • Function
    • Genomics
  • Conclusion

What is a genome? 

Total DNA in a cell is a genome. 

Let’s understand it with an example. The genome is like a complete cookbook that contains the instructions for cooking different recipes, along with the lists of ingredients they need.

Each chapter of a cookbook is dedicated to one particular recipe and its ingredients. Similarly, the genome contains all the information for coding proteins and regulating gene expression.

Each gene in the genome contains the information for encoding a particular protein, while some non-coding sequences carry information for gene regulation.

A multicellular eukaryote such as a human is made up of trillions of cells. Groups of cells make up tissues, tissues combine to form organs that perform different functions, and organs build the body.

All the information on how cells divide, form, function and die is encoded in the DNA.

Different genes encode different proteins that perform different functions. For example, some genes regulate cell division, while others control apoptosis (programmed cell death).

Genes are the functional pieces of DNA, because not all of the DNA is capable of encoding proteins.

Read more on genes and DNA:

  1. What is a gene?
  2. The structure and function of DNA.

Table: Genome size and haploid chromosome number (n) of some organisms.

Species    Genome size (Mb)    Number of chromosomes (n)
Human      ~3,200              23
Dog        2,500               39
Mouse      2,600               20
Rat        2,800               21
Sheep      3,000               27
Cat        3,000               19
Baboon     3,100               21
Pig        3,000               19
Cow        3,000               30
Horse      2,700               32

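DNA wraps around histone proteins to form nucleosomes. Two copies each of H2A, H2B, H3 and H4 make up the nucleosome core, and H1 acts as a linker histone. The nucleosomes coil further into chromatin, which condenses into chromosomes. After replication, each chromosome consists of two identical sister chromatids joined at the centromere.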

A single chromosome is one part of the entire genome (the 23 chromosomes of the haploid human set). It is made up of two arms, a centromere and telomeres.

Most genes are located on the two arms of a chromosome, called the “p arm” (short arm) and the “q arm” (long arm). The telomeres consist of non-coding tandem repeats.

The telomeric DNA sequences protect the gene-coding segments of a chromosome from the end-replication problem. In eukaryotes, the nuclear genome is enclosed within a membrane-bound structure known as the nucleus.

However, some DNA is also present outside the main chromosomes. Eukaryotes carry mitochondrial or chloroplast DNA, and prokaryotes carry plasmid DNA.

These DNA molecules can be circular or linear. Scientists usually consider them separately from the nuclear genomic DNA, partly because they have their own replication and transcription machinery.

Scientists sequenced the entire human genome in a project called the Human Genome Project (HGP), which was completed in 2003. The genomes of many other organisms were also sequenced and compared with ours. The project was launched by the U.S. Department of Energy and the National Institutes of Health.

The human genome contains about 3.2 billion base pairs. It provides all the information an organism needs to function and survive. It also stores this information and passes it on to new cells and to offspring.

Approximately 20,500 protein-coding genes were identified through the project.

From the HGP, scientists have built databases that contain the genetic map, the physical map, the complete DNA sequence, a map of human sequence variation, and other information on genes and their functions.

The location of genes on a chromosome.


Fun fact: 

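If we printed all 3.2 billion letters of the genome on paper at roughly 3,000 letters per page, they would fill about a million pages.

If we stretched out the DNA of one haploid genome end to end, it would be roughly 1 metre long (about 0.34 nm per base pair). The diploid DNA of a single cell is about 2 metres long.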
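A quick back-of-the-envelope check of these figures, as a minimal Python sketch. The 3,000 letters per page is an assumption, since it varies with font and layout. The 0.34 nm rise per base pair is the standard value for B-form DNA.

```python
# Rough size estimates for the human genome (illustrative assumptions).
HAPLOID_BP = 3.2e9          # base pairs in one haploid human genome
CHARS_PER_PAGE = 3000       # assumed letters per printed page
RISE_PER_BP_M = 0.34e-9     # length of one base pair in B-DNA, in metres

def pages_needed(bp: float, chars_per_page: int = CHARS_PER_PAGE) -> float:
    """Number of pages needed to print one letter per base."""
    return bp / chars_per_page

def stretched_length_m(bp: float) -> float:
    """Length of the double helix if stretched out, in metres."""
    return bp * RISE_PER_BP_M

print(f"pages: {pages_needed(HAPLOID_BP):,.0f}")
print(f"haploid length: {stretched_length_m(HAPLOID_BP):.2f} m")
print(f"diploid length: {stretched_length_m(2 * HAPLOID_BP):.2f} m")
```

With these assumptions, the result is roughly a million pages, about 1.1 m of DNA for a haploid set and about 2.2 m for a diploid cell.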

Definition:

“The entire DNA present in a cell of an organism is called its genome. It carries inherited information from one generation to the next and regulates gene expression.”

Or in simple language, we can say, 

“A genome is an information storage and distribution unit that holds all the instructions for how an organism will grow and develop.”

The genome size of some well-known organisms.

Structure of genome: 

Structurally, the genome can be divided into two broader categories: 

  1. Coding DNA sequences 
  2. Non-coding DNA sequence

Coding DNA sequences: 

The genome of a prokaryote is compact. It is densely packed with genes and contains relatively little non-coding sequence. Prokaryotic genes are also structurally different from eukaryotic genes; for example, they usually lack introns.

On the other hand, the eukaryotic genome is different. 

Unlike in prokaryotes, the major portion of the eukaryotic genome is made up of non-coding, so-called “junk” DNA.

Moreover, the genes are made up of exons and introns.

A DNA sequence that encodes a protein is called a coding DNA sequence or a gene. In eukaryotes, it is made up of exons and introns, and when the gene is active its promoter is usually unmethylated.

Only a small portion of the eukaryotic genome encodes proteins. In humans, less than 2% of the genome consists of protein-coding sequences.

The ratio varies from organism to organism. The table below gives the number of genes in several organisms:

Species           Common name    Number of genes
Homo sapiens      Human          21,000
E. coli           Bacterium      4,200
Oryza sativa      Rice           38,000
Daphnia pulex     Water flea     31,000
Gallus gallus     Chicken        17,000

The coding region, the functional piece of DNA called a gene, has a few important characteristics.

First, genes contain little or no repetitive DNA, partly because repeated sequences are more prone to replication errors such as DNA polymerase slippage.

Second, the promoter of an active gene is unmethylated. DNA methylation at promoters generally represses transcription, because methylated sites hinder the binding of transcription factors and the rest of the transcription machinery.

The third most important characteristic of a gene is the packaging of genes on chromosomes. 

Non-coding and coding DNA sequences are even arranged differently on chromosomes.

The coding, gene-rich regions are more loosely packed than non-coding DNA. This gives enzymes access to the DNA so they can function properly.

Furthermore, the genes, or coding DNA, are located on the arms of chromosomes.

Another important characteristic of a gene is its structure, because it contains introns and exons. Splicing removes the introns from the primary transcript and joins the exons to form the mature mRNA, which is translated into protein.
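To make splicing concrete, here is a toy Python sketch that keeps only the exon segments, given their coordinates, and joins them. The gene sequence and exon coordinates are invented for illustration. Real splicing is carried out by the spliceosome, which recognises signals such as the GT...AG intron boundaries.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive coordinates) in order,
    discarding everything between them (the introns)."""
    ordered = sorted(exons)  # join exons in genomic order
    for (_, end1), (start2, _) in zip(ordered, ordered[1:]):
        if start2 < end1:
            raise ValueError("exons must not overlap")
    return "".join(pre_mrna[start:end] for start, end in ordered)

# Hypothetical gene: exon 1 = ATGGCC, intron = GTAAGTCCCTAG, exon 2 = TTTTAA
gene = "ATGGCC" + "GTAAGTCCCTAG" + "TTTTAA"
print(splice(gene, [(0, 6), (18, 24)]))  # ATGGCCTTTTAA
```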

The length of a gene also matters for protein formation. The longer the gene, the greater the chance that a replication error introduces a mutation into it.

By this reasoning, shorter genes with fewer exons are less exposed to such errors.
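The reasoning about gene length can be made quantitative with a simple model. Suppose each base is independently miscopied with a small probability p. The chance that a gene of length L picks up at least one error is then 1 - (1 - p)^L, which grows with L. The Python sketch below computes this; the error rate is purely illustrative, not a measured value.

```python
def prob_at_least_one_error(length_bp: int, error_rate: float) -> float:
    """P(at least one error) across `length_bp` independent bases,
    each miscopied with probability `error_rate`."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be a probability")
    return 1.0 - (1.0 - error_rate) ** length_bp

ILLUSTRATIVE_RATE = 1e-6  # hypothetical per-base error rate, for illustration only
for length in (1_000, 10_000, 100_000):
    p = prob_at_least_one_error(length, ILLUSTRATIVE_RATE)
    print(f"{length:>7} bp -> {p:.4f}")
```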

Although 20,000 to 21,000 genes have been identified to date, many more transcribed sequences of unknown function exist in our genome.

Non-coding DNA sequence: 

Now, this is very important to understand. 

Although this portion of the genome cannot form any protein, it is very important for our cells to survive.

Roughly 98% of the human genome is made up of non-coding DNA. Much of it is repetitive and was once dismissed as “junk” DNA.

In this segment of the article, I list some of the important types of non-coding DNA sequences present in our genome:

Repeated DNA sequence: 

A large part of the human genome, roughly half, is made up of repeated DNA sequences. Tandemly repeated satellite DNA is especially abundant in the centromeric and telomeric regions of chromosomes. Shorter tandem repeats are known as minisatellites and microsatellites.

Fun fact: 

Human chromosomes are protected by telomeres, which consist of many tandem copies of the sequence TTAGGG.

In addition, telomeres shorten slightly with each cell division, so telomere length generally decreases as a person ages. Shorter telomeres are associated with cellular ageing.

Related article: Role of Telomeres in ageing.
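As a small illustration of the TTAGGG repeat, the Python sketch below counts how many uninterrupted tandem copies of the repeat sit at the end of a sequence. The input is a made-up toy sequence; a real telomere spans thousands of bases. The function name is hypothetical.

```python
TELOMERE_REPEAT = "TTAGGG"

def count_terminal_repeats(seq: str, unit: str = TELOMERE_REPEAT) -> int:
    """Count uninterrupted tandem copies of `unit` at the 3' end of `seq`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    seq, unit = seq.upper(), unit.upper()
    copies, end = 0, len(seq)
    # Walk backwards from the end, one repeat unit at a time.
    while end >= len(unit) and seq[end - len(unit):end] == unit:
        copies += 1
        end -= len(unit)
    return copies

toy_chromosome_end = "ACGTCCGA" + TELOMERE_REPEAT * 5
print(count_terminal_repeats(toy_chromosome_end))  # 5
```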

Tandem repeats: 

Tandem repeats are arranged one after another in the genome and are of two types: short tandem repeats and long tandem repeats.

Short tandem repeats are made up of units 1 to 6 nucleotides long, while long tandem repeats are made up of units 10 to 50 nucleotides long.

Short tandem repeats are known as STRs, while long tandem repeats are known as VNTRs (variable number tandem repeats).
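The Python sketch below shows one simple way to scan a sequence for short tandem repeats. It also labels a repeat unit using the unit-length ranges given above: 1 to 6 nucleotides for STRs and 10 to 50 for VNTRs. The function names and the minimum of three copies are assumptions for illustration. Dedicated tools such as Tandem Repeats Finder also tolerate mismatches and indels, which this exact-match scan does not.

```python
def classify_repeat_unit(unit_length: int) -> str:
    """Label a repeat by its unit length, using the ranges quoted in the text."""
    if 1 <= unit_length <= 6:
        return "STR"
    if 10 <= unit_length <= 50:
        return "VNTR"
    return "unclassified"

def find_tandem_repeats(seq: str, max_unit: int = 6, min_copies: int = 3):
    """Return (start, unit, copies) for exact tandem repeats in `seq`.

    At each position the shortest unit with at least `min_copies`
    consecutive copies is reported, and the scan jumps past the repeat.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = False
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:  # not enough sequence left for this unit size
                break
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies:
                hits.append((i, unit, copies))
                i += copies * k
                found = True
                break
        if not found:
            i += 1
    return hits

locus = "GG" + "CA" * 5 + "TT"  # toy locus with a (CA)5 microsatellite
for start, unit, copies in find_tandem_repeats(locus):
    print(start, unit, copies, classify_repeat_unit(len(unit)))  # 2 CA 5 STR
```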

Both types of non-coding DNA sequences are very important markers in genomic and genetic studies.

An STR or VNTR profile is unique to a person. Apart from identical twins, the chance that two people share the same profile across many loci is extremely small.

This is why these repeated DNA sequences are used to identify individuals, in DNA fingerprinting and in paternity testing.
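Paternity testing with STRs relies on a simple rule: at every locus, a child inherits one allele from the mother and one from the father. The Python sketch below checks that rule locus by locus. The locus names are real STR markers, but the allele values (repeat counts) are invented for illustration. Real casework uses statistical measures such as the paternity index rather than a simple list of mismatching loci.

```python
Genotype = tuple[float, float]  # two allele repeat counts at one STR locus

def consistent_with_trio(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if one child allele can come from the mother and the other from the father."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)

def excluded_loci(child: dict, mother: dict, father: dict) -> list[str]:
    """Loci where the alleged father cannot have contributed an allele."""
    return [locus for locus in child
            if not consistent_with_trio(child[locus], mother[locus], father[locus])]

# Hypothetical profiles: locus -> (allele 1, allele 2)
child  = {"D8S1179": (12, 14), "TH01": (6, 9),   "FGA": (21, 24)}
mother = {"D8S1179": (12, 13), "TH01": (6, 7),   "FGA": (20, 21)}
father = {"D8S1179": (14, 15), "TH01": (9, 9.3), "FGA": (22, 24)}
print(excluded_loci(child, mother, father))  # []
```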

We have covered an entire article on STRs, so we will not discuss them in detail here.

Related article: Short Tandem Repeats (STRs): A Secret of Every DNA Test.

One important point about repeated DNA sequences is that, even though most of them do not code for any protein, repeat expansions can cause some inherited genetic disorders.

For example, Huntington’s disease is caused by an abnormal expansion of a CAG repeat in the HTT gene.
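The Python sketch below counts the longest uninterrupted run of CAG repeats in a sequence. It then places the count in the repeat-size ranges commonly cited for HTT: 26 or fewer is normal, 27 to 35 is intermediate, 36 to 39 is reduced penetrance, and 40 or more is full penetrance. The toy allele is invented, and this is an illustration only, not a diagnostic method.

```python
def longest_cag_run(seq: str) -> int:
    """Number of repeats in the longest uninterrupted CAG run in `seq`."""
    seq = seq.upper()
    best, i = 0, 0
    while i <= len(seq) - 3:
        run = 0
        while seq[i + 3 * run:i + 3 * run + 3] == "CAG":
            run += 1
        if run:
            best = max(best, run)
            i += 3 * run  # skip past this run
        else:
            i += 1
    return best

def htt_category(cag_repeats: int) -> str:
    """Commonly cited HTT CAG size ranges (illustrative, not diagnostic)."""
    if cag_repeats <= 26:
        return "normal"
    if cag_repeats <= 35:
        return "intermediate"
    if cag_repeats <= 39:
        return "reduced penetrance"
    return "full penetrance"

toy_allele = "ATG" + "CAG" * 42 + "CAACAG" + "CCG"  # hypothetical sequence
n = longest_cag_run(toy_allele)
print(n, htt_category(n))  # 42 full penetrance
```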

Transposable elements: 

As the name suggests, transposable elements, often called transposons, can change their location within a genome, whereas most DNA sequences stay in place.

“the transposable elements are the mobile genetic elements- non-coding DNA that moves from one location to another.”

We have covered an entire series of articles on transposons, so we will not discuss them in depth here. You can read all the articles here:

Category: Transposons

Transposons are either retrotransposons or DNA transposons. Retrotransposons are transcribed into RNA, which is then reverse-transcribed into DNA and inserted at another location.

Retrotransposons are divided by whether they carry long terminal repeats (LTRs) at both ends. Those that do are LTR retrotransposons; those that do not are non-LTR retrotransposons, which include long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs). Most of the transposons in our genome are retrotransposons.

The other class, DNA transposons, is common in bacteria and other prokaryotes, although DNA transposons are also found in eukaryotes.

Terminal inverted repeats are the main characteristic of DNA transposons. These transposons encode the enzyme transposase, which recognises the terminal sequences and carries out transposition.
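An inverted repeat means that the sequence at one end of the element, read on the opposite strand, matches the sequence at the other end. The Python sketch below checks this by comparing the left end with the reverse complement of the right end. The toy element and the repeat length are assumptions for illustration. Real TIRs range from a few bases to hundreds and may contain mismatches; this check only accepts exact matches.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (other letters are left unchanged)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def has_terminal_inverted_repeats(element: str, tir_length: int) -> bool:
    """True if the first `tir_length` bases equal the reverse complement
    of the last `tir_length` bases (exact match only)."""
    element = element.upper()
    if tir_length <= 0 or len(element) < 2 * tir_length:
        return False
    return element[:tir_length] == reverse_complement(element[-tir_length:])

# Toy transposon: TIR + placeholder for the transposase gene + inverted copy of the TIR
tir = "CAGGGTTA"
toy_element = tir + "ATG" + "N" * 20 + "TAA" + reverse_complement(tir)
print(has_terminal_inverted_repeats(toy_element, len(tir)))  # True
```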

In addition to this, the transposons are also categorised based on the mechanism of transposition, either cut and paste transposons or copy and paste transposons.
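A toy string model in Python shows the difference between the two mechanisms. Cut-and-paste removes the element from its old site before inserting it elsewhere. Copy-and-paste leaves the original in place and inserts a new copy, so the genome grows. The sketch ignores real details such as target-site duplications and the enzymes involved, and the function names are hypothetical.

```python
def cut_and_paste(genome: str, start: int, end: int, target: int) -> str:
    """Excise genome[start:end] and reinsert it at `target`,
    where `target` is an index into the genome after excision."""
    element = genome[start:end]
    remainder = genome[:start] + genome[end:]
    return remainder[:target] + element + remainder[target:]

def copy_and_paste(genome: str, start: int, end: int, target: int) -> str:
    """Insert a copy of genome[start:end] at `target`, keeping the original."""
    element = genome[start:end]
    return genome[:target] + element + genome[target:]

genome = "AAAA" + "TE" + "CCCC"  # "TE" marks a toy transposable element
print(cut_and_paste(genome, 4, 6, 8))    # AAAACCCCTE
print(copy_and_paste(genome, 4, 6, 10))  # AAAATECCCCTE
```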

We have covered a dedicated article on each type of transposon. Read them in the category given above.

Transposons create new variation in nature. It is therefore believed that, in the past, they were one of the forces driving evolution.

Besides this, when a transposon moves to a new location, for example into a gene, it can dysregulate or inhibit that gene’s function, or fuse two genes and create a new genotype.

This is why non-coding DNA sequences such as transposons are a very important part of our genome. But!

Most transposons in the human genome became inactive long ago. Only a small number, such as some LINE-1 elements, can still move.

Methylated DNA sequences: 

Yet another type of non-coding DNA sequences are the methylated DNA sequences in our genome. 

A methyl group is added to cytosines at CpG sites, often in CpG-rich regions of the genome. This switches off gene function or makes the DNA sequence non-functional.

The tightly packed heterochromatin region of the genome is rich in methylated DNA and is mostly inactive.
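Methylation targets CpG sites, so a common first step is to find CpG-rich stretches. The Python sketch below computes GC content and the observed/expected CpG ratio. It then applies the classic CpG island thresholds of Gardiner-Garden and Frommer: a length of at least 200 bp, GC content above 50% and an observed/expected ratio above 0.6. It only flags candidate regions; it says nothing about whether they are actually methylated.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for `seq`."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def looks_like_cpg_island(seq: str) -> bool:
    """Apply length, GC and observed/expected CpG thresholds."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= 200 and gc_fraction > 0.5 and obs_exp > 0.6

print(cpg_stats("CG" * 100))              # (1.0, 2.0)
print(looks_like_cpg_island("AT" * 100))  # False
```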

Smaller RNAs: 

Small double-stranded or single-stranded RNA molecules play an important role in the regulation of gene expression.

These include microRNAs. They are encoded by some genes but do not form a protein. So even though this DNA is transcribed, it is categorised as a non-coding DNA sequence.

Small RNAs act through the RNA interference mechanism. Small interfering RNAs (siRNAs) can guide the destruction of viral and other target RNAs. MicroRNAs mostly bind to endogenous mRNAs, blocking their translation or triggering their degradation, and in this way they regulate gene expression.

Again, we have covered an entire series on different types of RNA and microRNAs. You can read more about it here: microRNA (miRNA) and Gene Regulation.

Besides these, promoters (including core and proximal promoter sequences), enhancers, silencers, insulators and locus control regions are also categorised as non-coding DNA sequences.

Summary of the non-coding DNA: 

Type of non-coding DNA    Description
Satellite DNA             Microsatellites and minisatellites, abundant in the centromeric and telomeric regions of the chromosome.
Transposons               Mobile genetic elements located throughout the genome that can move from one location to another.
Introns                   Present between the exons of a gene and removed before the mature mRNA is formed.
Telomeres                 Six-nucleotide repeats at the chromosome ends that protect the chromosome from the end-replication problem.
microRNAs                 Transcribed from DNA but not translated into protein, and therefore categorised as non-coding DNA.
Regulatory elements       Other regulatory elements that help control replication and transcription.

Euchromatin and heterochromatin regions: 

Euchromatin regions are loosely packed, mostly gene-rich DNA, while heterochromatin regions are tightly packed, mostly non-coding DNA.

During cytogenetic analysis, we perform GTG banding (G-banding using trypsin and Giemsa), and alternating light and dark pinkish-blue bands are observed. The darker bands broadly correspond to tightly packed, heterochromatin-like regions, while the light-coloured bands are euchromatin regions. See the image below.

The image of Giemsa banded chromosomes directly taken from the microscope.


The function of a genome: 

The genome is just like a hard disk of our computer. 

A computer hard disk stores all the data and crucial information on how the computer works and performs its different functions.

Similarly, the genome stores all the information for an organism- for its growth, metabolism, development, reproduction, etc. 

A computer runs on the binary language of 0 and 1. Similarly, the genome stores its information in the four nitrogenous bases A, T, G and C, and this information is inherited.
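The analogy can be taken literally. With four bases, each base needs exactly two bits (00, 01, 10, 11). The Python sketch below packs a DNA string into an integer and unpacks it again. The particular bit assignment is an arbitrary choice for illustration. At two bits per base, 3.2 billion bases fit in about 800 million bytes.

```python
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}  # arbitrary mapping
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}

def encode(seq: str) -> int:
    """Pack a DNA string into an integer, two bits per base."""
    value = 0
    for base in seq.upper():
        value = (value << 2) | BASE_TO_BITS[base]
    return value

def decode(value: int, length: int) -> str:
    """Unpack `length` bases from an integer produced by encode()."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])  # lowest two bits = last base
        value >>= 2
    return "".join(reversed(bases))

packed = encode("ACGT")
print(bin(packed), decode(packed, 4))  # 0b11011 ACGT
print(3_200_000_000 * 2 // 8, "bytes for 3.2 billion bases")
```

The length has to be stored separately, because leading A's (00) would otherwise be lost.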

Genes within the genome encode different proteins that allow different cells to function properly, while non-coding sequences such as microRNAs help to regulate gene expression.

Based on that, the genome can be divided into three categories: 

  • Coding gene sequences
  • Regulatory elements
  • Maintenance elements

Coding elements, or coding genes, are unmethylated, loosely packed DNA sequences that encode proteins.

Regulatory elements control gene expression and how much of each protein is made. Enhancers, promoters, silencers and insulators are some of the regulatory elements found in the genome.

Maintenance elements are sequences that help in DNA replication, repair and maintenance. The origins of replication, telomeres and centromeres of a chromosome are examples of maintenance elements.

How does the genome function?

Our cells have evolved different pathways for carrying out the different processes that involve DNA.

Through replication, the DNA is doubled and passed on to the daughter cells, so that each new cell nucleus receives an exact copy of the entire genome.
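Replication works because of complementary base pairing: A pairs with T, and G pairs with C. Each strand of the parent molecule serves as a template for a new partner strand, so each daughter molecule keeps one old strand and one new one (semi-conservative replication). The toy Python sketch below writes both strands aligned base by base (antiparallel) and ignores the enzymes involved.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Base-by-base complement, written aligned with the template strand."""
    return "".join(PAIR[base] for base in strand.upper())

def replicate(duplex: tuple[str, str]) -> tuple[tuple[str, str], tuple[str, str]]:
    """Each parental strand templates a new complementary strand."""
    top, bottom = duplex
    daughter_1 = (top, complement(top))        # old top strand + new bottom strand
    daughter_2 = (complement(bottom), bottom)  # new top strand + old bottom strand
    return daughter_1, daughter_2

parent = ("ATGCGT", complement("ATGCGT"))
d1, d2 = replicate(parent)
print(d1 == parent and d2 == parent)  # True
```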

Through transcription, mRNA is formed from the DNA. The mRNA carries the coding information of a gene, and the entire set of RNA transcripts produced from the genome is called the transcriptome.

By analysing the transcriptome, the expression level of each gene can be determined.

Through translation, a chain of amino acids is formed from the mRNA in the cytoplasm.

This flow of information from DNA to RNA to protein is known as the “central dogma” of molecular biology, and it is controlled by the regulatory elements.


Graphical illustration of the process of transcription and translation. mRNA is formed from the DNA through transcription, while a chain of amino acids is translated from the mRNA.
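The flow from DNA to RNA to protein can be sketched in a few lines of Python using the standard genetic code. For simplicity, the sketch starts from the coding (sense) strand, so transcription is just replacing T with U. Translation starts at the first AUG and stops at the first stop codon. The example sequence is invented.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    (a + b + c).replace("T", "U"): AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def transcribe(coding_strand: str) -> str:
    """mRNA has the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = transcribe("GCATGTTTGGCTAAAC")
print(mrna, translate(mrna))  # GCAUGUUUGGCUAAAC MFG
```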

Sometimes DNA polymerase inserts the wrong nucleotide, or an external factor damages our DNA. It happens!

DNA repair pathways seal the damaged gaps or correct mismatched nucleotides.

Non-homologous end joining and direct reversal of damage are two of the DNA repair pathways that keep our DNA in proper order. Others include mismatch repair, base excision repair, nucleotide excision repair and homologous recombination.

Interestingly,

“A change in the nucleotide sequence or an alteration in the DNA structure is called a mutation.”
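At its simplest, a point mutation can be found by comparing a sequence with a reference base by base. The Python sketch below does this for two already-aligned sequences of equal length. It labels each substitution as a transition (purine to purine, or pyrimidine to pyrimidine) or a transversion. It does not handle insertions or deletions, which require sequence alignment. The sequences are toy examples.

```python
PURINES, PYRIMIDINES = set("AG"), set("CT")

def substitution_type(ref: str, alt: str) -> str:
    """Transition if both bases are purines or both are pyrimidines."""
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"

def find_substitutions(reference: str, sample: str):
    """List (position, ref base, alt base, type) for aligned, equal-length sequences."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (pos, r, s, substitution_type(r, s))
        for pos, (r, s) in enumerate(zip(reference.upper(), sample.upper()))
        if r != s
    ]

print(find_substitutions("ATGCCA", "ATGACG"))
# [(3, 'C', 'A', 'transversion'), (5, 'A', 'G', 'transition')]
```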

Cis and trans acting elements of the genome: 

Cis-acting elements are non-coding DNA sequences that regulate the level of expression of genes on the same DNA molecule. Remember, cis-acting elements themselves are non-coding sequences.

On the other hand, trans-acting elements are proteins, such as enzymes and transcription factors. Other genes encode them, and they act on the cis-acting elements to control gene expression.

Thus a trans-acting element can act on the cis-acting elements of a single gene or of several different genes.

Promoters, enhancers, silencers and insulators are types of cis-acting elements.

Promoters play an important role in gene regulation; we will cover them in a dedicated article.
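A small example of a cis-acting element that a trans-acting factor recognises is the TATA box. The Python sketch below scans a sequence for matches to the TATA box consensus TATAWAW (W = A or T), which the TATA-binding protein binds in many eukaryotic core promoters. Real promoter analysis uses position weight matrices and also checks the opposite strand. The exact-consensus regular expression and the toy sequence here are simplifications.

```python
import re

# TATA box consensus TATAWAW, where W means A or T.
TATA_BOX = re.compile(r"TATA[AT]A[AT]")

def find_tata_boxes(seq: str) -> list[int]:
    """0-based start positions of TATA-box consensus matches (one strand only)."""
    return [m.start() for m in TATA_BOX.finditer(seq.upper())]

upstream = "GGCTATAAAAGGCGCCAGTCAG"  # toy promoter region
print(find_tata_boxes(upstream))  # [3]
```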

Genomics: 

Now coming to another important point related to the genome- the genomics. 

A single gene can be studied easily, but studying an entire genome is a tedious, laborious and time-consuming task.

The study of genes and heredity is called genetics, while the study of the entire genome and its function is called genomics.

Read our amazing article related to it: “Genome Vs Gene”, An Unusual Comparison.

The genomes of eukaryotes, especially higher eukaryotes, are larger and more complicated than those of prokaryotes. Studying the entire genome of such an organism is therefore quite difficult.

However, techniques such as DNA sequencing and DNA microarrays can analyse the entire genome and are therefore used in genome-wide association studies (GWAS).

Whole-genome sequencing: 

Conventional Sanger sequencing reads one gene or region at a time, whereas whole-genome sequencing analyses the entire genome. It is usually done by whole-genome shotgun sequencing on high-throughput, next-generation sequencing (NGS) platforms.

Briefly, the entire genome is first broken into smaller fragments (for example, by mechanical shearing or enzymatic digestion) to create a library of fragments.

In the next step, known adapter sequences are ligated to each fragment, and the library is sent for DNA sequencing.

Each region of the genome is covered by many overlapping reads. These reads are assembled computationally into longer contiguous sequences called “contigs”.

Finally, the contigs are ordered and placed onto each chromosome, often by aligning them to a reference genome, to generate the complete sequence.

Note: this is the broader overview, the method, principle and chemistry of sequencing varies based on the selected method. 
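The assembly step can be illustrated with a toy greedy overlap assembler in Python. It repeatedly merges the pair of reads with the longest suffix-prefix overlap until no overlaps remain. This is only a teaching sketch that assumes error-free reads, no repeats and a single strand. Real assemblers use overlap or de Bruijn graphs and handle sequencing errors and repeats. The reads are invented.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`
    (at least `min_len` long), or 0 if there is none."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlap >= min_len remains.
    Returns the resulting contigs."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

reads = ["ATTAGACCTG", "CCTGCCGGAA", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```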

What do we get from the whole genome sequencing? 

Sequencing a single gene reveals sequence variations in that gene. Genome sequencing, in contrast, detects variations across the whole genome, such as single-nucleotide polymorphisms (SNPs), copy-number variations, deletions, duplications and other alterations.

New mutations or alterations can also be identified, which is useful in genome-wide association studies.
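Consider the bases that many overlapping reads show at one genome position to see how a variant such as an SNP is called from sequencing data. The Python sketch below computes the fraction of reads that support a non-reference base and assigns a rough genotype. The 0.2 and 0.8 cut-offs are illustrative assumptions. Real variant callers also model base quality, mapping quality and sequencing error.

```python
from collections import Counter

def call_site(ref: str, observed_bases: str, het_min: float = 0.2, hom_min: float = 0.8):
    """Return (alt base, alt fraction, genotype) for one position, or None if no reads.

    `observed_bases` holds the base seen in each read covering the position.
    The thresholds are illustrative, not taken from any particular tool.
    """
    ref = ref.upper()
    counts = Counter(base.upper() for base in observed_bases)
    depth = sum(counts.values())
    if depth == 0:
        return None
    alts = {base: n for base, n in counts.items() if base != ref}
    if not alts:
        return (None, 0.0, "homozygous reference")
    alt, n_alt = max(alts.items(), key=lambda item: item[1])
    fraction = n_alt / depth
    if fraction >= hom_min:
        genotype = "homozygous alternate"
    elif fraction >= het_min:
        genotype = "heterozygous"
    else:
        genotype = "homozygous reference"
    return (alt, fraction, genotype)

print(call_site("A", "AAAAGGGG"))    # ('G', 0.5, 'heterozygous')
print(call_site("C", "CCCCCCCCCT"))  # ('T', 0.1, 'homozygous reference')
```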

Related article: 3 Of The Best Genome Sequencing Methods.

DNA microarray: 

A DNA microarray is a state-of-the-art technology based on nucleic acid hybridization.

In a microarray, especially a chromosomal microarray, many known variants and copy-number changes across the genome can be screened in a single assay on a chip.

However, a microarray can only detect the sequences its probes are designed for, so novel alterations outside those probes cannot be identified.

The method is mainly used to compare two or more genomes (for example, a patient sample against a reference) to find copy-number differences between them.

Further, gene expression can also be determined by it. 
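A microarray that compares a test genome with a reference gives two signal intensities for each probe. The result is usually expressed as a log2 ratio: about 0 means the same copy number, positive values suggest a gain and negative values suggest a loss. The Python sketch below computes these ratios and applies illustrative cut-offs of ±0.3. The intensities are invented. Real analyses normalise the data and segment neighbouring probes before calling changes.

```python
import math

def log2_ratios(test: list[float], reference: list[float]) -> list[float]:
    """Per-probe log2(test / reference) signal ratio."""
    return [math.log2(t / r) for t, r in zip(test, reference)]

def call_copy_number(ratios: list[float], gain: float = 0.3, loss: float = -0.3) -> list[str]:
    """Label each probe using illustrative log2-ratio thresholds."""
    calls = []
    for ratio in ratios:
        if ratio >= gain:
            calls.append("gain")
        elif ratio <= loss:
            calls.append("loss")
        else:
            calls.append("normal")
    return calls

test_signal = [1000.0, 1500.0, 500.0]  # hypothetical probe intensities
reference_signal = [1000.0, 1000.0, 1000.0]
print(call_copy_number(log2_ratios(test_signal, reference_signal)))  # ['normal', 'gain', 'loss']
```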

Related article: Genome-On-A-Chip: DNA Microarray.

Conclusion:

Although we have sequenced the entire genome, we still understand relatively little about it. Our genome is more complicated than we think.

To date, we have identified only about 20,000 protein-coding genes, and scientists are constantly studying our genome to solve the mysteries of life.