“DNA directs protein formation through an mRNA intermediate, via the processes of transcription and translation, which are collectively called gene expression.”


Four letters called “bases” (adenine, thymine, cytosine and guanine) create the entire story of life by forming different variants of genes in different organisms, and because of that we are all different.

The entire central dogma is based on the encoding and decoding of information.

The information for building a protein, that is, the order of amino acids in its chain, is stored in a gene and copied into mRNA through transcription.

That information is then decoded: the ribosomal assembly “reads” the mRNA and joins the corresponding amino acids to manufacture a protein.

The encoded information is preserved by replication, in which DNA is copied and passed from one cell to another and from one generation to the next.

The decoding mechanism is called gene expression, a combination of transcription and translation.

In this article, we explain the entire process of encoding and decoding the information we need to survive: the journey from DNA to protein.

  • Replication- encoding the information 
  • Transcription- a readable form of information 
  • Translation- decoding the information

DNA is a double-stranded polynucleotide chain that contains the information life needs to survive. Made up of A, T, G and C, it encodes the different proteins that regulate cellular activity and carry out different functions.

A protein is a long chain of amino acids, called a polypeptide chain, in which each amino acid is specified by a triplet codon (we explain what a triplet codon is later in this article).

A protein, or chain of amino acids, is formed via an intermediate RNA molecule called mRNA.

But before that, the DNA is copied by a process called replication, in which DNA polymerase makes an exact copy of the DNA that is inherited by the daughter cells.
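The base-pairing rule behind replication can be sketched in a few lines of Python. This is only an illustration, not a model of DNA polymerase: the function name `replicate` and the example sequence are made up for this sketch. The only biology assumed is that A pairs with T, G pairs with C, and the two strands run in opposite directions.

```python
# Watson-Crick base pairing in DNA
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def replicate(template):
    """Return the strand that pairs with `template`, written 5'->3'.

    The new strand is antiparallel to the template, so we walk the
    template backwards while swapping each base for its partner.
    """
    return "".join(PAIRS[base] for base in reversed(template.upper()))

template = "ATGGCC"              # hypothetical example sequence
new_strand = replicate(template)
print(new_strand)                # GGCCAT
print(replicate(new_strand))     # ATGGCC, copying the copy restores the original
```

Copying the new strand once more gives back the original sequence, which is why each daughter cell can receive an exact copy of the information.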

The entire process of replication, transcription and translation is collectively called the central dogma.

The central dogma is required for a cell to maintain itself and perform its different functions.

The central dogma of the cell: replication, transcription and translation.

mRNA synthesis via transcription: 

The very first step in protein formation is transcription.

Transcription is a process in which RNA polymerase synthesises a complementary mRNA, or messenger RNA, from a DNA template.

RNA differs from DNA in many respects. One of them is the bases: DNA contains A, T, C and G, while RNA contains uracil (U) instead of thymine.
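As a rough sketch of what “complementary” means here, the snippet below builds an mRNA from a DNA template strand. The function name `transcribe` and the example sequence are assumptions made for illustration. The template is written 3'->5' so that the mRNA comes out 5'->3', and promoters, RNA polymerase and processing are ignored.

```python
# Pairing used when RNA is built on a DNA template: A in DNA pairs with U in RNA
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3to5):
    """Return the mRNA (5'->3') complementary to a template strand given 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5.upper())

template = "TACCGGAAAATT"        # hypothetical template strand, written 3'->5'
print(transcribe(template))      # AUGGCCUUUUAA
```

Note that the mRNA contains U wherever the template strand has A, and that no T appears in the output.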


Through transcription, the information encoded in the DNA is transferred to the mRNA. Non-coding regions such as introns are copied into the initial transcript and then removed from it.

Before the final version of the mRNA is made, a pre-mRNA is formed from the DNA by the enzyme RNA polymerase II.

RNA polymerases are a class of nucleic acid synthesis enzymes that specifically synthesise RNA from DNA.

The pre-mRNA is processed and converted into the final transcript, called mature mRNA.

The process of transcription.

Not all the DNA in our genome can do this: only about 1 to 2% of the entire genomic DNA codes for protein.

That protein-coding DNA lies within genes, the functional units of the genome that have the capacity to form a protein.

RNA splicing is the process in which the intervening non-coding sequences (introns) are removed from the pre-mRNA and the protein-coding exons are joined together. The final product of transcription and processing is a single-stranded messenger RNA, called the transcript or mRNA transcript.
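Splicing can be pictured as cutting the exons out of the pre-mRNA and joining them in order. The sketch below assumes the exon coordinates are already known; the name `splice`, the sequence and the coordinates are all hypothetical. In a real cell the spliceosome finds the exon boundaries from sequence signals, which this example does not attempt to model.

```python
def splice(pre_mrna, exons):
    """Join exon slices of `pre_mrna` in order.

    `exons` is a list of (start, end) pairs, 0-based, end exclusive.
    Everything outside those ranges (the introns) is discarded.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical pre-mRNA: exon 1 + intron + exon 2
pre_mrna = "AUGGCC" + "GUAAGUUUUCAG" + "UUUUAA"
exons = [(0, 6), (18, 24)]

print(splice(pre_mrna, exons))   # AUGGCCUUUUAA
```

The made-up intron starts with GU and ends with AG, as most real introns do, but the function relies only on the coordinates it is given.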

“The entire set of RNA transcripts present in a cell, including its mRNAs, is called the transcriptome.”

Now the first step from DNA to protein is complete: an mRNA transcript has been formed in the nucleus of the cell.

For the next step of gene expression, the mRNA moves from the nucleus to the cytoplasm. 


Amino acid chain formation via translation: 

During translation, each triplet codon of the mRNA is “read” and the matching amino acid is added. The codons are read in order, and the resulting long chain of amino acids forms a protein.

Each triplet codon encodes one specific amino acid, and most amino acids are encoded by more than one codon. The chart of codons and their amino acids is given in the table below.

The table of amino acid and related triplet codon. Image courtesy: Translation: DNA to protein by nature (https://www.nature.com/scitable/topicpage/translation-dna-to-mrna-to-protein-393/#)
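The codon table lends itself to a small worked example. The sketch below builds the standard genetic code as a Python dictionary and reads an mRNA three bases at a time. The names `CODON_TABLE` and `translate` are chosen for this illustration, amino acids are written with their standard one-letter codes, and `*` marks a stop codon. It assumes the sequence begins at the start codon and ignores everything the ribosome, tRNAs and translation factors actually do.

```python
from itertools import product

# Standard genetic code, with codons ordered UUU, UUC, UUA, UUG, UCU, ...
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # codons starting with U
    "LLLLPPPPHHQQRRRR"   # codons starting with C
    "IIIMTTTTNNKKSSRR"   # codons starting with A
    "VVVVAAAADDEEGGGG"   # codons starting with G
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Read `mrna` codon by codon from position 0 until a stop codon."""
    chain = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":        # stop codon: nothing is added
            break
        chain.append(amino_acid)
    return "".join(chain)

print(translate("AUGGCCUUUUAA"))     # MAF (Met-Ala-Phe)
```

Four codons go in and three amino acids come out, because the last codon, UAA, is a stop signal and not an amino acid.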

The gene and the mRNA transcript formed from it have some important characteristics. In the gene, the coding region is located downstream of the promoter region, and the promoter lies upstream, towards the 5’ end of the coding strand. The promoter itself is not copied into the mRNA.

In the mature eukaryotic mRNA, a cap and an untranslated region (the 5’ UTR) are located at the 5’ end, while a second untranslated region (the 3’ UTR) and the poly-A tail are located at the 3’ end.

Interestingly, an untranslated region (UTR) is not translated into protein, but it serves as a recognition site for different proteins and enzymes involved in translation.

The 5’ UTR is located between the 5’ end of the mRNA and the start codon, and serves as a binding site for different proteins and translation factors.

Let’s be a little more technical, 

In eukaryotes, the start codon sits within the Kozak sequence, which extends from the 5’ UTR across the start codon and helps the ribosome recognise where to begin translation.

Eukaryotic UTR sequences are generally longer than prokaryotic UTRs.

The mRNA also has a start codon, AUG, and a stop codon at which the process of translation stops. The stop codons are UAA, UAG and UGA.
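To see how the start and stop codons mark out the coding region, here is a minimal sketch that extracts it from an mRNA. The name `find_coding_region` and the example sequence are hypothetical. The function simply takes the first AUG it finds, which is a simplification: a real ribosome also takes the surrounding sequence into account.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def find_coding_region(mrna):
    """Return the stretch from the first AUG through the first in-frame stop codon.

    Returns None if there is no AUG or no in-frame stop codon after it.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[start:i + 3]
    return None

# Hypothetical mRNA: 5' UTR + coding region + 3' UTR
mrna = "GGCACC" + "AUGGCCUUUUAA" + "GGG"
print(find_coding_region(mrna))   # AUGGCCUUUUAA
```

Everything before the AUG and after the stop codon is left out, which corresponds to the untranslated regions at the two ends.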

In the next step, the mRNA moves to a cytoplasmic ribosome, a complex of RNA and protein (not a membrane-bound organelle) that carries out protein formation.

The eukaryotic ribosome is different from the prokaryotic one. The eukaryotic 80S ribosome is made up of two separate subunits, 60S (larger subunit) and 40S (smaller subunit), which are present separately in the cytoplasm. The prokaryotic 70S ribosome is made up of 50S and 30S subunits.

The mRNA first binds to the smaller (40S) subunit, and the larger (60S) subunit joins once the start codon has been located.

The assembled ribosome provides three different sites for performing translation: A (aminoacyl site), P (peptidyl site) and E (exit site).

The translation machinery includes proteins and RNA molecules, such as tRNA and rRNA, that are required to complete translation.

Translation starts with the help of initiation factors (IF1, IF2 and IF3 in prokaryotes; the eukaryotic equivalents are called eIFs) and the initiator tRNA.

A pre-initiation complex is formed, and the methionine-carrying tRNA binds to the mRNA at the start codon (because our start codon is AUG, which codes for methionine).

The process of translation: assembly of the larger and smaller subunits of the ribosome on the mRNA and insertion of amino acids through tRNA.

The tRNA, also known as transfer RNA, transfers amino acids. One end of the tRNA reads the triplet codon and the other end carries the corresponding amino acid, which is added to the growing polypeptide chain.

The anticodon end of the tRNA has an anticodon complementary to the triplet codon of the mRNA. The other end of the tRNA carries the specific amino acid that corresponds to that codon.
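The codon and anticodon relationship is a base-pairing rule, so it can be written as a short function. This sketch (the name `anticodon` is chosen for illustration) assumes strict A-U and G-C pairing. Real tRNAs tolerate some mismatches at the third codon position and often carry modified bases, both of which are ignored here.

```python
# Base pairing between two RNA strands
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon(codon):
    """Return the anticodon (5'->3') that pairs with `codon` (5'->3').

    The two strands pair in opposite directions, so the codon is
    reversed before each base is swapped for its partner.
    """
    return "".join(RNA_PAIRS[base] for base in reversed(codon))

print(anticodon("AUG"))   # CAU, the anticodon that pairs with the start codon
print(anticodon("UUU"))   # AAA
```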

In the next phase, called elongation, peptide bonds are formed between adjacent amino acids. This reaction is catalysed by the peptidyl transferase activity of the rRNA (ribosomal RNA) in the larger subunit, which transfers the growing chain from its tRNA onto the incoming amino acid. The cycle repeats codon by codon until the entire amino acid chain is formed.

As discussed above, the process terminates once one of the three termination codons, present at the end of the coding region of the mRNA, reaches the ribosome. No tRNA recognises these codons.

In the final step, release factors (release factor proteins) recognise the stop codon and bind to the ribosome; the polypeptide is released, and the tRNA, ribosomal subunits and other factors dissociate from the mRNA.

A complete polypeptide chain of amino acids is formed and released from the ribosome complex.

Our required protein is not yet formed! Why?

Because a polypeptide chain of amino acids is not yet a functional protein.

A protein is a complex macromolecule made up of one or more polypeptide chains: some proteins consist of a single amino acid chain, others of many.

In addition, the chain folds into secondary and tertiary structures, and multiple chains can assemble into a quaternary structure, to form the final, functional protein.

Enzymes, hormones, receptors, chaperones, immunoglobulins (antibodies), cell surface proteins, membrane proteins, and bone and hair proteins are some of the different types of protein.

Enzymes are a class of proteins that catalyse many reactions, such as those of lipid and carbohydrate metabolism.

If a single enzyme or protein is not formed properly, it cannot function at its target location or catalyse its biological reaction.


DNA to protein formation in prokaryotes: 

Although gene expression is a fundamental process of life, minor differences between prokaryotic and eukaryotic gene expression make each unique and adapted to the requirements of the organism.

For example, prokaryotes have no nucleus, so the mRNA does not migrate to the cytoplasm; instead, ribosomes bind the mRNA and begin translating it while it is still being transcribed.

Also, the untranslated regions of prokaryotic mRNA are shorter, and the 5’ UTR contains the Shine-Dalgarno sequence (consensus AGGAGG), which serves as the ribosome-binding site.

In contrast to the simultaneous transcription and translation of prokaryotes, the two processes are discontinuous in eukaryotes: transcription occurs in the nucleus and translation in the cytoplasm.

In addition, the mRNA formed in prokaryotic transcription is less stable, typically lasting only a few minutes.

Every second of the day, different protein molecules are being formed in our cells.

Conclusion:

In us, the DNA to protein formation process is discontinuous; however, it is not random. The entire process of gene expression is highly regulated and happens every second.

Furthermore, the amount of gene expression can be measured by quantifying mRNA with real-time PCR (reverse transcription quantitative PCR, RT-qPCR).
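One common way to turn real-time PCR readings into a relative expression value is the 2^(-ΔΔCt) calculation. The article does not say which method is meant, so treat this as an assumed example. The function name `relative_expression` and the Ct values below are hypothetical, and the formula assumes that the amount of product roughly doubles in every PCR cycle.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene in a sample relative to a control.

    Each Ct is the cycle number at which the signal crosses the threshold.
    The reference gene is used to normalise for the amount of input RNA.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    delta_delta = delta_sample - delta_control
    return 2 ** (-delta_delta)

# Hypothetical Ct values: the target crosses the threshold 2 cycles earlier
# in the sample than in the control, with the reference gene unchanged.
print(relative_expression(22.0, 18.0, 24.0, 18.0))   # 4.0
```

A lower Ct means more starting mRNA, so a difference of two cycles corresponds to roughly four times more expression.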
