What Is a Gene?- Definition, Structure And Function

“A gene is a functional hereditary unit made up of nucleotides that encodes proteins. Genes are located on chromosomes.”

Or we can say, 

“A gene is an inheritance unit of a cell” is a widely accepted definition of a gene. But it is not a complete one, and scientifically it is not fully accepted.

Defining a “gene” is actually intricate. Indeed, it is difficult to understand even for bright students. Terms like gene, allele, DNA and locus are so confusing!

“It takes a whole semester to teach students genes and related stuff”, said Rick Young, a geneticist from the Whitehead Institute in Cambridge, Massachusetts.

So it is truly tough to learn what a gene actually is!

Genes have been known to us since the time of Mendel, although the chemical structure of DNA was not well understood until 1953. As a result, the definition of a gene has changed over time.

Yet another, more satisfactory definition states, “genes encode structural and functional proteins.” It is a reliable and decent explanation, but even this one is only partially true!

According to recent discoveries in genetics, not all genes encode proteins. Some genes are still unknown to us, and the functions of others are not yet known.

Scientists also believe that some genes influence behavioral traits, although the supporting evidence is limited. This indicates that the gene is an entire subject of study in itself. The more we learn about genes, the more new challenges appear.

In the present article, we will try to understand what a gene is and what it does. Further, we will discuss its structure, nomenclature and some important genes.

What is a gene? 

Approximately 22,000 to 25,000 genes are present in the human genome. Some of them are active and some are inactive. The number of genes, however, varies between organisms. Take a quick look at the number of genes in different organisms.

Figure: The number of genes in different organisms.

Note that the size of an organism is not related to its number of genes (you might think that a very tiny organism must have fewer genes).

Functionally, some genes encode proteins while some don’t! Genes that do not produce a specific protein help in the “regulation of gene expression.”

A gene does not have a fixed size. Some genes are large while some are very small. For instance, the DMD gene contains 79 exons while the HBB gene contains only 3.

The DMD gene is located on chromosome X and spans ~2.3 Mb, while the HBB gene is located on chromosome 11 and comprises only ~1,600 bp. This shows that whether a gene is functional does not depend on its length.

The term gene was coined by Wilhelm Johannsen in 1909. However, the behavior of these hereditary units was reported by Mendel during the 1800s.

Notably, the chemical structure of genes is almost the same in prokaryotes and eukaryotes, although their location and regulation differ.


“A functional segment of DNA that encodes a protein or regulates gene expression, and that acts as a hereditary unit, is known as a gene.”


Mendel first discovered the concept of the “inheritance of traits”, although he could not describe its physical basis.

The term “gene” was coined by Wilhelm Johannsen, but he was unable to describe its chemical structure. In 1953, James Watson and Francis Crick described the chemical structure of DNA, and thus of the gene.

Structure of gene: 

Genes are stretches of DNA and are therefore made up of chains of nucleotides.

As a part of DNA, genes are made up of A, T, G and C nucleotides. Each nucleotide pairs with a nucleotide on the opposite strand through hydrogen bonds, and links to the adjacent nucleotide on the same strand through a phosphodiester bond.

Each nucleotide is a combination of a nitrogenous base (A, T, G or C), a phosphate group and a pentose sugar.
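The pairing between opposite strands can be sketched in a few lines of Python. This is only an illustration: the function names and the six-base example sequence are made up, and the only biology assumed is the standard pairing rule (A with T, G with C).

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(strand: str) -> str:
    """Return the base-by-base partner of each nucleotide in a DNA strand."""
    return "".join(PAIRS[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own 5' to 3' direction."""
    return complement(strand)[::-1]

seq = "ATGGCC"  # hypothetical example sequence
print(complement(seq))          # TACCGG
print(reverse_complement(seq))  # GGCCAT
```

The hydrogen bonds correspond to the lookups in `PAIRS`, while the phosphodiester bonds are what keep the letters of one string in order.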

Key information– only ~2.2% of human DNA is coding sequence.

In general, the gene structure consists of two main types of elements: core elements and regulatory elements.

The core elements or sequences take part in protein formation, while the regulatory elements control gene expression.

Exons are core elements. Sequences such as promoters, enhancers and silencers, on the other hand, are regulatory elements of a gene.

A third type of element, called maintenance elements, carries information for DNA repair, modification and replication. The functional or physical structure of a gene comprises introns, exons, promoters, enhancers and UTRs.

Introns are intervening non-coding sequences that are removed from the final transcript.

Exons are the coding parts of a gene, which are joined together after splicing to construct the final transcript.

Regulatory elements are typically located toward the ends of a gene.
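Splicing can be pictured as cutting the exon segments out of the primary transcript and joining them in order. The sketch below is a toy model, not a real splicing algorithm: the transcript, the exon coordinates and the `splice` function are all invented for illustration (exons are written in uppercase and introns in lowercase only to make them easy to see).

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon slices (start, end) in order; everything else is intron."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical primary transcript: exon1 + intron1 + exon2 + intron2 + exon3
pre_mrna = "AUGGCC" + "guaaguuuag" + "UUCGAA" + "guccccag" + "UGA"

# 0-based, end-exclusive exon coordinates (assumed known in advance)
exons = [(0, 6), (16, 22), (30, 33)]

print(splice(pre_mrna, exons))  # AUGGCCUUCGAAUGA
```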

Figure: The molecular structure of a gene.

Promoters are non-coding sequences that provide binding sites for enzymes and transcription factors. The promoter contains TATA box and CCAAT box sequences for protein binding.

The entire promoter region is located at the 5’ end of the gene and is made up of core promoter and proximal promoter sequences (see the above image).

Here, the core promoter provides the binding site for RNA polymerase (and other proteins) to start transcription, while the proximal promoter provides binding sites for transcription factors.
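Locating promoter motifs such as the TATA box or the CCAAT box amounts to a simple text search. The sketch below assumes the consensus strings "TATAAA" and "CCAAT" and uses a made-up sequence; real promoters vary around these consensus sequences, so exact matching is a simplification.

```python
import re

def find_motif(sequence: str, motif: str) -> list[int]:
    """Return the 0-based start of every match, including overlapping ones."""
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [m.start() for m in pattern.finditer(sequence.upper())]

# Hypothetical upstream region, written 5' to 3'
upstream = "GGCCAATCTGCGTATAAAGGCTAGCATG"

print(find_motif(upstream, "CCAAT"))   # [2]
print(find_motif(upstream, "TATAAA"))  # [12]
```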

The enhancer induces transcription while the silencer represses it. Collectively, enhancers and silencers, which can be located far away from the exons, regulate gene expression.

The 3’ untranslated region is a non-coding region of a gene that helps terminate transcription and form the final transcript.

Once RNA polymerase has passed through the untranslated region and its termination signals, it stops synthesizing RNA and detaches from the strand.

Key information

Life originated from a common ancestor, thus the chemical structure of genes is almost the same in prokaryotes and eukaryotes. However, the regulatory sequence elements and the transcription and translation machinery differ.

Eukaryotic gene structure contains more regulatory sequences than prokaryotic gene structure. In addition, the machinery of transcription and translation differs between the two.

In prokaryotes, an operon is a cluster of genes with related functions. Introns are not a part of an operon.

In contrast, eukaryotic genes contain introns (non-coding DNA) between their exons. Each gene has its own promoter region to facilitate transcription.

All the elements that help in gene regulation are divided into two categories: cis-acting elements and trans-acting elements.

Promoters, enhancers, silencers, insulators, locus control regions and MARs (matrix attachment regions) are categorized as cis-acting elements.

Transcriptional proteins such as activators, which are encoded by other genes, are categorized as trans-acting elements. The in-depth structure of a gene with all elements is shown in the figure above.

Functions of gene: 

The main function of a gene is to produce a protein. However, that is not its only function, so the statement is only partially true.

Some genes are transcribed into RNA but never produce a protein. For instance, microRNAs are tiny ribonucleic acids transcribed from certain genes that are not translated into protein. They help in gene regulation instead.

Now, let’s first understand how genes form proteins.

Through replication, DNA (and the genes it carries) is copied so that it can pass from one cell to two daughter cells. Replication is carried out by DNA polymerase.

After that, a defined region of DNA (a gene) undergoes transcription by RNA polymerase.

As we said, RNA polymerase binds at the promoter region and starts adding nucleotides. In this way, mRNA is constructed from a gene.

After that, post-transcriptional modifications happen, followed by the migration of the mRNA to the cytoplasm.

At the ribosome, in the cytoplasm, the mRNA is translated into a chain of amino acids. That is how the entire mechanism of protein formation occurs in a cell.
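The two steps described above, transcription and translation, can be sketched as follows. This is a simplified illustration under stated assumptions: the gene sequence is invented, the codon table contains only the handful of codons the example needs (the real genetic code has 64), and transcription is modeled by rewriting the coding strand with U in place of T rather than by reading the template strand as RNA polymerase actually does.

```python
# Partial codon table: only the codons used in this toy example.
CODON_TABLE = {
    "AUG": "M",  # methionine (start codon)
    "GCC": "A",  # alanine
    "UUC": "F",  # phenylalanine
    "GAA": "E",  # glutamate
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}

def transcribe(coding_strand: str) -> str:
    """Model transcription: the mRNA matches the coding strand with U for T."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon is reached."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # X = not in table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

gene = "ATGGCCTTCGAATGA"  # hypothetical coding sequence
mrna = transcribe(gene)
print(mrna)             # AUGGCCUUCGAAUGA
print(translate(mrna))  # MAFE
```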

Here, as we are discussing a “gene”, it is important to mention how different genes work. Three types of genes are present in our genome:

Genes that encode a single protein– some genes encode only one particular protein. They carry the message for a single type of protein product, for example, the hemoglobin genes HBA and HBB.

The HBA gene encodes the alpha-globin chain of Hb while the HBB gene encodes the beta-globin chain of the Hb protein.

Genes that encode many proteins– In classical genetics, scientists believed that a single gene formed a single protein. However, that assumption turned out to be wrong.

A gene with many different exons can create more than one kind of protein product. By combining different exonic sequences, various types of amino acid chains are constructed. And that is the beauty of it.

You won’t believe this!

“A single fruit fly gene can encode around 38,000 different proteins.”

Non-protein coding genes– some genes can’t form proteins; instead, they act in gene regulation.
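The fruit fly figure quoted above can be reproduced with simple arithmetic. The article does not name the gene; the sketch below assumes it is the commonly cited Drosophila Dscam gene, which has four clusters of mutually exclusive exons with 12, 48, 33 and 2 alternatives. Choosing one exon from each cluster gives the number of possible transcripts.

```python
from math import prod

# Assumption: number of alternative exons per cluster in Drosophila Dscam.
alternatives = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}

def isoform_count(options: dict[str, int]) -> int:
    """One exon is chosen per cluster, so the choices multiply."""
    return prod(options.values())

print(isoform_count(alternatives))  # 38016
```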

Besides this, genes are also grouped into several other categories based on their function or location:

Housekeeping genes: Genes required to perform the normal functions of every cell are known as housekeeping genes. Usually, these genes encode protein products for transcription, translation and replication.

Inducible genes: Inducible genes normally remain inactive and are expressed only under the influence of extrinsic factors.

Developmental genes: These genes act in the early stages of the growth and development of an organism.

Tissue-specific genes: Unlike housekeeping genes, tissue-specific genes are expressed only in certain types of tissue and remain inactive in others.

Homologous genes: Genes that are inherited from a common ancestor and share sequence similarity, and often a common function, are categorized as homologous genes.

Non-homologous genes: Genes that are not inherited from a common ancestor, but instead originated independently through evolutionary forces, are known as non-homologous genes.

Autosomal genes: Genes located on autosomes are categorized as autosomal genes.

Sex-linked genes: Genes located on the sex chromosomes (the X or Y chromosome in humans) are classified in this category. Many of these genes are crucial for the reproductive health of a person.

Apart from these, genes have some “not-so-common” functions. Genes interact with environmental factors, and the resulting mutations create new alleles. New alleles can give rise to new traits that help an organism survive.

Genes are also inherited from parents by their offspring; hence, they transport vital information for a cell or organism.

After all, the main function of genes or DNA is to help us survive on earth in changing conditions.

Unfortunately, some unusual changes or alterations in a gene also cause disease, which we will discuss in the upcoming part of this article.

Gene nomenclature:

The guidelines for human gene nomenclature were first adopted in 1979. Since then, the HGNC has updated them 3 to 4 times. We are now using the 1997 updated version of the HGNC guidelines.

The entire process of gene nomenclature is actually complicated. Here I explain the major points and some important elements of naming genes.

Key points–

Uppercase Latin letters and Arabic numerals are used for naming genes. Gene symbols are short forms of the full gene name. The first character must be an uppercase letter, followed by letters or Arabic numerals.

  • All characters must be written on a single line; no sub- or superscript is allowed.
  • Greek letters and Roman numerals are not used in a symbol.
  • All the letters of the gene symbol should be in uppercase.
  • Chromosome names, species references, other gene symbols and punctuation are not allowed.

Note– in some cases chromosome numbers are considered.

Let’s look at some examples of gene symbols:

  1. ACOT1– acyl-CoA thioesterase 1
  2. ABCA1– ATP-binding cassette subfamily A member 1
  3. SRY– sex-determining region Y
  4. HBA1– hemoglobin subunit alpha 1
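The key points above translate into a simple pattern check. The sketch below is a deliberately simplified reading of those rules (an uppercase Latin letter followed by uppercase letters or Arabic numerals, with no punctuation); the regular expression and the function name are illustrative assumptions, and the official HGNC guidelines allow exceptions that this check would reject.

```python
import re

# Simplified rule: starts with an uppercase Latin letter, then only
# uppercase Latin letters or Arabic numerals. Not the official HGNC grammar.
SYMBOL_RE = re.compile(r"[A-Z][A-Z0-9]*")

def looks_like_gene_symbol(symbol: str) -> bool:
    """Return True if the string satisfies the simplified symbol rule."""
    return SYMBOL_RE.fullmatch(symbol) is not None

for candidate in ["ACOT1", "ABCA1", "SRY", "abca1", "HBB-1", "1ABC"]:
    print(candidate, looks_like_gene_symbol(candidate))
```

The first three candidates pass, while the lowercase symbol, the one with punctuation and the one starting with a digit are rejected.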

Note– The full gene name (as opposed to the symbol) generally starts with a lowercase letter.

For more detail on gene naming and nomenclature, please read the official guidelines of the HGNC (HUGO Gene Nomenclature Committee).

Inheritance of genes: 

The basis of inheritance was first described using the “gene” concept.

Genes can pass from one generation to the next. That is the beauty of genetic science. In this way, different genotypes, which produce various phenotypes, can be inherited or passed on.

The transfer of genes, genotypes or traits from parents to their offspring through asexual or sexual reproduction is referred to as genetic inheritance.

Alleles are alternative forms of a gene. Two alleles together define one genotype. The genotype controls a particular phenotype or trait.

During sexual reproduction, one allele from the father and one allele from the mother are inherited by the fetus. Likewise, a whole haploid set of the genome from each parent makes up the genetics of the fetus.
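The rule that each parent contributes one allele can be illustrated with a small Punnett square. This is a minimal sketch for a single gene with two alleles; the allele letters "A" and "a" and the function name are hypothetical, and each combination is assumed to be equally likely.

```python
from collections import Counter
from itertools import product

def punnett(parent1: str, parent2: str) -> Counter:
    """Count offspring genotypes: one allele is drawn from each parent."""
    return Counter("".join(sorted(pair)) for pair in product(parent1, parent2))

# Two carrier parents, each with one dominant (A) and one recessive (a) allele
offspring = punnett("Aa", "Aa")
print(offspring["AA"], offspring["Aa"], offspring["aa"])  # 1 2 1
```

With two carrier parents, one combination in four carries two recessive alleles, which is the pattern expected for an autosomal recessive trait.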

Along with this, some genetic mutations are also inherited and cause genetic abnormalities.

If a mutation occurs in an autosomal gene, it is known as an autosomal gene mutation. The rest are sex-linked or extrachromosomal.

Gene mutations: 

Mutations are changes or alterations in our genes. As we know, some of them cause serious health issues.

Scientifically, “alterations in the nucleotide sequence of DNA or a gene, which can cause genetic abnormalities, are known as mutations.”

Insertion, deletion, translocation, inversion and base substitution are some of the common types of mutations that happen in a genome. Some mutations are acquired: they occur only after birth, through exposure to mutagens, and can lead to conditions such as cancer.
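Three of the mutation types named above (base substitution, insertion and deletion) can be shown as simple string edits. The functions, positions and the nine-base sequence below are invented for illustration; translocation and inversion involve larger rearrangements and are left out of this sketch.

```python
def substitute(seq: str, pos: int, base: str) -> str:
    """Replace the base at a 0-based position (base substitution)."""
    return seq[:pos] + base + seq[pos + 1:]

def insert(seq: str, pos: int, bases: str) -> str:
    """Insert one or more bases before a 0-based position."""
    return seq[:pos] + bases + seq[pos:]

def delete(seq: str, pos: int, length: int = 1) -> str:
    """Remove `length` bases starting at a 0-based position."""
    return seq[:pos] + seq[pos + length:]

wild_type = "ATGGCCTTC"  # hypothetical sequence

print(substitute(wild_type, 4, "A"))  # ATGGACTTC
print(insert(wild_type, 3, "T"))      # ATGTGCCTTC
print(delete(wild_type, 3))           # ATGCCTTC
```

Note that the single-base insertion and deletion change the sequence length, which shifts every downstream codon, while the substitution changes only one codon.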

In contrast, some mutations can pass on to offspring; these may be dominant (sometimes lethal) or recessive, in which case carriers are often unaffected. These types of mutations are categorized as inherited mutations.

IVS1-5 is a single-base mutation of the beta-globin gene that causes beta-thalassemia. It’s an autosomal recessive condition, thus two recessive alleles are required to cause anemia. Likewise, some mutations are dominant or extrachromosomal.

Interestingly, scientists have developed several strategies to correct gene mutations, known as gene therapy.

Gene therapy: 

A faulty or non-functional gene can be replaced by the wild-type one using the gene therapy approach.

Scientists use several vehicles, known as vectors, to transfer genes into live cells. These are either viral or non-viral.

AAV (adeno-associated virus), lentivirus and adenovirus are viral vectors, while liposomes, chemical carriers and electroporation-like methods are non-viral approaches used to insert a gene.

However, the entire process isn’t as easy as it sounds. Scientists have been trying gene therapy for a long time, and some attempts have succeeded. In 2017, the FDA approved its first gene therapies.

In the same year, a boy from France with sickle cell anemia was reported to have been treated successfully with a gene therapy designed and prepared by Bluebird Bio. Since then, the FDA has approved many more gene therapies.

Gene regulation: 

“Expression of genes is regulated by genes too.”

As we know, genes are everything for us because they perform so many functions. Some of these functions are still unknown to us. Besides governing trait inheritance, genes also regulate the expression of other genes in different cells and tissues.

However, the process is different in prokaryotes and eukaryotes. Let us understand the process of gene expression in brief.

“Regulation of gene expression is a cellular process in which the cell itself decides which genes need to be ‘turned on’ and which do not.”

Gene expression in eukaryotes: 

First, I will explain the topic for those who don’t know anything about it. Then we will discuss it technically.

As we said, some genes are expressed in all cells: the housekeeping genes. But the majority of genes are tissue-specific. This means they are expressed in some tissues or cells and remain dormant in others.

Why is it so?

Melanin protects our skin from the harmful radiation of the sun. Thus skin cells require a higher amount of melanin. But the same genes are not required in liver cells. Are you getting my point?

Melanin has no function in the liver or kidney. Therefore, regulatory mechanisms keep the melanin-producing genes unexpressed there and so control melanin production in the body. This entire mechanism is known as the regulation of gene expression.

Now, this is for the pros–

Methylation, histone modification, chromatin remodeling and RNA interference are some of the processes that help in controlling genes.

Once methyl groups are added to a gene’s DNA, typically around its promoter, the transcription machinery can no longer bind it properly. The gene is not transcribed or translated and is unable to form a protein. In this way, methylation suppresses gene activity.

Another factor is a class of genes that influence the activity of other genes. Some genes form microRNA or siRNA. However, those small RNAs are not translated into protein.

Instead, they bind to the mRNA transcript and block its translation or trigger its degradation.

Gene expression in prokaryotes: 

What does a promoter do? We have already answered this: it promotes transcription, and thereby protein formation. Each gene has its own promoter in eukaryotes, but that is not the case in prokaryotes.

In prokaryotes, genes are regulated through operons. In an operon, many genes are transcribed from a single promoter.

Through the operon, different proteins are formed from different genes at once using a single promoter. The entire process is regulated by molecules such as activators and repressors.

In an inducible operon, transcription is normally switched off and is turned on when an inducer is present; an activator bound near the promoter can further boost the activity of RNA polymerase. For example, the lac operon.

Figure: Regulation of gene expression in prokaryotes via inducible and repressible operons.

In a repressible operon, transcription is normally switched on; a repressor binds and blocks the activity of RNA polymerase when the end product is abundant. For example, the trp operon.
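The difference between inducible and repressible operons can be reduced to two small truth functions. This is a textbook simplification written as an assumption: it models only the repressor (lactose acting as an inducer that inactivates the lac repressor, tryptophan acting as a corepressor that activates the trp repressor) and ignores activators, glucose levels and attenuation.

```python
def lac_operon_transcribed(lactose_present: bool) -> bool:
    """Inducible: off by default, switched on when the inducer is present."""
    repressor_active = not lactose_present  # the inducer inactivates the repressor
    return not repressor_active

def trp_operon_transcribed(tryptophan_present: bool) -> bool:
    """Repressible: on by default, switched off when the end product is present."""
    repressor_active = tryptophan_present  # the corepressor activates the repressor
    return not repressor_active

for present in (False, True):
    print(present, lac_operon_transcribed(present), trp_operon_transcribed(present))
```

The two functions give opposite answers for the same input, which is the whole distinction: one operon waits for a signal to start, the other waits for a signal to stop.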

Notably, not all genes in prokaryotes are expressed through operons. The regulatory genes are transcribed independently.

Some common genes: 

Here, I am listing some of the common genes that scientists are interested in studying.


The MTHFR gene is located on chromosome 1 at 1p36.22. It encodes the enzyme methylenetetrahydrofolate reductase. Its major role is in processing amino acids such as homocysteine.

MTHFR gene mutations are among the common alterations reported in many abnormal conditions such as homocystinuria, alopecia areata, spina bifida and age-related hearing loss.

It ranks number 8 among the most studied genes, with about 3,200 publications available on it.


The TP53 gene is located on chromosome 17 at 17p13.1. It encodes tumor protein p53. It is a tumor suppressor gene that controls the cell division process. Mutations in the TP53 gene cause abnormal cell growth, that is, cancer.

They are one of the major causes of many different types of cancer such as ovarian cancer, Wilms tumor, lung cancer, melanoma, breast cancer and squamous cell carcinoma.

It tops the list of the most studied genes, with more than 8,400 publications available on it.


The IL6 (interleukin 6) gene is located on chromosome 7 at 7p15.3. It encodes a cytokine that helps in the maturation of B cells and in inflammation. Interleukins are components of the immune system and protect our body.

IL6 gene mutations are involved in many health-related conditions. Scientists are also very interested in this gene. It ranks 6th, with around 4,000 publications available on it.


The tumor necrosis factor (TNF) gene is located on chromosome 6 at 6p21.33. It encodes a cytokine from the tumor necrosis factor superfamily.

This protein functions in cell differentiation, apoptosis, coagulation and lipid metabolism.

The TNF gene is the second most studied gene by scientists across the world, with more than 5,000 publications now available on it.


The EGFR gene, located on chromosome 7 at 7p11.2, encodes a protein known as epidermal growth factor receptor. The protein sits on the cell surface and provides a ligand-binding site.

Mutations in the EGFR gene are involved in lung cancer. With nearly 4,500 publications, the EGFR gene is the third most studied gene.


Located on chromosome 6 at 6q25.1-q25.2, the ESR1 gene encodes a ligand-activated transcription factor known as estrogen receptor 1. It helps regulate transcription.

Mutations in the ESR1 gene are involved in estrogen resistance. The gene is also linked to myocardial infarction and familial breast cancer.

With more than 2,800 articles on it, it ranks 9th among the most studied genes.


The CFTR gene is located on chromosome 7 at 7q31.2. It encodes the cystic fibrosis transmembrane conductance regulator protein, which works as a channel across the membrane of cells that secrete mucus, sweat and saliva.

Mutations in the CFTR gene cause cystic fibrosis, a disorder that mainly affects the respiratory system.


The HBB gene is located on chromosome 11 at 11p15.4. It provides the information for making the hemoglobin subunit beta protein, a component of hemoglobin, which carries oxygen to different parts of the body.


The APOE gene provides the information for making apolipoprotein E, a major constituent of lipoproteins. These molecules help in packaging and transporting fat molecules.

The gene is located on chromosome 19 at 19q13.32. Mutations in APOE are associated with many disorders such as Alzheimer’s disease, age-related macular degeneration, hearing loss and dementia.
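The locations quoted for these genes, such as 17p13.1 or 19q13.32, follow a fixed pattern: chromosome, arm (p for the short arm, q for the long arm) and band. The sketch below splits a location into those parts. The `Locus` type and `parse_locus` function are hypothetical helpers, and ranges such as 6q25.1-q25.2 are deliberately not handled.

```python
import re
from typing import NamedTuple

class Locus(NamedTuple):
    chromosome: str  # "1" to "22", "X" or "Y"
    arm: str         # "p" (short arm) or "q" (long arm)
    band: str        # band and sub-band, e.g. "13.1"

LOCUS_RE = re.compile(r"(\d{1,2}|X|Y)([pq])(\d+(?:\.\d+)?)")

def parse_locus(location: str) -> Locus:
    """Split a single cytogenetic location into chromosome, arm and band."""
    match = LOCUS_RE.fullmatch(location)
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    return Locus(*match.groups())

print(parse_locus("17p13.1"))   # Locus(chromosome='17', arm='p', band='13.1')
print(parse_locus("19q13.32"))  # Locus(chromosome='19', arm='q', band='13.32')
```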

Out of the roughly 23,000 genes, only a few hundred are well studied. The inheritance patterns and functions of some genes are still unknown to us.


Undoubtedly, genes are very important to us and, in fact, to all life on earth. Survival skills and evolutionary traits are passed on through the information carried by genes.

Interaction with environmental factors can cause mutations in genes, which may be either helpful or harmful.

Harmful alterations cause serious, sometimes inherited, health problems. Thanks to gene therapy, we now have some successful cases in which a faulty gene was replaced with a healthy one.

However, tampering with our DNA or genes can raise some serious issues for us.


Pearson, H. What is a gene?. Nature 441, 398–401 (2006).

Lodish H, Berk A, Zipursky SL, et al. Molecular Cell Biology. 4th edition. New York: W. H. Freeman; 2000. Section 9.1, Molecular Definition of a Gene. Available from: https://www.ncbi.nlm.nih.gov/books/NBK21640/
