10 Challenges in Whole Genome Sequencing – Genetic Education

“Genome sequencing is a technique for reading the complete genome. Learn about the common challenges scientists face during WGS.”

Recent advances in next-generation sequencing (NGS) platforms allow us to sequence the complete human genome. The chemistries available in NGS instruments can sequence the entire genome efficiently and accurately.

Short-read sequencing, the main chemistry in various NGS platforms, is a robust and fast fragment sequencing technology. The entire genome can be sequenced in a day. However, sequencing the whole genome is still a challenging task.

The cost of sequencing, throughput, assembly, data storage and analysis, and scalability make whole genome sequencing challenging. In this article, I explain 10 common challenges scientists face during whole genome sequencing.

In the last segment, I will also note the biggest challenge for whole genome sequencing.

[Figure: Illustration of next-generation whole genome sequencing.]

1. Studying and annotating complex structural variations

Complex and rare genomic variations across the entire genome are still difficult to study. Many technological limitations contribute to this, but the two most common are the reliance on short-read sequencing and the limited availability of reference genome data.

Complex genomic and structural variations pose an additional challenge in the reading process because of their intricate nature. In addition, the methods available to analyze these data have limitations of their own.

So, even with 99% accuracy and >30X sequencing coverage, it is still challenging to study and annotate complex structural variations such as deletions, duplications, inversions and translocations.
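
The >30X coverage figure can be related to read counts with the standard mean-coverage relation C = N × L / G (number of reads × read length / genome size). The sketch below is only an illustration of that arithmetic: the 3.1 Gbp genome size, the 150 bp read length and the function names are assumptions chosen for the example, not values from this article.

```python
import math

def mean_coverage(num_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Mean coverage C = N * L / G."""
    return num_reads * read_length_bp / genome_size_bp

def reads_needed(target_coverage: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)

GENOME_SIZE_BP = 3_100_000_000  # assumed approximate human genome size
READ_LENGTH_BP = 150            # assumed short-read length

reads = reads_needed(30, READ_LENGTH_BP, GENOME_SIZE_BP)
print(f"Reads needed for 30X: {reads:,}")
print(f"Coverage check: {mean_coverage(reads, READ_LENGTH_BP, GENOME_SIZE_BP):.1f}X")
```

Note that this is mean coverage only. Real coverage is uneven, which is one reason structural variants can remain hard to call even when the average depth looks sufficient.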

2. Data storage, management and analysis

Genome sequencing data are huge, complex and heterogeneous. A WGS run generates gigabytes to terabytes of data. For example, the NovaSeq 6000 instrument from Illumina can generate around 10 TB of data per run.

In addition, the whole file contains thousands to millions of variants, alterations and novel structural polymorphisms. Such huge and complex data are difficult to store, manage, transfer and process. 

For example, during data analysis, moving the data from storage into the analysis software and running the analysis poses additional challenges for WGS in terms of technology, investment and skilled manpower.

Sophisticated, state-of-the-art computational facilities as well as a team of expert bioinformaticians are needed to handle the data. Data storage, management and analysis are significant challenges in genome sequencing.
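
To get a feel for the storage burden, a rough lower bound on raw read data can be estimated from genome size and coverage. The sketch below assumes uncompressed FASTQ-style storage of about 2 bytes per sequenced base (one for the base call, one for its quality score) and ignores read headers, compression and downstream files such as alignments and variant calls. The 3.1 Gbp genome size and the function names are illustrative assumptions.

```python
def raw_read_bytes(genome_size_bp: int, coverage: int, bytes_per_base: int = 2) -> int:
    """Lower-bound size of uncompressed read data for one sample.

    bytes_per_base=2 is an assumption: one byte for the base, one for its quality.
    """
    return genome_size_bp * coverage * bytes_per_base

def to_terabytes(num_bytes: int) -> float:
    """Convert bytes to decimal terabytes (1 TB = 10**12 bytes)."""
    return num_bytes / 10**12

GENOME_SIZE_BP = 3_100_000_000  # assumed approximate human genome size

per_sample = raw_read_bytes(GENOME_SIZE_BP, coverage=30)
print(f"One 30X genome: about {per_sample / 10**9:.0f} GB of raw reads")
print(f"100 such genomes: about {to_terabytes(per_sample * 100):.1f} TB")
```

Compression typically reduces these numbers, while intermediate analysis files add to them, so the estimate is best read as an order of magnitude.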

3. Cost and scalability

The cost of sequencing has decreased dramatically in the last few years. However, it is still too high for routine clinical diagnosis. The complex instrument chemistry and the data storage, management and analysis facilities are the major reasons for the high cost.

For this reason, it is difficult to scale up a genome sequencing business. Each WGS platform setup requires an investment running into crores of rupees (tens of millions of rupees). Despite this investment, the number of samples that can be processed remains small.

The high cost of techniques like WGS, whole exome sequencing (WES) and transcriptomics is still a barrier to scaling up.

4. Read alignment to the reference genome

Read alignment to the reference genome is yet another challenging task in WGS. Present-day NGS platforms rely heavily on short-read sequencing, as mentioned above. Fragments of 50 to 300 bp are sequenced in a massively parallel manner.

Alignment is a difficult process, especially when it comes to repetitive and complex genomic regions. This affects the overall sensitivity and accuracy of the sequencing. 
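
Why repeats make alignment hard can be shown with a toy example. The sketch below uses naive exact string matching on a made-up reference, which is not how production aligners work (they use indexed, error-tolerant algorithms); the sequences and the function name are invented for illustration. A short read that falls entirely inside a tandem repeat matches several positions, while a longer read that reaches into the unique flanking sequence matches only one.

```python
def exact_match_positions(reference: str, read: str) -> list:
    """Return every 0-based position where the read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)  # allow overlapping matches
    return positions

# Toy reference: unique left flank + six copies of a CAG repeat + unique right flank.
LEFT_FLANK = "ATGGTCCTA"
RIGHT_FLANK = "TTGACCGTA"
reference = LEFT_FLANK + "CAG" * 6 + RIGHT_FLANK

short_read = "CAGCAG"                       # lies entirely inside the repeat
long_read = "CCTA" + "CAG" * 6 + "TTGA"     # spans the repeat and both flanks

print("short read maps to:", exact_match_positions(reference, short_read))
print("long read maps to: ", exact_match_positions(reference, long_read))
```

The short read has five equally good placements, so an aligner cannot tell which copy of the repeat it came from. The long read is anchored by the flanks and has a single placement.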

Although third-party software is now available for accurate read alignment to the reference genome, errors may still occur at a rate of about 1 in every 50 kbp. This means that approximately 12,000 errors may remain even with the most accurate alignment software.
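
The link between a per-base error rate and a genome-wide error count is simple arithmetic: expected errors = genome size / bases per error. The sketch below assumes a genome size of about 3.1 Gbp, which is not stated in this article, and the function names are illustrative. Under that assumption, 1 error per 50 kbp works out to about 62,000 errors, while a total of ~12,000 errors corresponds to roughly 1 error per 258 kbp, so the two figures quoted above may rest on different assumptions (for example, a smaller analysed region).

```python
def expected_errors(genome_size_bp: int, bases_per_error: int) -> int:
    """Expected error count when one error occurs every `bases_per_error` bases."""
    return genome_size_bp // bases_per_error

def bases_per_error(genome_size_bp: int, total_errors: int) -> int:
    """Average spacing between errors for a given genome-wide error count."""
    return genome_size_bp // total_errors

GENOME_SIZE_BP = 3_100_000_000  # assumed approximate human genome size

print(expected_errors(GENOME_SIZE_BP, 50_000))   # errors at 1 per 50 kbp
print(bases_per_error(GENOME_SIZE_BP, 12_000))   # spacing implied by ~12,000 errors
```

Either way, the point stands: even a very low per-base error rate leaves thousands of erroneous positions across a whole genome.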

Hence, re-sequencing is necessary before reporting pathogenic variants, which is yet another challenging task.

5. Variant validation (re-sequencing)

In WGS and WES, variant validation is a crucial step. Around 150,000 novel SNPs can be identified in each WGS run, including both pathogenic and non-pathogenic ones.

However, as noted in the previous point, errors and gaps may occur during sequencing. Thus, it is a big challenge to know whether an SNP or variant identified by WGS occurs naturally or is a sequencing error.

To validate suspected pathogenic variants, a re-sequencing assay or Sanger sequencing is carried out. This increases the overall cost of WGS.
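
The validation step can be pictured as a simple triage: only calls flagged as suspected pathogenic go on to a confirmatory assay, and each confirmation adds cost. The sketch below is a hypothetical illustration. The `VariantCall` fields, the coordinates and the per-assay cost are all invented for the example and do not come from any real pipeline or price list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VariantCall:
    chrom: str
    pos: int
    ref: str
    alt: str
    suspected_pathogenic: bool  # hypothetical flag set by upstream annotation

def split_for_validation(calls):
    """Return (to_confirm, remaining); suspected pathogenic calls go to confirmation."""
    to_confirm = [c for c in calls if c.suspected_pathogenic]
    remaining = [c for c in calls if not c.suspected_pathogenic]
    return to_confirm, remaining

def validation_cost(num_assays: int, cost_per_assay: float) -> float:
    """Extra cost added by confirmatory assays (cost_per_assay is user-supplied)."""
    return num_assays * cost_per_assay

# Made-up calls for illustration only.
calls = [
    VariantCall("chr1", 1000, "A", "G", suspected_pathogenic=False),
    VariantCall("chr7", 2000, "C", "T", suspected_pathogenic=True),
    VariantCall("chrX", 3000, "G", "A", suspected_pathogenic=True),
]

to_confirm, remaining = split_for_validation(calls)
print(f"{len(to_confirm)} calls to confirm, {len(remaining)} not sent for confirmation")
print("Extra cost:", validation_cost(len(to_confirm), cost_per_assay=10.0))
```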

6. Limited throughput 

How many whole-genome sequencing samples can lab personnel run in a day? A few. Even a sophisticated, state-of-the-art NGS platform can take 24 to 48 hours to sequence a single genome.

This means that throughput for WGS is still limited, even with the power of next-generation sequencing. This eventually increases the turnaround time (TAT).
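
The 24 to 48 hour figure translates directly into a ceiling on samples per instrument. The sketch below assumes one genome per run and an instrument running around the clock with no setup time or failed runs, which is optimistic; the function name is illustrative.

```python
HOURS_PER_WEEK = 7 * 24

def max_genomes_per_week(hours_per_genome: int, instruments: int = 1) -> int:
    """Upper bound on completed genomes per week under continuous operation."""
    return (HOURS_PER_WEEK // hours_per_genome) * instruments

for hours in (24, 48):
    print(f"{hours} h per genome: at most {max_genomes_per_week(hours)} genomes per week")
```

Under these assumptions a single instrument completes between 3 and 7 genomes per week, so scaling up means adding instruments, which feeds back into the cost challenge.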

7. Turn-around time

Turnaround time (TAT) is yet another big challenge for WGS. Even with the latest NGS instrumentation, data analysis facilities and a team of experts, results can be reported only within 5 to 7 days. TAT remains one of the biggest concerns for scientists.

Sample preparation, sample throughput, instrument limitations, data processing and the complexity of bioinformatics pipelines all contribute to the overall turnaround time for WGS and WES.

8. Overall accuracy of sequencing

Even with clinically recommended sequencing depth and coverage, ~12,000 errors, many gaps and inaccurately sequenced regions can still exist in the final file. This decreases the overall accuracy and precision of sequencing.

Again, sample preparation (extraction, fragmentation and library preparation), sequencing process and data processing limitations introduce errors and gaps in the final sequence and pose challenges in analysis and reporting. 

9. Sequencing complex and repetitive genomic regions

Accurately sequencing and annotating complex and repetitive genomic regions poses the most significant challenge in WGS. Duplications, repeats, transposable elements and satellite regions are difficult to sequence.

In addition, larger duplications, inversions and translocations are even more difficult to sequence and study. Long-read sequencing can help address this problem, although it comes with several challenges of its own.

10. Other technical limitations 

Samples that are difficult to isolate, fragment or prepare libraries from, sequencing capillary or chip preparation, and sample handling and storage are other technical challenges that come up in every genome sequencing experiment.

Frequently, problems in these wet lab procedures cause sequencing experiments to fail, wasting both time and money.

Automated nucleic acid extractors, ready-to-use kits for DNA extraction, fragmentation and library preparation, and state-of-the-art sample handling and storage facilities can help overcome these challenges, but they are costly.

Wrapping up

Despite significant advancements in sequencing technologies, genome sequencing remains a complex endeavor. To improve accuracy, throughput and TAT, we need to improve methodologies, technologies and infrastructure.

Furthermore, reading every nucleotide from the genome is still costly, limiting the technology’s integration into clinical diagnosis. It is hoped that these challenges will be addressed soon. 

Resources: 

Chrystoja, Caitlin C., and Eleftherios P. Diamandis. “Whole Genome Sequencing as a Diagnostic Test: Challenges and Opportunities.” Clinical Chemistry 60, no. 5 (2014): 724-733. Accessed February 15, 2024. https://doi.org/10.1373/clinchem.2013.209213.

Petersen, BS., Fredrich, B., Hoeppner, M.P. et al. Opportunities and challenges of whole-genome and -exome sequencing. BMC Genet 18, 14 (2017). https://doi.org/10.1186/s12863-017-0479-5.
