“This experiment tests whether AI tools like ChatGPT can be used to analyze Sanger sequencing files and data.”
AI is making everyday life better. Using voice commands or camera controls, we can request any information. Studies have shown that it can also perform analysis.
For instance, it helps in solving math problems and performing statistical and data analysis. It can read text, analyze voice, pictures, notes, videos, and any information and provide real assistance.
But can it help study genetic data? Can it analyze a PCR gel, a qPCR graph or a Sanger sequencing file?
Let’s find out.
This is the first article in our series on using ChatGPT for genetic analysis, and today we will find out whether it can help interpret sequencing results. The results of my experiment are shocking.
I am sharing the complete process, including the prompts I used with screenshots and the results I received.
Stay tuned.
ChatGPT for sequencing data analysis:
For this example, we will consider only the Sanger sequencing data, because genomics data are complex and huge in size. By the way, if you are new to sequencing, you can check out our previous article on a Beginner’s guide to Sanger sequencing results.
So a typical Sanger sequencing file has an extension like .ab1 or .scf. I asked ChatGPT if it can study this kind of file or not.
Now, look carefully at its response. It says it can analyze .ab1 or .scf files, but not directly. That is understandable, because the model isn’t built for this kind of analysis. It also listed what it can and can’t do with the file.
According to ChatGPT, it can provide information on base calls, peak positions, the raw signal trace and quality scores. It can also generate a FASTA-format sequence, assess base-by-base quality and provide visual guidance. Keep in mind that it can’t open the native file or provide data on electropherogram peaks.
Meaning, it can’t generate peaks as per the raw file, which tools like 4Peaks can do. It doesn’t even mention whether it can perform variant analysis or not.
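One reason these files can't simply be pasted into a chat window is that both are binary formats. As a minimal sketch, the snippet below identifies a trace file by its first four bytes: ABIF (.ab1) files start with the ASCII signature "ABIF", and SCF files start with ".scf". The function name `detect_trace_format` and the labels it returns are our own illustrative choices, not part of any tool mentioned in this article.

```python
# File signatures (first four bytes) of the two trace formats discussed above.
SIGNATURES = {
    b"ABIF": "ABIF (.ab1)",
    b".scf": "SCF (.scf)",
}


def detect_trace_format(path):
    """Return a label for the trace format, or None if the header is unrecognized.

    Hypothetical helper for illustration; it only inspects the header and
    does not parse base calls, peaks or quality values.
    """
    with open(path, "rb") as handle:
        magic = handle.read(4)
    return SIGNATURES.get(magic)
```

Calling `detect_trace_format("sample.ab1")` (a hypothetical file name) would return the format label if the header matches. Actually decoding the trace needs a dedicated parser or a viewer such as 4Peaks.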
Anyway, let’s move further and upload the sequencing file.
After uploading the file, our first conclusion is that ChatGPT can’t analyze the .scf file directly. That’s obvious, and I expected it.
We looked for a workaround, some way to get ChatGPT to do this kind of analysis. After hours of research, I had an idea.
ChatGPT can analyze images: it can scan any image, identify what’s in it and suggest what to do with the information.
Bingo! Let’s give it a screenshot of a Sanger sequencing chromatogram.
So I gave a prompt and a screenshot of a part of a chromatogram to ChatGPT.
The results are here.
I have summarised the results in the table below.
Prompt | Information | Correct or wrong |
Asked for the file name | “Good chromatogram” | Correct |
Base color code | Color for A, T, G, C | Correct (Generic information) |
Observed sequence | Returned the sequence in FASTA format | Wrong |
Quality assessment | Assessed the base quality | Partially correct (no quantitative data) |
Other information | No background noise, peak spacing is even, no double peaks | Correct |
In short, most of this information is generic. The crucial part, identifying the actual sequence, is wrong. The sequence alignment showed that only 26 of 56 nucleotides match, roughly 46%.
Not a good score!
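For readers who want to check the arithmetic, here is a minimal sketch of how percent identity can be computed from two aligned sequences of equal length. The function name `percent_identity` is our own, and the two sequences are placeholders built only to reproduce the 26-of-56 count; they are not the actual reads from this experiment.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions where both sequences carry the same base.

    Assumes the two sequences are already aligned to the same length.
    Positions where both sequences have a gap ('-') are not counted as matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


# Placeholder sequences (NOT the real data): 56 positions, 26 of them matching.
reference = "A" * 56
chatgpt_call = "A" * 26 + "C" * 30
print(f"{percent_identity(reference, chatgpt_call):.1f}% identity")  # 46.4% identity
```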
Note that there are 8 different ways a FASTA file can be used in genetics, so getting it right matters a great deal. ChatGPT has given us a wrong FASTA sequence. The sequence alignment between the actual and the ChatGPT-generated FASTA sequence is shown below.
So in conclusion, ChatGPT could not analyse even the simplest Sanger sequencing data. Its image analysis is impressive, but it still couldn’t read this Sanger sequencing image or return the correct data.
Anyway, let’s move further.
Next, ChatGPT suggested that it could perform BLAST, so we asked it to. Instead of returning Basic Local Alignment Search Tool results, it gave a step-by-step guide on how to perform BLAST.
Finally, we asked it to determine the quality of each base and give us quantitative data, but it failed here too. Instead, it returned a table with only a generic assessment like ‘good,’ ‘moderate,’ or ‘excellent’ peaks.
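To show what quantitative quality data looks like, here is a minimal sketch based on the standard Phred definition, Q = -10 × log10(P), where P is the probability that a base call is wrong. The function names, the Q30/Q20 cut-offs and the example scores are illustrative assumptions of ours; none of them come from the ChatGPT output or from our trace file.

```python
import math


def phred_to_error_probability(q):
    """Convert a Phred quality score Q to the base-call error probability P."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Convert an error probability P (0 < P <= 1) back to a Phred score Q."""
    return -10 * math.log10(p)


def label_quality(q, high=30, low=20):
    """Attach a coarse label to a Phred score.

    The cut-offs (Q30 and Q20) are illustrative assumptions; labs may use
    different thresholds.
    """
    if q >= high:
        return "high"
    if q >= low:
        return "moderate"
    return "low"


# Made-up per-base scores, for illustration only.
example_scores = [12, 25, 38, 52]
for position, q in enumerate(example_scores, start=1):
    p = phred_to_error_probability(q)
    print(f"base {position}: Q{q}, error probability {p:.4%}, {label_quality(q)}")
```

A score of Q20 corresponds to a 1 in 100 chance of a wrong base call, and Q30 to 1 in 1,000. This is the kind of number a vague label like ‘good’ leaves out.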
That’s it!
We drew several positives and negatives from this experiment.
Limitations and Challenges:
- ChatGPT can’t provide direct help in scientific analysis. That’s our first and most prominent finding.
- ChatGPT is not designed to open or interpret raw sequencing files or electropherogram data directly.
- When given screenshots of Sanger sequencing chromatograms, ChatGPT’s base sequence extraction was largely inaccurate (~46% correct), which is too low for practical use.
- The quality assessment and other chromatogram information provided were mostly generic and lacked detailed quantitative precision.
- ChatGPT does not perform variant detection or analyze peak data as specialized tools (e.g., 4Peaks) do.
- Instead of performing BLAST alignment itself, ChatGPT only provided a guide on how to run BLAST, not the results.
- The AI failed to deliver detailed numeric quality scores, offering only vague qualitative descriptions like “good” or “moderate.”
Positives (Strengths and Potential):
- ChatGPT accurately explained base calling, color codes, and general chromatogram characteristics.
- It guided users step-by-step on how to perform tasks such as BLAST, which is useful for beginners.
- While it could not extract accurate sequencing data from images, it could interpret some general aspects, such as color codes and chromatogram quality descriptors.
- The AI can serve as a teaching aid to understand sequencing data types and analysis methods.
Wrapping up:
In conclusion, as per our experiment, the future-ready AI is still not ready to assist in scientific analysis. However, its ability to perform ‘some’ tasks, its step-by-step explanations and its problem-solving capabilities are impressive.
This study concludes that AI tools like ChatGPT are still at a very basic level when it comes to understanding and studying scientific data. Scientists need to focus more on adapting such models for scientific and medical exploration.
Note that we only tested ChatGPT and not other AI models. Results with other models may differ. If you like this article, share it and subscribe to Genetic Education.