- New AI technology has identified 42 percent more peptides than previous methods in complex protein samples.
- Traditional protein identification requires matching against databases, but up to 70 percent of peptides found with mass spectrometry aren’t in any database.
- Researchers are now using AI protein sequencing to identify unknown proteins in both medical samples and archaeological findings.
From database searching to AI analysis
The world of proteins is far more complex than their genetic blueprints in the form of DNA and RNA. The human genome contains approximately 20,000 genes, but these genes can give rise to 10 million different proteins due to changes that can occur when DNA is copied into RNA or when RNA is translated into proteins.
Traditionally, biologists identify proteins by breaking them down into short fragments called peptides, consisting of between five and 20 amino acids. Scientists then weigh these fragments in a mass spectrometer and try to match the weights against known peptides in databases. But this approach has significant limitations.
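To make the matching step concrete, here is a minimal Python sketch of a database lookup by weight. It is a simplified illustration, not the workflow of any real search engine: the peptide list `KNOWN_PEPTIDES`, the helper names, and the 0.01 dalton tolerance are all assumptions made for the example. The residue masses are standard monoisotopic values, rounded.

```python
# Monoisotopic residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "R": 156.10111,
}
WATER = 18.01056  # one water molecule accounts for the peptide's two free ends


def peptide_mass(sequence):
    """Return the monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def search_database(observed_mass, database, tolerance=0.01):
    """Return every known peptide whose mass lies within the tolerance."""
    return [p for p in database if abs(peptide_mass(p) - observed_mass) <= tolerance]


# Hypothetical stand-in for a reference database.
KNOWN_PEPTIDES = ["GASPVK", "LTEVAR", "PEPTSK"]

# A peptide that is in the database is found ...
hit = search_database(peptide_mass("LTEVAR"), KNOWN_PEPTIDES)
# ... while one that is not listed returns nothing, however clean the measurement.
miss = search_database(peptide_mass("KGGVLR"), KNOWN_PEPTIDES)

print(hit, miss)
```

The second lookup comes back empty even though the peptide was measured perfectly, which is the limitation the article describes.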
“Traditional proteomics is a bit like a Google search. If it’s not there, you will not find it,” says Timothy Patrick Jenkins, proteomics expert at the Technical University of Denmark.
InstaNovo takes the lead
Researchers have now developed AI tools that don’t need to search for matches among known peptides. Instead, they calculate the weights of all potential peptide fragments that could result from chemical modifications to a peptide of a given length.
InstaNovo, developed by Jenkins and his colleagues, uses deep learning and a strategy called diffusion, a method that has improved AI image generation models like DALL-E and protein structure models like RoseTTAFold and AlphaFold.
In a direct comparison with the previous AI model Casanovo, InstaNovo identified 42 percent more peptides in a laboratory-created protein sample from nine organisms.
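The idea of calculating weights for every potential peptide of a given length, instead of looking them up, can be sketched by brute force in Python. This toy is not how the AI models in this article work internally: it only lists candidates, uses a five-letter amino acid alphabet chosen for the example, and considers a single modification (oxidation of methionine). The function names and the tolerance are likewise assumptions.

```python
from itertools import product

# Monoisotopic residue masses in daltons for a small, illustrative alphabet.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "V": 99.06841, "M": 131.04049}
WATER = 18.01056
OXIDATION = 15.99491  # mass added when a methionine (M) is oxidized


def candidate_masses(length):
    """Yield (label, mass) for every sequence of this length, plus oxidized variants."""
    for residues in product(RESIDUE_MASS, repeat=length):
        seq = "".join(residues)
        base = sum(RESIDUE_MASS[aa] for aa in seq) + WATER
        yield seq, base
        if "M" in seq:
            # Simplification: at most one oxidized methionine per peptide.
            yield seq + "+ox", base + OXIDATION


def explain(observed_mass, length, tolerance=0.01):
    """Return the labels of all candidates whose mass fits the observation."""
    return [label for label, mass in candidate_masses(length)
            if abs(mass - observed_mass) <= tolerance]


# Weight of a modified peptide that no database of unmodified peptides would list.
target = sum(RESIDUE_MASS[aa] for aa in "GAMSV") + WATER + OXIDATION
matches = explain(target, length=5)
print(len(matches), "candidates fit, including", "GAMSV+ox" in matches)
```

Many candidates fit the same weight, because reordering the amino acids does not change the total. Weight alone therefore cannot settle the sequence, and the number of candidates grows quickly with peptide length.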
Real-world applications show results
When the research team applied their AI to real-world proteomics challenges, the results included 1,225 peptides unique to the blood protein albumin in samples from infected leg wounds, ten times more than conventional database searches found. Of these, 254 were new peptides not in the databases. The researchers also mapped other peptides to 52 bacterial proteins.
Archaeological applications
Matthew Collins, proteomics researcher at the University of Cambridge, has recently been testing several AI protein sequencing tools to analyze archaeological samples. In most cases, the proteins in the samples have undergone extensive chemical changes after millennia underground or come from extinct plants and animals.
The AI models have enabled his team to spot traces of rabbit proteins at Neanderthal sites and fish muscle proteins in ancient Brazilian pots.
WALL-Y
WALL-Y is an AI bot created in ChatGPT.