GraphNovo: Revolutionizing Cancer Treatment With Machine Learning


The University of Waterloo’s GraphNovo, utilizing machine learning, significantly advances the accuracy of peptide sequencing in cells, offering breakthroughs in personalized cancer treatment and vaccine development.

The breakthrough in AI could result in the development of highly personalized medicine for treating serious diseases.

Machine learning technology is aiding scientists in examining the composition of unknown cells, potentially leading to personalized medicine for cancer and other serious diseases. 

Researchers at the University of Waterloo developed GraphNovo, a new program that provides a more accurate understanding of the peptide sequences in cells. Peptides are chains of amino acids within cells and are building blocks as important and unique as DNA or RNA.

Immunotherapy and Peptide Sequencing

In a healthy person, the immune system can correctly identify the peptides of irregular or foreign cells, such as cancer cells or harmful bacteria, and then target those cells for destruction. For people whose immune systems are struggling, the promising field of immunotherapy is working to retrain their immune systems to identify these dangerous invaders. 

“What scientists want to do is sequence those peptides between the normal tissue and the cancerous tissue to recognize the differences,” said Zeping Mao, a Ph.D. candidate in the Cheriton School of Computer Science who developed GraphNovo under the guidance of Dr. Ming Li. 

This sequencing process is particularly difficult for novel illnesses or cancer cells, which may not have been analyzed before. While scientists can draw on an existing peptide database when analyzing diseases or organisms that have previously been studied, each person’s cancer and immune system are unique. 

To quickly build a profile of the peptides in an unfamiliar cell, scientists have been using a method called de novo peptide sequencing, which uses mass spectrometry to rapidly analyze a new sample. This process may leave some peptides incomplete or entirely missing from the sequence. 
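As a rough illustration of why gaps appear, the Python sketch below reads a toy "ladder" of fragment masses: each step between neighboring peaks is matched against the mass of a single amino acid. It is a simplified teaching example, not GraphNovo's method. It assumes idealized peaks that are plain cumulative residue masses (no ion-type offsets, charge states, or noise), and the helper names `prefix_masses` and `read_ladder` and the example peptide `SVTFE` are hypothetical.

```python
from itertools import accumulate

# Standard monoisotopic residue masses in daltons.
# Leucine (L) and isoleucine (I) have identical masses.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def prefix_masses(peptide):
    """Idealized peak ladder: cumulative residue masses, starting at 0."""
    return [0.0] + list(accumulate(RESIDUE_MASS[aa] for aa in peptide))


def read_ladder(peaks, tol=0.01):
    """Label each step between neighboring peaks with the matching residue.

    A step that matches no single residue is reported as '?'.
    """
    calls = []
    for low, high in zip(peaks, peaks[1:]):
        step = high - low
        matches = [aa for aa, m in RESIDUE_MASS.items() if abs(step - m) <= tol]
        calls.append("/".join(matches) if matches else "?")
    return calls


full = prefix_masses("SVTFE")
print(read_ladder(full))     # ['S', 'V', 'T', 'F', 'E']

# Drop one peak to mimic a fragment that never showed up in the spectrum.
missing = full[:3] + full[4:]
print(read_ladder(missing))  # ['S', 'V', '?', 'E']
```

With every peak present, the sequence can be read off one residue at a time. Once a single peak is missing, two residues collapse into one unexplained step, which is the kind of hole in the sequence described above.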

GraphNovo: A Leap in Sequencing Accuracy

Utilizing machine learning, GraphNovo significantly enhances the accuracy in identifying peptide sequences by filling these gaps with the precise mass of the peptide sequence. Such a leap in accuracy will likely be immensely beneficial in a variety of medical areas, especially in the treatment of cancer and the creation of vaccines for ailments such as Ebola and COVID-19. The researchers achieved this breakthrough due to Waterloo’s commitment to advances in the interface between technology and health.
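To make the idea of filling a gap by mass more concrete, the Python sketch below lists every combination of up to three amino acids whose total mass matches a given gap. This brute-force search is only an illustration and is not how GraphNovo works; the published model is a two-stage graph-based deep learning approach. The function name `fill_gap`, the tolerance, and the example gap values are assumptions chosen for the demonstration.

```python
from itertools import combinations_with_replacement

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def fill_gap(gap, max_len=3, tol=0.01):
    """Return residue combinations (order ignored) whose masses sum to `gap`."""
    hits = []
    for n in range(1, max_len + 1):
        for combo in combinations_with_replacement(sorted(RESIDUE_MASS), n):
            total = sum(RESIDUE_MASS[aa] for aa in combo)
            if abs(total - gap) <= tol:
                hits.append("".join(combo))
    return hits


# A gap that only one pair of residues can explain.
print(fill_gap(248.11609))  # ['FT']

# A gap with two explanations: glutamine alone, or alanine plus glycine.
print(fill_gap(128.05858))  # ['Q', 'AG']
```

The second call shows why mass alone is not always enough: different residue combinations can weigh almost exactly the same, and the order of residues inside a gap is not determined by its mass. That ambiguity is one reason a learned model can help choose among candidates.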

“If we don’t have an algorithm that’s good enough, we cannot build the treatments,” Mao said. “Right now, this is all theoretical. But soon, we will be able to use it in the real world.” 

Reference: “Mitigating the missing-fragmentation problem in de novo peptide sequencing with a two-stage graph-based deep learning model” by Zeping Mao, Ruixue Zhang, Lei Xin and Ming Li, 19 October 2023, Nature Machine Intelligence.
DOI: 10.1038/s42256-023-00738-x