Separating the Wheat from the Chaff: Prediction-Assisted Rescoring of Peptidic Fragment Ion Spectra
Posters | 2020 | Thermo Fisher Scientific | ASMS
Instrumentation: Software
Industries: Proteomics
Manufacturer: Thermo Fisher Scientific
Summary
Significance of the Topic
Accurate peptide identification in proteomics is critical for biomarker discovery, drug development and understanding biological processes. Traditional database search engines rely primarily on matching fragment masses, often neglecting the intensity information of MS/MS spectra. Integrating intensity-based predictions via deep learning can enhance the confidence of peptide-spectrum matches (PSMs), reduce false identifications and push the limits of complex applications such as immunopeptidomics and metaproteomics.
Study Objectives and Overview
The main goal of this work was to evaluate a novel rescoring algorithm that incorporates predicted fragment ion intensities into existing search workflows. Using a beta version of Thermo Scientific™ Proteome Discoverer™ 2.5 with SequestHT and a Prosit-derived rescoring node by MSAID, the study aimed to measure improvements in identification rates across diverse datasets: a HeLa digest, an immunopeptidomics (HLA) set and a human stool metaproteome.
Methodology and Instruments Used
The workflow combined:
- Proteome Discoverer 2.5 software with SequestHT for initial PSM proposals.
- A Prosit-based rescoring node predicting fragment intensities at peptide-specific collision energies.
- Percolator for semi-supervised machine learning classification using both conventional and intensity-based scoring features.
No new instrumentation was required beyond standard high-resolution tandem mass spectrometers. Data sources included Thermo Scientific™ Pierce™ HeLa digest, ProteomeXchange immunopeptidomics and metaproteomics raw files.
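As a rough illustration of what an intensity-based scoring feature can look like, the sketch below compares predicted and observed fragment intensities using a normalized spectral angle. This is an assumption for illustration only: the summary does not state which similarity measure the rescoring node computes, and the function name and toy intensity vectors are hypothetical.

```python
import math


def spectral_angle(predicted, observed):
    """Normalized spectral angle between two intensity vectors.

    Both vectors hold intensities for the same ordered set of fragment ions.
    For non-negative intensities the result lies in [0, 1], where 1 means
    the relative intensity patterns are identical.
    """
    if len(predicted) != len(observed):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(p * o for p, o in zip(predicted, observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    if norm_p == 0.0 or norm_o == 0.0:
        return 0.0  # no signal on one side: treat as no similarity
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_o)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


# Hypothetical intensities for four fragment ions of one candidate peptide.
predicted = [0.9, 0.1, 0.5, 0.0]
observed_matching = [0.85, 0.15, 0.45, 0.05]   # pattern agrees with prediction
observed_mismatch = [0.05, 0.9, 0.0, 0.6]      # same masses, different pattern

print(spectral_angle(predicted, observed_matching))
print(spectral_angle(predicted, observed_mismatch))
```

A value like this could be passed to Percolator as one additional feature per PSM alongside the conventional search engine scores. Two candidates that match the same fragment masses can then still be told apart by how well their intensity patterns agree with the prediction.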
Main Results and Discussion
Rescoring with predicted intensities increased identifications at the PSM, peptide and protein levels in all three datasets:
- HeLa dataset: +10% PSMs, +8% peptides, +4% proteins.
- Metaproteomics (stool microbiome): +13% PSMs, +11% peptides, +10% proteins.
- Immunopeptidomics (HLA Class I): +59% PSMs, +55% peptides, +34% proteins.
Additionally, the rescoring approach allowed the use of stricter false discovery rate (FDR) thresholds (e.g. 0.1%) with minimal impact on identification counts, thereby increasing overall confidence. Receiver operating characteristic-like comparisons of target versus decoy scores showed markedly better separation after rescoring.
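To make the link between target/decoy separation and FDR thresholds concrete, the following minimal sketch estimates q-values with the common target-decoy approach (FDR at a score cutoff estimated as decoys divided by targets above that cutoff). The summary does not describe how the FDR was computed in the study, so the estimator, the function names and the toy scores are all assumptions for illustration.

```python
def q_values(psms):
    """Estimate q-values for a list of (score, is_decoy) pairs.

    Returns (score, is_decoy, q_value) tuples sorted from best to worst score.
    Ties in score are not treated specially in this sketch.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among everything accepted down to this rank.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the smallest FDR at which this PSM would still be accepted.
    q = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q.append(running_min)
    q.reverse()
    return [(s, d, qv) for (s, d), qv in zip(ranked, q)]


def count_targets_at_fdr(psms, threshold):
    """Number of target PSMs with q-value at or below the threshold."""
    return sum(1 for _, is_decoy, qv in q_values(psms)
               if not is_decoy and qv <= threshold)


# Hypothetical scores: decoys interleave with targets before rescoring ...
before = [(9.0, False), (8.0, False), (7.0, True), (6.0, False),
          (5.0, False), (4.0, True), (3.0, False)]
# ... and fall to the bottom of the ranking after rescoring.
after = [(9.0, False), (8.0, False), (7.5, False), (7.0, False),
         (6.0, False), (2.0, True), (1.0, True)]

print(count_targets_at_fdr(before, 0.01))
print(count_targets_at_fdr(after, 0.01))
```

In the toy data the same five targets and two decoys are present before and after rescoring. Only the ranking changes, yet more targets pass the strict threshold once the decoys score below them, which is the effect the target-versus-decoy comparison describes.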
Benefits and Practical Applications
Incorporating intensity-based features into PSM scoring offers:
- Enhanced depth of proteome coverage by recovering spectra previously deemed low-quality.
- Greater robustness in high-complexity samples such as host-microbiome mixtures and immunopeptidomes.
- The ability to adopt more stringent statistical thresholds without sacrificing sensitivity, crucial for clinical or regulatory environments.
Future Trends and Opportunities
Developments to watch include:
- Real-time prediction and rescoring during data acquisition for adaptive instrument control.
- Extension of intensity-based rescoring to non-tryptic and cross-linked peptide workflows.
- Integration with multi-omic platforms and single-cell proteomics to maximize data yield.
- Community-driven models that refine predictions for post-translational modifications and novel fragmentation chemistries.
Conclusion
Deep-learning-driven rescoring of fragment ion spectra significantly enhances peptide and protein identification rates and confidence levels across diverse proteomic applications. By leveraging predicted intensities, this approach addresses inherent limitations of mass-based scoring, enabling more stringent FDR control and unlocking new potential in challenging workflows.
References
1. Gessulat S, Schmidt T, et al. Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning. Nat Methods. 2019;16(6):509-518.
2. Käll L, et al. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat Methods. 2007;4:923-925.
3. Chong C, et al. Integrated proteogenomic deep sequencing and analytics accurately identify non-canonical peptides in tumor immunopeptidomes. Nat Commun. 2020;11(1):1293.
4. Rechenberger J, et al. Challenges in Clinical Metaproteomics Highlighted by the Analysis of Acute Leukemia Patients with Gut Colonization by Multidrug-Resistant Enterobacteriaceae. Proteomes. 2019;7(1).
5. Li J, et al. An integrated catalog of reference genes in the human gut microbiome. Nat Biotechnol. 2014;32:834-844.
Content was automatically generated from an original PDF document using AI and may contain inaccuracies.