Enhanced identity spectrum search with AI/ML confidence scoring for HRAM data

Posters | 2025 | Thermo Fisher Scientific
Instrumentation
Software, LC/Orbitrap, LC/HRMS, LC/MS/MS, LC/MS
Industries
Other
Manufacturer
Thermo Fisher Scientific

Summary

Significance of the topic


Untargeted high-resolution accurate mass (HRAM) analysis of small molecules relies on matching unknown spectra to reference libraries. Traditional scoring methods often produce ambiguous results when isomeric or isobaric species generate similar fragmentation profiles. An AI/ML-based confidence score is intended to resolve these ambiguities and improve identification accuracy in complex analytical workflows.

Objectives and study overview


The primary goal was to create and validate a machine learning confidence score for Orbitrap-acquired MS2 data using the Thermo Scientific mzCloud spectral library. The study aimed to compare its performance against established metrics (Cosine, NIST, HighChem-HighRes) and the legacy Bayesian network-based confidence score in Compound Discoverer.
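For orientation, the cosine score that serves as one of the baselines can be sketched as follows. This is a minimal illustration, not the implementation used in mzCloud or Compound Discoverer: the greedy peak matching, the 0.01 Da tolerance and the function name `cosine_score` are assumptions made for this example.

```python
import math

def cosine_score(query, reference, tol=0.01):
    """Cosine similarity between two centroided spectra.

    Each spectrum is a list of (mz, intensity) pairs. Peaks are paired
    greedily within an absolute m/z tolerance `tol` (Da); this matching
    rule and the default tolerance are assumptions for illustration.
    """
    used = set()
    dot = 0.0
    for q_mz, q_int in query:
        # Find the closest unused reference peak within tolerance.
        best, best_delta = None, tol
        for i, (r_mz, _) in enumerate(reference):
            delta = abs(q_mz - r_mz)
            if i not in used and delta <= best_delta:
                best, best_delta = i, delta
        if best is not None:
            used.add(best)
            dot += q_int * reference[best][1]
    q_norm = math.sqrt(sum(i * i for _, i in query))
    r_norm = math.sqrt(sum(i * i for _, i in reference))
    if q_norm == 0 or r_norm == 0:
        return 0.0
    return dot / (q_norm * r_norm)
```

Two spectra that share the same fragment m/z values but differ in relative intensities still obtain a fairly high cosine score under this sketch, which is the kind of ambiguity between isomers that the study set out to address.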

Methodology and Instrumentation


A histogram gradient boosting model was chosen for its ability to handle missing values and complex feature interactions. Key aspects included:
  • Training dataset: 3.46 million MS2 spectra from 34 000 compounds, acquired with CID and HCD at collision energies of 10–200 NCE.
  • Feature engineering: 170 descriptors capturing spectral sparsity, balanceness, metadata (analyzer type, isolation width, precursor mass error) and traditional match scores.
  • Search simulation: Python-driven API calls against mzCloud Autoprocessed and Reference libraries, using 80 parallel workers on an AWS ml.r6i.32xlarge instance.
  • External validation: Food Safety Mass Spectral Library (1 007 compounds) acquired on Orbitrap IQ-X Tribrid with UHPLC and positive ESI, covering veterinary drugs, contaminants, pesticides and natural toxins.
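
Two of the descriptor types listed above can be illustrated with a short sketch. The precursor mass error in ppm follows the standard definition; the sparsity measure (fraction of peaks below 1 % of the base peak) is only an assumed stand-in, since this summary does not define the descriptors. Unknown values are encoded as NaN, in line with the stated reason for choosing a model that handles missing values.

```python
import math

def precursor_ppm_error(observed_mz, theoretical_mz):
    """Signed precursor mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def sparsity(intensities, rel_threshold=0.01):
    """Fraction of peaks weaker than rel_threshold * base peak.

    Hypothetical definition, for illustration only.
    """
    if not intensities:
        return math.nan  # no peaks: leave the feature missing
    base = max(intensities)
    if base <= 0:
        return math.nan
    weak = sum(1 for i in intensities if i < rel_threshold * base)
    return weak / len(intensities)

def feature_row(observed_mz, theoretical_mz, intensities, isolation_width=None):
    """Assemble one feature vector; unknown metadata stays NaN."""
    return [
        precursor_ppm_error(observed_mz, theoretical_mz),
        sparsity(intensities),
        math.nan if isolation_width is None else float(isolation_width),
    ]
```

Leaving a missing isolation width as NaN, rather than imputing a default, keeps the "not recorded" case distinguishable from any real value.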

Instrumentation details:
  • Thermo Scientific Orbitrap IQ-X Tribrid
  • Thermo Scientific Q Exactive
  • Thermo Scientific Fusion

Main Results and Discussion


At the spectrum-pair level, the AI/ML model achieved 89.2 % accuracy and a ROC AUC of 0.95, outperforming Cosine (0.66), HighChem-HighRes (0.68), NIST (0.67) and the legacy confidence score (0.58). At the compound level, validation on mzCloud yielded an AUC of 0.99 versus 0.92 for the legacy model, and validation on the Food Safety library yielded an AUC of 0.97. In the ranking assessment, top-hit placement improved for 104 of 400 mzCloud compounds (41 worsened, 255 unchanged) and for 48 of 560 Food Safety compounds (28 worsened, 484 unchanged). A case study with diosmetin spectra showed that the AI/ML score clearly separated true from false isomeric hits where traditional scores failed. Shapley value analysis provided interpretable feature contributions, indicating which input parameters increased or decreased match confidence.
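
The ROC AUC values quoted above measure how often a true match receives a higher score than a false match. A minimal sketch of that computation by pairwise comparison (equivalent to the normalized Mann-Whitney U statistic) is given here; the scores and labels are invented toy data, not values from the study.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a true match outscores a false one.

    labels: 1 for a correct spectrum-compound pair, 0 otherwise.
    Ties count as half a win. O(n_pos * n_neg), fine for small examples.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy data (made up): confidence scores for five candidate matches.
toy_scores = [0.95, 0.80, 0.60, 0.40, 0.10]
toy_labels = [1, 1, 0, 1, 0]
toy_auc = roc_auc(toy_scores, toy_labels)  # 5 of 6 pairs ordered correctly
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect separation, which frames the gap between the reported 0.95 and the 0.58–0.68 range of the other scores.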

Benefits and practical applications


The AI/ML confidence score enhances identification reliability by:
  • Reducing ambiguous multiple hits in library searches.
  • Enabling clear ranking of true candidates in large libraries.
  • Offering explainable outputs to guide method refinement, such as optimizing collision energies.
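
To make the notion of explainable outputs concrete, the sketch below computes exact Shapley values for a toy scoring function by enumerating feature subsets. The scoring function, the feature names and the all-zero baseline are hypothetical; with the 170 features used in the study, brute-force enumeration of this kind would be impractical and a specialized algorithm would be needed.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for `model` at point `x`.

    Features absent from a coalition are replaced by their `baseline`
    value. Cost grows as 2**n, so this suits only tiny examples.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_f = {f: (x[f] if f in subset or f == name else baseline[f])
                          for f in names}
                without_f = {f: (x[f] if f in subset else baseline[f])
                             for f in names}
                total += weight * (model(with_f) - model(without_f))
        phi[name] = total
    return phi

# Hypothetical toy confidence model (not the study's model).
def toy_model(f):
    return (0.5 * f["cosine"]
            - 0.02 * abs(f["ppm_error"])
            + 0.1 * f["cosine"] * f["energy_match"])

x = {"cosine": 0.9, "ppm_error": 5.0, "energy_match": 1.0}
baseline = {"cosine": 0.0, "ppm_error": 0.0, "energy_match": 0.0}
phi = shapley_values(toy_model, x, baseline)
```

Positive values in `phi` push the score above the baseline prediction and negative values pull it down; together they sum to the difference between the prediction for `x` and the prediction for the baseline.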

Future trends and potential applications


Further developments may include continuous model retraining with new spectral data, integration of additional metadata (ion mobility, chromatographic retention), expansion to other vendor libraries, and real-time confidence scoring within data acquisition software. Explainable AI techniques are expected to play a growing role in fostering user trust and in driving adaptive scan strategies.

Conclusion


The novel AI/ML confidence scoring model significantly surpasses traditional spectral similarity and Bayesian approaches in both accuracy and candidate ranking. Its implementation within the mzCloud platform will provide users with robust, interpretable confidence metrics for HRAM library searches.

Reference


Food Safety Mass Spectral Library from Wageningen University, accessed March 2025, https://www.wur.nl/en/show/food-safety-mass-spectral-library.htm.

Content was automatically generated from an original PDF document using AI and may contain inaccuracies.

LabRulez s.r.o. All rights reserved. Content available under a CC BY-SA 4.0 Attribution-ShareAlike