Validating and Comparing Component Detection Algorithms for LC-MS Data Assignment

Posters | 2015 | Thermo Fisher Scientific

Instrumentation

Software, LC/MS

Industries

Manufacturer

Thermo Fisher Scientific

Summary

Importance of the Topic

The reliable detection of chromatographic features in liquid chromatography–mass spectrometry (LC-MS) is fundamental for metabolomics, drug metabolism studies and quality control in pharmaceutical and biological research. In practice, evaluating feature detection algorithms is hampered by the scarcity of fully annotated benchmark datasets that include both true positive and true negative signals. An exhaustive annotation approach addresses this gap, enabling an objective stress test of algorithm accuracy on complex data.

Study Objectives and Overview

This work introduces a semi-automated annotation workflow and visualization tool (TotalRecall) to generate comprehensive LC-MS feature annotations. The objectives are to:

Create exhaustive lists of true positives (TP) and true negatives (TN) in diverse samples.
Compare feature detection algorithms under realistic conditions.
Demonstrate precision-recall analysis as a robust performance metric for imbalanced data.

Methodology

Three sample types were annotated: (1) a pure analyte spike (buspirone), (2) complex biological spiked mixtures (rat hepatocyte metabolite extracts and amino acid standards in positive and negative modes), and (3) a drug example lacking a targeted key (diclofenac). TotalRecall groups monoisotopic and isotopic signals into clusters, then classifies features as TP (validated A0 isotopes with consistent chromatographic shape) or TN (orphan isotopic signals or misaligned clusters). An exhaustive answer key is built via manual curation assisted by visualization tabs for clusters, TP, TN and unresolved (“Unknown”) features. Algorithm outputs are scored against these keys using varied signal‐intensity thresholds to generate precision-recall curves.
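
The scoring step can be illustrated with a minimal sketch. It assumes that each detected feature has already been matched to the answer key through a shared identifier; the names `Feature`, `feature_id` and `precision_recall_curve`, and all numbers, are hypothetical and not taken from the poster. In practice the matching would more likely rely on m/z and retention-time tolerances.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    feature_id: str   # hypothetical identifier shared by answer key and algorithm output
    intensity: float  # signal intensity reported by the detection algorithm


def precision_recall_curve(detected, true_positive_ids, thresholds):
    """Return one (threshold, precision, recall) tuple per intensity threshold."""
    total_tp = len(true_positive_ids)
    curve = []
    for threshold in thresholds:
        # Keep only detections at or above the current intensity threshold.
        kept = [f for f in detected if f.intensity >= threshold]
        hits = sum(1 for f in kept if f.feature_id in true_positive_ids)
        # Convention (an assumption): precision is 1.0 when nothing is reported.
        precision = hits / len(kept) if kept else 1.0
        recall = hits / total_tp if total_tp else 0.0
        curve.append((threshold, precision, recall))
    return curve


# Illustrative data only: "x", "y" and "z" are false detections,
# and "d" is a true feature the algorithm missed.
detected = [
    Feature("a", 9e5), Feature("b", 4e5), Feature("x", 3e5),
    Feature("c", 8e4), Feature("y", 5e4), Feature("z", 2e4),
]
answer_key_tp = {"a", "b", "c", "d"}

for threshold, precision, recall in precision_recall_curve(
    detected, answer_key_tp, [1e4, 1e5, 5e5]
):
    print(f"threshold={threshold:.0e} precision={precision:.2f} recall={recall:.2f}")
```

Raising the threshold in this toy example trades recall for precision, which is the trade-off a precision-recall curve traces out.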

Instrumentation

Mass spectra were acquired on Thermo Scientific Exactive™ Orbitrap instruments, coupled to standard LC systems. Sample preparation involved spiking known concentrations into biological matrices and standard mixes.

Main Results and Discussion

Precision-recall analysis across datasets revealed that:

Algorithms achieve near-complete recall on simple spikes but produce many false positives when benchmarked against exhaustive annotations.
Complex samples (amino acids, buspirone metabolites, diclofenac) show clear differences among algorithms at lower intensity thresholds.
ROC curves understate performance gaps because of extreme class imbalance; precision-recall plots show the trade-offs more clearly.

The exhaustive annotation approach exposes algorithm limitations that are masked by limited targeted keys.
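
A small worked example may help show why class imbalance flattens ROC differences. The counts below are invented for illustration and are not results from the poster: two hypothetical algorithms find the same 90 of 100 true features in a dataset with 100,000 true negatives, but one reports ten times as many false positives as the other.

```python
def rates(tp, fp, fn, tn):
    """Return (recall, false_positive_rate, precision) from confusion-matrix counts."""
    recall = tp / (tp + fn)
    false_positive_rate = fp / (fp + tn)
    precision = tp / (tp + fp)
    return recall, false_positive_rate, precision


# Hypothetical counts: 100 true features, 100,000 true negatives.
algorithm_a = rates(tp=90, fp=100, fn=10, tn=99_900)
algorithm_b = rates(tp=90, fp=1_000, fn=10, tn=99_000)

for name, (recall, fpr, precision) in (("A", algorithm_a), ("B", algorithm_b)):
    print(f"{name}: recall={recall:.2f} FPR={fpr:.3f} precision={precision:.3f}")
```

On a ROC plot both algorithms sit at the same recall with false positive rates of 0.001 and 0.01, which are nearly indistinguishable on a 0 to 1 axis. Their precision, roughly 0.47 versus 0.08, differs by more than a factor of five, because precision does not depend on the large true-negative count.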

Benefits and Practical Applications

By providing both TP and TN features, exhaustive annotation supports:

Objective benchmarking of feature detection and peak picking tools.
Evaluation of algorithm robustness in real-world complex matrices.
Optimization of threshold settings for high-throughput LC-MS data processing in metabolomics and pharmaceutical QC.
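
One way threshold optimization against an exhaustive key might look is sketched below. The function name, the target-precision criterion and the data are assumptions made for illustration; the poster does not prescribe a selection rule.

```python
def lowest_threshold_meeting_precision(scored, total_true_positives, target_precision):
    """Find the lowest intensity threshold whose precision meets the target.

    scored: list of (intensity, is_true_positive) pairs for detected features,
            labeled against the answer key.
    total_true_positives: number of TP features in the key (must be > 0).
    Returns (threshold, precision, recall), or None if no threshold qualifies.
    """
    best = None
    # Scan candidate thresholds from high to low; later matches overwrite
    # earlier ones, so the final value is the lowest qualifying threshold.
    for threshold in sorted({intensity for intensity, _ in scored}, reverse=True):
        kept = [is_tp for intensity, is_tp in scored if intensity >= threshold]
        hits = sum(kept)
        precision = hits / len(kept)
        if precision >= target_precision:
            best = (threshold, precision, hits / total_true_positives)
    return best


# Illustrative labels only: True marks a detection confirmed by the answer key.
scored = [
    (9e5, True), (4e5, True), (3e5, False),
    (8e4, True), (5e4, False), (2e4, False),
]
print(lowest_threshold_meeting_precision(scored, total_true_positives=4, target_precision=0.7))
```

All candidate thresholds are scanned rather than stopping at the first failure because precision is not monotonic in the threshold: in this toy data it dips below the target at 3e5 and recovers at 8e4.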

Future Trends and Possibilities

Potential extensions include:

Annotation of replicates and time‐course studies with automated feature tracking.
Benchmarking of isotope detection, adduct grouping, and full component detection workflows.
Integration with machine-learning models for automated curation, and enhancements to the interactive interface.

Conclusion

This study demonstrates that semi-automated exhaustive annotation combined with precision-recall metrics provides a rigorous framework for evaluating LC-MS feature detection algorithms. The TotalRecall workflow uncovers performance differences in complex datasets, guiding method selection and development.

Reference

TotalRecall annotation tool and methodology presented in: Razumovskaya J., Brown J., Wright D., Baran R., Mohtashemi I. (2015) “Validating and Comparing Component Detection Algorithms for LC-MS Data Assignment.” Thermo Fisher Scientific.

Content was automatically generated from an original PDF document using AI and may contain inaccuracies.

Similar PDF

Deep learning methods applied to the analysis of metabolomics data

2019|Shimadzu|Posters

PO-CON1857E Deep learning methods applied to the analysis of metabolomics data ASMS 2019 WP 389 Shinji Kanazawa1,3,4, Yohei Yamada1, Hiroyuki Yasuda1, Akihiro Kunisawa1,3, Toru Shiohama1, Shigeki Kajihara1, Norio Mukai1, Masaki Kakisako2, Go Fujisawa2, Yuzuru Yamakage2, Junko Iida1,3, Eiichiro Fukusaki5, Fumio…

Key words

learning, deep, metabolomics, applied, recall, methods, baseline, peak, false, data, positive, augmentation, true, derivation, algorithms

De Novo PFAS Annotation and Classification Using Highly Accurate Formula Prediction and Kaufmann Algorithms Embedded in FluoroMatchSuite

2025|Agilent Technologies|Posters

Poster Reprint ASMS 2025 Poster number WP 113 De Novo PFAS Annotation and Classification Using Highly Accurate Formula Prediction and Kaufmann Algorithms Embedded in FluoroMatch Suite Jeremy Koelmel1; Michael Kummer2; David Schiessel2; Olivier Chevallier3; David Godri4; Christian Klein3; Emma E…

Key words

homologous, voting, false, formula, nist, series, prediction, kaufmann, fluoromatch, annotation, visualizations, abc, rate, picking, features

Leveraging the MS1 Dimension and Formula Prediction in Non-Targeted Analysis of PFAS using New FluoroMatch Algorithms: Assessing Confidence and Coverage

2024|Agilent Technologies|Posters

Leveraging the MS1 Dimension and Formula Prediction in Non-Targeted Analysis of PFAS using New FluoroMatch Algorithms: Assessing Confidence and Coverage David Schiessel* [1]; Jeremy Koelmel [2]; Michael Kummer [1]; David Godri [3]; Sheng Liu [2]; Elizabeth Z. Lin [2]; John…

Key words

fluoromatch, formula, prediction, homologous, isotopic, defect, dda, annotation, visualizations, abc, picking, features, series, workflow, alongside

Comprehensive non-targeted workflow for confident identification of perfluoroalkyl substances (PFAS)

2025|Thermo Fisher Scientific|Applications

Application note | 003883 Environmental Comprehensive non-targeted workflow for confident identification of perfluoroalkyl substances (PFAS) Richard Cochran1, Sarah Choyke2, Application benefits Collin Meyers2, Ralf Tautenhahn3 • High-resolution accurate-mass (HRAM) data acquired using the Thermo Scientific™ Orbitrap Exploris™ mass spectrometer platform…

Key words

pfas, annotation, mass, spectral, database, fluoromatch, workflow, discoverer, duke, mzcloud, confidence, afff, targeted, homologous, compound