LabRulez s.r.o. All rights reserved. Content available under a CC BY-SA 4.0 Attribution-ShareAlike
Author
Organomation
Organomation, founded in 1959, designs and manufactures high-quality nitrogen evaporators and extraction systems. Known for innovation and durability, their lab instruments are used globally for efficient sample preparation with strong customer support.
Tags
Interview
Science and research
Scientists
Video

The AI Revolution in Chromatography: Building Massive Retention Databases

Fri, 30 January 2026 | Original article from: Concentrating on Chromatography / David Oliva
Dwight Stoll explains why chromatography needs big, high-quality retention datasets for AI—how short columns boost throughput, why ratios matter, and why “drifty” phases threaten prediction models.
  • Photo: Concentrating on Chromatography: The AI Revolution in Chromatography: Building Massive Retention Databases
  • Video: Concentrating on Chromatography: The AI Revolution in Chromatography: Building Massive Retention Databases

Transform Your Understanding of Modern Chromatography & AI!

Join us for an eye-opening conversation with Professor Dwight Stoll from Gustavus Adolphus College as he reveals how his team is revolutionizing liquid chromatography through high-throughput data collection and machine learning. With over 43,000 retention measurements and groundbreaking research spanning three major publications, Prof. Stoll is pioneering the future of analytical chemistry.

What You'll Learn:

  • How short columns (5 mm vs. 100 mm) dramatically increase measurement throughput while maintaining accuracy
  • The evolution from HSM1 to HSM3 models and their predictive capabilities
  • Why selectivity ratios are more stable than absolute retention factors
  • The "Manhattan Project of Chromatography": building massive retention databases
  • Key challenges: column drift, mobile phase standardization, and precision trade-offs
  • Career advice for researchers at the intersection of chromatography and data science

Key Highlights:

  • 43,329 total retention measurements across 13 stationary phases
  • Revolutionary "feed injection" method for high-throughput analysis
  • Improved isomer selectivity predictions, crucial for pharmaceutical analysis
  • The holy grail: predicting retention from molecular structures

Perfect For:

  • Analytical chemists and chromatographers
  • Data scientists interested in chemistry applications
  • Graduate students and researchers
  • Pharmaceutical industry professionals
  • Anyone curious about AI applications in science

Connect with Prof. Stoll:

  • Gustavus Adolphus College Chemistry Department
  • Research focused on liquid chromatography and 2D-LC

Video Transcription

Why the chromatography community needs more retention data

Host: What inspired your team to double down on the idea that the chromatography community needs more retention data for machine learning and AI applications? Was there an “aha” moment?

Dwight Stoll: The roots go back maybe 20 years, starting with my involvement in the mid-2000s with work by Lloyd Snyder and John Dolan around the hydrophobic subtraction model of reversed-phase selectivity. At the time, I was a user, not deeply involved. After I moved to Gustavus in 2008, Lloyd and John approached me and said they were going to retire soon and didn’t want the project to die. They asked if I would carry it forward, which included continuing measurements to characterize new columns so data could go into the database. I said yes.

In the early 2010s we transitioned that characterization work to my lab. As I became the “face” of the model, people kept asking, “This model is nice, but can it predict anything?” For years I said that wasn’t really what it was built for. Then in the late 2010s I finally asked a student to help answer the question: how good is the model at predicting retention or selectivity for the probe molecules used in the characterization—only about 15 molecules.

We found it did pretty well for “plain vanilla” C18 phases, but it was really bad for non-C18 phases like PFPs, polar-embedded phases, and others. That wasn’t a fair criticism of the original model—it was built largely on C18 data—but it showed that if you wanted predictions beyond C18, you’d need something else.

So we took all the data collected over the previous ~15 years, refined or reparameterized the model, and saw meaningful improvement—especially fewer gross errors above about 10%. Over a few weeks it really clicked: we can do a lot better if we have more data that reflects the diversity of stationary phases in use. We called that refined model HSM2.

Then we had to confront the practical reality: if we need much more data, it either takes 20 years, or we find a way to generate it much faster. That’s when we started using very short columns to speed up measurements. And we thought: why haven’t we been doing this for the last 20 years?

AI in chromatography and the role of high-quality retention data

Host: How do you see AI transforming chromatography over the next 5–10 years, and what role does high-quality retention data play?

Dwight Stoll: It depends on what you mean by AI. One of the holy grails of chromatography is predicting retention from molecular structure. People spend enormous time and resources doing trial-and-error method development: “I have ten new molecules, I don’t know how to separate them—let’s try something and see.” What we’d rather do is feed structures into a model, get a strong starting point, then refine experimentally.

We’re already seeing the beginnings of this. There were papers in the 2010s with mixed success. From conversations with big pharma, they’re doing retention prediction at some level already and seeing business impact, so they’re investing heavily. In the past year or so there have been several impactful papers in this space. I think this is going to happen—whether you call it machine learning or AI.

The big questions are about scope and performance. For example, the HSM database has data for close to 800 columns, and there are roughly 1,300 commercially available reversed-phase columns. Will models predict across all columns, or only a subset? And if it’s a subset, who decides which ones? Another question: will models be open or closed, like what we see in generative AI? Also, what accuracy should users expect? Anyone can make a prediction—it doesn’t mean it’s useful. The real impact depends on accuracy, and then what kinds of chromatographic problems that accuracy enables you to solve. Some applications are forgiving; others are not.

We’ve also noticed limitations that I don’t see discussed much in the retention-prediction literature. We’ve thought about writing an opinion or white paper about what the community needs to fix if we want high accuracy. The path forward won’t be effortless.

Why very short columns can still be accurate

Host: Your approach uses very short columns. How do you maintain accuracy?

Dwight Stoll: Two primary things. First, when you put a very short column on an instrument designed for 50–150 mm columns, peak width and peak shape are dominated by the instrument, not the column. The peaks tend to be asymmetric—not textbook Gaussian peaks. So it becomes crucial to measure retention using the first statistical moment of the peak—the center of mass—rather than the apex time. If you use the apex on an asymmetric peak, you introduce significant error. Statistical moments have been used for decades, but in this context they’re especially important.
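To make the apex-versus-moment distinction concrete, here is a minimal Python sketch. It is not the group's processing code: the peak shape (a narrow Gaussian front joined to a wider Gaussian tail), the time axis, and the function names `first_moment`, `apex_time`, and `tailing_peak` are all hypothetical, chosen only to show how the two retention estimates diverge on an asymmetric peak.

```python
import math

def first_moment(times, signal):
    """First statistical moment (centroid) of a baseline-corrected peak.

    Trapezoidal estimate of M1 = integral(t * s(t)) / integral(s(t)).
    """
    area = 0.0
    weighted = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        area += 0.5 * (signal[i] + signal[i - 1]) * dt
        weighted += 0.5 * (times[i] * signal[i] + times[i - 1] * signal[i - 1]) * dt
    return weighted / area

def apex_time(times, signal):
    """Time of the highest sampled point of the peak."""
    return times[max(range(len(signal)), key=lambda i: signal[i])]

def tailing_peak(t, t_apex=3.0, sigma_front=0.2, sigma_tail=0.8):
    """Hypothetical asymmetric peak: narrow Gaussian front, wider Gaussian tail."""
    sigma = sigma_front if t < t_apex else sigma_tail
    return math.exp(-0.5 * ((t - t_apex) / sigma) ** 2)

# Synthetic chromatogram: 0 to 10 s sampled every 0.01 s (assumed values).
times = [i / 100 for i in range(1001)]
signal = [tailing_peak(t) for t in times]

apex = apex_time(times, signal)
centroid = first_moment(times, signal)
print(f"apex = {apex:.2f} s, first moment = {centroid:.2f} s")
```

For this tailing peak the centroid falls noticeably later than the apex, which is the size of discrepancy that matters when total retention is only a few seconds. On real data the baseline would need to be subtracted and the integration limits chosen before computing the moment.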

Second, we had to deal with frit volume. The time we observe includes transit through the frits and through the particle bed. What we care about is the particle bed, where the chemistry and separation happen. With normal columns, frit volume is negligible: a ~100 mm × 2.1 mm column has a volume of about 200 µL, and frits might be ~1 µL each, so two frits contribute around 1%. But our 5 mm × 2.1 mm columns are only about 10 µL. Two frits can contribute on the order of 20% of the volume, which can translate into comparable error if uncorrected.
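The arithmetic behind those percentages can be checked in a few lines of Python. The volumes are the approximate figures quoted in the interview; the helper name `frit_fraction` is hypothetical, and the fraction is taken relative to the quoted column volume, as in the spoken estimate.

```python
def frit_fraction(column_volume_ul, frit_volume_ul=1.0, n_frits=2):
    """Frit volume as a fraction of the quoted column volume."""
    return n_frits * frit_volume_ul / column_volume_ul

long_column = frit_fraction(200.0)   # ~100 mm x 2.1 mm column, ~200 uL
short_column = frit_fraction(10.0)   # ~5 mm x 2.1 mm column, ~10 uL

print(f"{long_column:.0%} vs {short_column:.0%}")  # prints "1% vs 20%"
```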

We addressed this by using retention ratios rather than absolute retention times. We use toluene as a reference compound, and we record the ratio of the retention of the analyte to toluene. This essentially wipes out the frit-volume error. There may be other approaches, but this works well and is easy to implement.
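One way to see why a ratio can cancel the frit contribution is the sketch below. It rests on assumptions not stated in the interview: the ratio is taken between retention factors, and the frits behave as pure dead volume that adds the same transit time to every compound, including the dead-time marker. All numerical values and the function name `observed_k` are hypothetical.

```python
def observed_k(k_true, t0_bed, t_frit):
    """Retention factor computed from observed times when the frits add the
    same extra transit time t_frit to the analyte and to the dead-time marker."""
    t_r_obs = t0_bed * (1.0 + k_true) + t_frit   # analyte: bed retention + frit transit
    t0_obs = t0_bed + t_frit                     # unretained marker: bed + frit transit
    return (t_r_obs - t0_obs) / t0_obs

T0_BED = 1.0      # s, assumed dead time of the particle bed
T_FRIT = 0.2      # s, assumed frit transit time (20% of the bed dead time)
K_TOLUENE = 2.0   # assumed true retention factor of the reference
K_ANALYTE = 5.0   # assumed true retention factor of the analyte

k_tol_obs = observed_k(K_TOLUENE, T0_BED, T_FRIT)
k_an_obs = observed_k(K_ANALYTE, T0_BED, T_FRIT)

abs_error = k_an_obs / K_ANALYTE - 1.0   # relative error in the absolute retention factor
ratio_true = K_ANALYTE / K_TOLUENE
ratio_obs = k_an_obs / k_tol_obs         # the common factor t0_bed / (t0_bed + t_frit) cancels

print(f"error in k: {abs_error:.1%}; true ratio {ratio_true:.3f}, observed ratio {ratio_obs:.3f}")
```

Under these assumptions every observed retention factor is scaled by the same factor, so the absolute value is biased while the analyte-to-toluene ratio is unchanged. If analytes interacted with the frit material, the cancellation would only be approximate.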

Why retention ratios are better for building databases

Host: Why is this approach more robust for building large databases?

Dwight Stoll: We observed that selectivity, expressed as retention ratios, is much more stable over time than absolute retention factors. We don’t fully know why. It could be short-term temperature variations, or how consistently the pump prepares mobile phase day-to-day. But we need retention ratios anyway to address frit effects, and it turns out they’re more stable, which is a win for database building.

What HSM3 revealed from ~43,000 retention measurements

Host: Your HSM3 model advances previous hydrophobic subtraction models. What key insights came from having that much data?

Dwight Stoll: After HSM2, a company approached us about using HSM to track changes in selectivity over time—stationary phase degradation. We tried, and it didn’t work well. The model didn’t “see” changes even when separations clearly shifted. Part of the issue was that the original probe set had only one isomer pair, and it was relatively small. That led us to deliberately include several isomer pairs in the probe set for HSM3 and ask: does that improve predictive accuracy for isomer separations? The answer was convincingly yes.

That’s important because method development groups often say ~80% of separations are straightforward with their go-to C18. The hard ~20% frequently involve isomers. If we want predictive tools to be truly impactful, they need to handle isomer separations.

Another exciting insight is that models with interpretable terms—size, ionization interactions, etc.—might help explain why separations occur. If you show me two molecules that separate and ask why, I can speculate, but most of us are doing a lot of hand-waving. With these models, we can sometimes see that most terms cancel, and one or two terms drive selectivity. That could let us understand separations more deeply and design better methods and materials.

But we also learned about limitations. When you make measurements very fast with short columns, precision suffers. On modern instruments with normal columns, you might get 0.1% RSD in retention or better. With our high-throughput approach, we often see 1–2% variation day-to-day. That’s not ideal, but it’s the tradeoff for throughput. We think there’s a path to improve it, but it takes effort.
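For readers less familiar with the metric, percent relative standard deviation (%RSD) is the sample standard deviation divided by the mean, times 100. The sketch below uses made-up replicate retention times, not data from the study, picked so that the two sets land near the precision levels mentioned above.

```python
from statistics import mean, stdev

def percent_rsd(values):
    """Percent relative standard deviation: 100 * sample std dev / mean."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical replicate retention times in seconds (illustrative only).
standard_column = [120.00, 120.10, 119.90, 120.05, 119.95]
short_column = [6.00, 6.10, 5.92, 6.05, 5.95]

rsd_standard = percent_rsd(standard_column)
rsd_short = percent_rsd(short_column)
print(f"standard column: {rsd_standard:.2f}% RSD, short column: {rsd_short:.2f}% RSD")
```

The same absolute scatter of a few hundredths of a second is a much larger fraction of a six-second retention time than of a two-minute one, which is part of why fast measurements are harder to make precise.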

We also saw that some stationary phases drift over time—sometimes badly. If a column’s selectivity changes month-to-month, what do you put in the database? Data becomes a moving target. Manufacturers don’t label “drifty” columns in catalogs, so we often don’t know which are problematic. If we exclude drifting columns, we may reduce the chemical diversity of the dataset and narrow model scope. The other option is better materials. But that likely requires community-level prioritization, not just individual manufacturers acting alone. And I don’t see this drift issue discussed enough in current retention-prediction ML/AI papers, even though it’s real.

Advice for researchers combining chromatography and data science

Host: What advice would you give to young researchers interested in merging chromatography and data science?

Dwight Stoll: Most exciting progress happens at interfaces between fields. But to be effective at an interface, you need functional knowledge in both areas. You can be a chromatographer first and learn enough data science to work on data-heavy problems, or a data scientist first and learn enough chromatography to avoid producing nonsense—like negative retention factors. Without baseline understanding, you can waste a lot of time generating garbage.

A great example is Sarah Rutan. She was trained heavily in statistics but learned a lot of chromatography during her career, and she can speak both languages effectively. Work like HSM2 and HSM3 benefited greatly from that kind of dual fluency. The downside is it takes time. The upside is it makes you far more effective than others trying to work at the same interface.

Closing thoughts and community needs

Host: Was there anything we missed that you wanted to mention?

Dwight Stoll: Not specifically. I’d just emphasize that predicting retention from structure is one of the holy grails of chromatography. It won’t be simple or quick—high effort, high reward. No single group will crack it alone. We need lots of people contributing ideas and data.

A major focus now is harmonization: if we want models with 1–3% prediction error, the underlying data must be compatible at that level across labs and instruments. Mobile phase preparation is probably the biggest challenge—exact composition and exactly how it’s made. If you survey people, you get strong, conflicting opinions. But we’ll need community consensus to build truly high-quality shared datasets.

You can find people working on this at major conferences—HPLC, Pittcon, EAS, and others—where there are now dedicated sessions. If you’re interested, get involved in the conversation.

This text has been automatically transcribed from a video presentation using AI technology. It may contain inaccuracies and is not guaranteed to be 100% correct.

Concentrating on Chromatography Podcast

Dive into the frontiers of chromatography, mass spectrometry, and sample preparation with host David Oliva. Each episode features candid conversations with leading researchers, industry innovators, and passionate scientists who are shaping the future of analytical chemistry. From decoding PFAS detection challenges to exploring the latest in AI-assisted liquid chromatography, this show uncovers practical workflows, sustainability breakthroughs, and the real-world impact of separation science. Whether you’re a chromatographer, lab professional, or researcher, you’ll discover inspiring content!

