LabRulez s.r.o. All rights reserved. Content available under a CC BY-SA 4.0 Attribution-ShareAlike
Author: LabRulez
Tags: Scientific article, Science and research

Best practices and tools in R and Python for statistical processing and visualization of lipidomics and metabolomics data

Mon, 6 Oct 2025 | Original article from: Nat Commun 16, 8714 (2025)
The review compiles R and Python tools for analyzing lipidomics and metabolomics data, guiding beginners to create publication-ready graphics and enabling robust, reproducible chemometric analysis.
Nat Commun 16, 8714 (2025): Fig. 1: Data transformation and scaling.

Mass spectrometry-based lipidomics and metabolomics generate large, complex datasets requiring advanced skills for statistical analysis and visualization. This review compiles freely accessible R and Python tools to help researchers explore data, identify trends, and visualize biologically relevant differences.

The article guides beginners through descriptive statistics, hypothesis testing, volcano plots, lipid maps, dimensionality reduction, heat maps, and more. A companion GitBook provides step-by-step coding instructions, enabling users to produce publication-ready graphics with minimal complexity. Together, this review and resource library promote robust, reproducible chemometric analysis of omics data using open-source software.

The original article

Best practices and tools in R and Python for statistical processing and visualization of lipidomics and metabolomics data

Jakub Idkowiak, Jonas Dehairs, Jana Schwarzerová, Dominika Olešová, Jacob X. M. Truong, Aleš Kvasnička, Marios Eftychiou, Ruben Cools, Xander Spotbeen, Robert Jirásko, Vullnet Veseli, Marco Giampà, Vincent de Laat, Lisa M. Butler, Wolfram Weckwerth, David Friedecký, Jonas Demeulemeester, Karel Hron, Johannes V. Swinnen & Michal Holčapek 

Nat Commun 16, 8714 (2025)

https://doi.org/10.1038/s41467-025-63751-1

licensed under CC-BY 4.0

Selected sections from the article follow. Formats and hyperlinks were adapted from the original.

Introduction

Advances in mass spectrometry and chromatography have boosted the fields of biomedical and clinical lipidomics and metabolomics, with vast volumes of data being generated each day. The primary focus of biomedical/clinical lipidomics and metabolomics studies is to investigate the biological variation reflected by different lipid or metabolite levels between analyzed groups [1,2]. However, lipid or metabolite concentrations in biological materials can be influenced by other factors, including culture and growing conditions [3,4], age [5,6], sex [5,6,7], dietary habits [8,9], smoking and drinking status [6,10], medications [11,12], circadian rhythm [1,13,14,15], or other comorbidities [6,12,16,17,18,19,20,21,22]. Data also exhibit unwanted variation that can arise at multiple steps during the experiment [23,24,25]. Extracting the biological variation without applying the right procedures is complicated and can lead to ambiguous or erroneous results. Therefore, investigators collaborating within the International Lipidomics Society and the Metabolomics Society have created guidelines for performing lipidomics and metabolomics experiments that aim to improve the quality of quantitative omics data, unify experimental protocols, and standardize data reporting [26,27,28,29,30,31,32]. The variability of lipidomics and metabolomics data can be reduced by following these recommendations.

A typical output of quantitative measurements is a table filled with lipid or metabolite concentrations measured across samples (observations). Usually, the number of variables/features (lipids or metabolites) quantified in biological materials exceeds the number of samples measured [33]. Lipidomics and metabolomics tables often contain missing values [34,35,36,37,38] and outliers [39]. As a result, the concentration distributions for biological groups often deviate from a symmetric Gaussian distribution, exhibiting left- or right-skewed patterns, the latter usually being more common. Lipidomics and metabolomics data are also characterized by heteroscedasticity, which means that the spread of variable values within examined biological groups may not be comparable. Concentrations of lipids and metabolites can differ by orders of magnitude even within the same biological class of compounds. However, more abundant molecules may not necessarily be more important from the biological point of view. The magnitude of alterations in metabolite and lipid levels may also differ. Molecules involved in the tightly controlled central metabolism are less prone to changes than those in the secondary metabolism [40]. Concentrations of molecules from the same subclass, class, or closely related metabolic pathways are likely to be correlated [41]. Computed concentration values are affected by batch effects resulting from fluctuations in the instrument’s response during the sample sequence. To address this, standardized datasets contain additional quality control (QC) samples [2]. QCs can be obtained simply by pooling small aliquots of all biological samples [42] or purchased, e.g., National Institute of Standards and Technology (NIST) standard reference material (SRM) 1950 [43] for metabolomics/lipidomics of plasma samples. Using QCs and blanks allows for evaluating the quality of the obtained data, provides insight into technical variability [42], and is instrumental for normalization (e.g., removal of batch effects) [2,44].
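One common way to look at technical variability in QC samples is the coefficient of variation (CV) of each feature across repeated QC injections. The following Python sketch illustrates the idea using only the standard library; the lipid names, the concentration values, and the 20% cut-off are illustrative assumptions and are not taken from the original article.

```python
from statistics import mean, stdev

# Hypothetical concentrations (arbitrary units) of three lipids measured
# in five pooled-QC injections spread across one sample sequence.
qc_table = {
    "PC 34:1": [102.0, 98.0, 101.0, 99.0, 100.0],
    "TG 52:2": [55.0, 70.0, 40.0, 75.0, 35.0],
    "Cer 42:1": [10.0, 10.4, 9.8, 10.1, 9.7],
}


def cv_percent(values):
    """Coefficient of variation: sample SD relative to the mean, in percent."""
    return 100.0 * stdev(values) / mean(values)


qc_cv = {lipid: cv_percent(values) for lipid, values in qc_table.items()}

# Illustrative cut-off only (an assumption); the acceptable CV depends on
# the study design and the analytical platform.
CV_THRESHOLD = 20.0
flagged = [lipid for lipid, cv in qc_cv.items() if cv > CV_THRESHOLD]

for lipid, cv in qc_cv.items():
    print(f"{lipid}: CV = {cv:.1f}%")
print("High technical variability:", flagged)
```

Because every QC injection is nominally the same sample, a large CV for a feature points to technical rather than biological variation.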

To analyze these complex data, scientists must acquire statistical, computational, and data visualization skills and be aware of the specific properties of their datasets in order to gain insights into statistically significant trends and relevant relationships hidden in them. Advancing knowledge in statistics and programming is a demanding task with many hurdles, particularly when transitioning from a graphical user interface (GUI) to a text editor. Therefore, web-based, user-friendly tools have been developed to facilitate data exploration, e.g., the MetaboAnalyst platform [45], LipidSig [46], LipidSuite [47], LipidMaps Statistical Analysis Tool [48], LipidomicsR, or COVAIN [49]. When using these platforms, the user is guided through a simple chemometric pipeline, from uploading datasets to extracting and visualizing the most significant information. User decisions are translated into code that ultimately triggers mathematical operations, simplifying the data mining. The novel Shiny app ADViSELipidomics has also been introduced, covering preprocessing, analyzing, and visualizing lipidomics data [50]. Although these solutions suit novices in statistics and chemometrics, more experienced users demand more flexibility, particularly in visualization. Complex lipidomics and metabolomics datasets can be visualized in various ways. Lipidomic data can be grouped based on common characteristics, such as lipid subclass, fatty acid composition, saturation, or the total (aggregate) number of carbons. Generating informative figures can be facilitated by at least basic R or Python scripting skills.

This manuscript is structured in three parts: (i) data preparation for statistical analysis, (ii) an overview and critical review of key statistical methods and visualizations applied in lipidomics and metabolomics – to build a solid understanding of the analyses, (iii) a beginner’s guide dedicated to those who want to use R and Python for statistical analysis and visualization of clinical lipidomics and metabolomics data. We also provide a GitBook code repository containing scripts and step-by-step notebooks to support the readers’ first steps with R or Python.

Overview of key statistical methods and visualizations applied in lipidomics and metabolomics

Methods for data exploration

Methods for data exploration can be characterized as univariate (considering one variable at a time) or multivariate (examining multiple variables simultaneously) [64]. Data exploration begins with the preparation of descriptive statistics. Although univariate methods can also be used in this step, it is important to keep in mind that omics data are, in essence, multivariate.

Descriptive statistics

Descriptive (summary) statistics summarize the basic properties of the dataset. At this step, measures of central tendency (so-called location parameters) are estimated; they refer to a typical lipid or metabolite concentration value for each biological group, i.e., the center of each distribution. Depending on the shape of a distribution, the most typical value within a biological group can be reflected by the mean (symmetric distributions only) or the median and the mode (better for skewed distributions) (Fig. 2) [65,66,67]. The typical value representing a biological group is usually presented together with a measure of dispersion, for example, standard deviation (SD), variance, range, interquartile range (IQR) [65,67], or the coefficient of variation (SD relative to mean) [67,68]. Quartiles (steps of 25 percentiles) split the data, listed in ascending order, into four equal parts (Q1-4). Deciles (steps of 10 percentiles: p10, p20, p30, etc.) are computed similarly to quartiles and split the data into ten equal parts [67]. Summary statistics can also contain information on contingency tables (for presence-absence analysis) and parameters characterizing the distribution shape, like skewness and kurtosis [67]. This analysis allows the detection of potential outliers and indicates the properties of the sample distribution. The initial investigation of relationships among variable concentrations is also a part of summary statistics. Here, covariance can be applied to indicate how and in what direction concentrations of two lipids or metabolites change together. Correlation analysis, which measures the direction and strength of a relationship, is often performed in lipidomics and metabolomics. The correlation coefficient ranges between –1 and 1; values close to –1 or 1 indicate a strong negative or positive linear relationship, respectively, while a correlation close to 0 indicates that no linear relationship exists between two concentrations [67,69]. The Pearson correlation can be calculated for normally distributed samples of populations, but it is sensitive to outliers. When outliers are present or distributions are skewed, Spearman’s rank correlation should be used instead [67,70].
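The summary statistics described above can be sketched in a few lines of Python. This is a minimal illustration built on the standard library only; the concentration values are invented, the 1.5 × IQR rule for flagging potential outliers is a common box-plot convention assumed here, and the helper functions `pearson`, `average_ranks`, and `spearman` are hypothetical names written for this example.

```python
import math
import statistics as st

# Hypothetical concentrations of one lipid in one biological group
# (arbitrary units); the last value is deliberately extreme.
conc = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0, 23.0]

# Location parameters: the extreme value pulls the mean, not the median.
mean_conc = st.mean(conc)
median_conc = st.median(conc)

# Dispersion
sd = st.stdev(conc)                      # sample standard deviation
cv = sd / mean_conc                      # coefficient of variation
q1, q2, q3 = st.quantiles(conc, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: values beyond 1.5 x IQR from the quartiles are
# treated as potential outliers.
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in conc if x < low or x > high]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = st.mean(x), st.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Two hypothetical lipids that rise together; one sample is extreme in lipid_b.
lipid_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
lipid_b = [2.0, 4.0, 6.0, 8.0, 10.0, 100.0]

print(f"mean={mean_conc}, median={median_conc}, SD={sd:.2f}, CV={cv:.2f}")
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, potential outliers={outliers}")
print(f"Pearson r={pearson(lipid_a, lipid_b):.2f}")
print(f"Spearman rho={spearman(lipid_a, lipid_b):.2f}")
```

In this toy example the single extreme value lowers the Pearson coefficient, whereas the rank-based Spearman coefficient is unaffected because the ordering of the samples is unchanged.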

Nat Commun 16, 8714 (2025): Fig. 2 - Components and construction of box plots.

Univariate statistical methods

Fold change

Once measures of central tendency have been calculated for each group, a fold change can be computed. Fold change is the ratio of the mean concentration of a lipid/metabolite in a condition to its mean concentration in the control condition, usually presented as a log2 or log10 value. The ratios of medians or modes can be used for skewed distributions.
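As a small worked example, the fold change can be computed as follows. The concentrations are invented for illustration, and `log2_fold_change` is a hypothetical helper name; the `center` argument switches between the mean and the median as the measure of central tendency.

```python
import math
from statistics import mean, median

# Hypothetical concentrations of one lipid (arbitrary units).
control = [10.0, 12.0, 11.0, 9.0, 8.0]
disease = [40.0, 38.0, 42.0, 44.0, 36.0]


def log2_fold_change(case, reference, center=mean):
    """log2 of the ratio of central values: case relative to reference."""
    return math.log2(center(case) / center(reference))


fc_mean = log2_fold_change(disease, control)             # ratio of means
fc_median = log2_fold_change(disease, control, median)   # ratio of medians

print(f"log2 fold change (means):   {fc_mean:.2f}")
print(f"log2 fold change (medians): {fc_median:.2f}")
```

On the log2 scale, a value of 2 corresponds to a four-fold increase relative to the control, and swapping the two groups only flips the sign, which is one reason fold changes are usually reported as logarithms.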

Statistical tests for comparing two biological groups

Statistical tests are broadly applied to compare outcomes between biological groups, e.g., differences in mean concentrations of lipids/metabolites in the control and disease groups. The tests can be parametric or non-parametric. The former covers all tests that make assumptions about the distribution from which the sample data is drawn. The latter can be considered distribution-free tests and are not restricted by assumptions on the nature of the sampled population [42,69].

The t-test is a parametric test to compare the location (i.e., mean) of two random samples of continuous variables. If two samples are independent, the unpaired t-test is used. In contrast, if measurements involve, e.g., subjects pre- and post-intervention, a paired t-test is applied [42,69,75]. While the latter assumes that differences between pairs of values are approximately normally distributed, the former requires the sampled concentrations in both groups to come from normal distributions with the same spread [69,75] (i.e., variance, which can be tested using an F-test, for example) (Fig. 3A). Welch’s t-test is used if the variances of the two groups are different (Fig. 3A) [70,75]. The t-statistic determines the test outcome, i.e., whether or not to reject the null hypothesis. This statistic is a scaled difference between the sample-estimated means, which, under the null hypothesis, follows a Student’s t-distribution (akin to the normal distribution but with heavier tails; Fig. 3C, D – example 1). The alternative hypothesis can be that one mean is greater (or smaller) than the other (one-sided test – only the probability mass in one tail of the t-distribution is assessed) or that the means are simply not equal (two-sided test – both tails are considered) (Fig. 3D – example 1). Importantly, at the same significance level α, the one-sided test is more sensitive [69]. However, as the direction of differences is often unknown, the two-sided t-test is generally more suitable for metabolomics and lipidomics.
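To make the "scaled difference between the sample-estimated means" concrete, the sketch below computes the t-statistic and degrees of freedom for the unpaired (equal-variance), Welch, and paired variants using the textbook formulas. The data and the function names `student_t`, `welch_t`, and `paired_t` are assumptions made for this illustration; p-values are not computed here because the t-distribution is not part of the Python standard library.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical concentrations of one lipid (arbitrary units).
control = [10.0, 12.0, 11.0, 9.0, 8.0]
disease = [14.0, 20.0, 17.0, 11.0, 8.0]   # same n, much larger spread


def student_t(a, b):
    """Unpaired t-test assuming equal variances (pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Unpaired t-test without the equal-variance assumption (Welch)."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def paired_t(before, after):
    """Paired t-test on the within-subject differences (after - before)."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


t_student, df_student = student_t(disease, control)
t_welch, df_welch = welch_t(disease, control)

# Hypothetical pre-/post-intervention values for the same five subjects.
pre = [10.0, 12.0, 11.0, 9.0, 8.0]
post = [11.0, 14.0, 12.0, 11.0, 9.0]
t_paired, df_paired = paired_t(pre, post)

print(f"Student: t={t_student:.3f}, df={df_student}")
print(f"Welch:   t={t_welch:.3f}, df={df_welch:.2f}")
print(f"Paired:  t={t_paired:.3f}, df={df_paired}")
```

With equal group sizes the Student and Welch statistics coincide, but Welch's test uses fewer degrees of freedom when the variances differ, which makes it more conservative. In practice, p-values can be obtained from a statistics library, for example `scipy.stats.ttest_ind(a, b, equal_var=False)` for Welch's test or `scipy.stats.ttest_rel` for the paired test.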

Nat Commun 16, 8714 (2025): Fig. 3 - Statistical tests commonly used in lipidomics and metabolomics.

Conclusions and perspectives

This review aims to bridge the gap between theory and application, offering a comprehensive understanding and allowing effective utilization of key statistical methods in lipidomics and metabolomics. By providing access to a range of R and Python tools via a GitBook repository, it equips researchers with practical resources to begin using these programming languages for statistical data analysis.

There is a noticeable trend toward making R and Python tools more accessible for beginners, evident in the development of libraries such as ggpubr, ggstatsplot, or tidyplots for R, as well as seaborn for Python. These libraries allow users to perform advanced data analysis or visualization with just one function. Simultaneously, more advanced R users rely on access to comprehensive modular solutions (all-in-one collections), where often an R object is initially created and then processed downstream through a series of libraries that streamline each step of data processing and mining. For instance, capable R users can process a variety of raw mass spectrometry data with tidyMass [116], RforMassSpectrometry, MetaboAnalystR [117], or the well-developed xcms package [118,119,120]. They can then perform initial analysis and visualization within these libraries and finally create sophisticated, publication-ready, high-quality graphics using tidyverse, ggpubr, plotly, or specialized -omics libraries like lipidr [121] and LipidSigR [46]. Similarly, once trained, Python users can employ the all-encompassing tidyMS [122] or OpenMS [123] for data processing and seaborn or matplotlib for visualization. Utilizing open-source tools provides -omics scientists with greater flexibility and a broader array of solutions, as seen in projects and toolboxes like metaRbolomics within Bioconductor [124]. This approach also decreases dependence on costly vendor software and supports scalable, reproducible, and standardized workflows.

Moreover, proficiency in these programming languages fosters adaptability in tackling emerging challenges in metabolomics and lipidomics research. By developing these skills, researchers can enhance their own analyses and extract deeper insights from complex omics data. This, in turn, drives advancements, for instance, in biomarker discovery, disease mechanisms, or personalized medicine within clinical and biomedical sciences.
