Rethinking calibration as a statistical estimation problem to improve measurement accuracy

Scientific articles | 2025 | GMAS Laboratory | Analytica Chimica Acta

Summary

Importance of the topic


The paper addresses a pervasive and critical issue in analytical chemistry: the accuracy and consistency of calibration-based quantification. Calibration curves underpin more than 90% of routine chemical measurements; however, small calibration sample sizes, unacknowledged trade-offs between bias and variance, and neglect of information available across samples or repeated tests can degrade the accuracy of individual concentration estimates. Reframing calibration as a statistical estimation (missing-data) problem enables use of Bayesian hierarchical models (BHM) to intentionally introduce controlled shrinkage and pool information, thereby improving practical measurement accuracy without altering laboratory protocols.

Objectives and overview of the study


The authors aim to (1) identify statistical weaknesses of conventional calibration (ordinary least squares and inverse estimation), (2) present BHM as a robust alternative that pools information within a test and across repeated calibration tests, (3) demonstrate BHM performance on three representative calibration scenarios (nonlinear ELISA, linear colorimetric phosphate assay, and an SPME LC–MS matrix-effect study), and (4) provide computational pathways for practical implementation, including sequential updating for routine laboratories.

Methodology


The study reframes calibration as estimation of missing predictor variables (unknown concentrations) jointly with calibration curve parameters. Major methodological points:
  • Classical approach: two-step ordinary least squares (OLS) regression of standards then inverse estimation of sample concentrations; theoretical sampling distributions exist only for simple linear cases and can be unstable when degrees of freedom are small.
  • Bayesian calibration: compute posterior distributions for calibration parameters and unknown concentrations using Markov chain Monte Carlo (MCMC); can use weakly informative or informative priors to stabilize estimation.
  • Bayesian hierarchical modeling (BHM): two-level pooling—within-test pooling treats multiple unknown samples in the same batch as exchangeable (log-concentration common prior), and across-test pooling treats repeated calibration-curve coefficients as exchangeable with a hyper-distribution, enabling shrinkage of coefficients toward a shared mean.
  • Performance evaluation: compare inverse-function (Monte Carlo simulated uncertainty), non-hierarchical Bayesian, within-test BHM, and within+across-test BHM. Accuracy assessed against QA samples (absolute errors, credible intervals) in three applied examples.
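The classical two-step procedure is worth seeing in code. The authors work in R; the following is a minimal Python sketch of the same logic, with made-up standards (all numbers hypothetical, not from the paper):

```python
import numpy as np

# Hypothetical standards: known concentrations and measured responses
# (illustrative numbers only).
conc = np.array([0.0, 0.5, 1.0, 2.0, 4.0, 8.0])        # standard concentrations
resp = np.array([0.02, 0.26, 0.51, 1.01, 2.03, 4.05])  # instrument responses

# Step 1: ordinary least squares fit of response = b0 + b1 * concentration
b1, b0 = np.polyfit(conc, resp, 1)   # polyfit returns [slope, intercept]

def inverse_estimate(y_obs, b0, b1):
    """Step 2: inverse estimation -- solve the fitted line for the unknown."""
    return (y_obs - b0) / b1

x_hat = inverse_estimate(1.50, b0, b1)
```

Note that this point estimate carries no uncertainty on its own; with few standards (low residual degrees of freedom), the sampling distribution of `x_hat` can be very wide, which is the instability the paper highlights.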

Instrumental and computational tools used


  • Analytical assays: ELISA kits (Eurofins-Abraxis; 4-parameter logistic calibration), ammonium molybdate/ascorbic acid colorimetric phosphate assay (SM 4500-PE), SPME coupled to LC–MS for xenobiotics in plasma.
  • Hardware noted: PerkinElmer QSight 220 (used for the LC–MS work in the related laboratory study).
  • Computation: R (including rv package), Stan via rstan for Bayesian MCMC. All data and commented R code provided in the authors' GitHub repository.
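For the 4PL model used in the ELISA example, the curve and its analytic inverse are short to write down. A minimal Python sketch with illustrative parameter values (these are assumptions, not the paper's fitted ELISA values):

```python
def four_pl(x, a, b, c, d):
    """Four-parameter logistic: a = response at zero dose, d = response at
    infinite dose, c = inflection point (EC50), b = slope factor."""
    return d + (a - d) / (1.0 + (x / c) ** b)

def four_pl_inverse(y, a, b, c, d):
    """Analytic inverse of the 4PL: maps a response back to a concentration
    (valid for responses strictly between d and a)."""
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)

# Hypothetical parameter values for illustration only
a, b, c, d = 1.8, 1.2, 0.75, 0.05
y = four_pl(0.5, a, b, c, d)
x_back = four_pl_inverse(y, a, b, c, d)   # recovers 0.5
```

In practice the four parameters must themselves be estimated from the standards, and with only a handful of points their joint uncertainty propagates heavily into the inverse, which is why the Bayesian posterior treatment pays off.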

Main results and discussion


  • Sample size effect: small numbers of effective calibration points (low degrees of freedom) cause large uncertainty in residual variance and strongly inflate coefficient uncertainty. Using averaged relative responses (reducing n) can misleadingly improve R² yet dramatically worsen uncertainty for inverse estimates.
  • Nonlinear ELISA example (Toledo microcystin crisis): fitting with n=5 relative points produced highly uncertain posterior distributions for 4PL parameters and much larger posterior SDs for estimated concentrations compared to n=12 raw points. BHM methods substantially reduced both bias and variability of QA concentration estimates.
  • Linear phosphate (PO4) example: absence of replicates leads OLS-based inverse estimates to underrepresent predictive uncertainty; within-test BHM reduced bias and across-test pooling yielded further, though smaller, improvements.
  • Matrix-effects LC–MS example: log–log calibration (power-law) was appropriate. BHM across different matrix media showed low uncertainty in estimated coefficients, supporting the feasibility of using non-human plasma (e.g., bovine) or PBS for standards when matrix effects are stable.
  • Shrinkage rationale: pooling information induces a deliberate bias toward group means but reduces variance; overall accuracy (absolute error for single estimates) improves by the bias–variance trade-off (Stein’s paradox, James–Stein empirical Bayes concepts).
  • Practical computation: the authors show sequential updating of hyper-distributions allows a laboratory to accumulate information over time without refitting massive hierarchical models from scratch; this can be embedded in a browser-based app.
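The shrinkage rationale can be checked numerically: the James–Stein estimator deliberately biases each estimate toward the group mean yet lowers total squared error. A simulated Python sketch (the group size, noise level, and generating model are assumptions chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate many "true" log-concentrations and one noisy, unbiased
# measurement of each (k and sigma are illustrative assumptions).
k, sigma = 200, 1.0
theta = rng.normal(0.0, 1.0, size=k)          # true values
y = theta + rng.normal(0.0, sigma, size=k)    # raw unbiased estimates

# Positive-part James-Stein: shrink each y[i] toward the grand mean.
y_bar = y.mean()
s2 = np.sum((y - y_bar) ** 2)
shrink = max(0.0, 1.0 - (k - 3) * sigma**2 / s2)
theta_js = y_bar + shrink * (y - y_bar)

mse_raw = np.mean((y - theta) ** 2)           # unbiased estimator
mse_js = np.mean((theta_js - theta) ** 2)     # biased but pooled
```

Here `mse_js` comes out smaller than `mse_raw`: each shrunken estimate is individually biased toward the group mean, but the variance reduction more than compensates, which is exactly the bias–variance trade-off the paper exploits.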

Key benefits and practical implications


  • Improved accuracy: BHM systematically produces smaller absolute errors and narrower credible intervals for QA samples compared with inverse-estimation and non-hierarchical Bayesian methods.
  • Better use of existing data: BHM leverages multiple unknown samples in a batch and historical calibration tests to reduce estimation uncertainty without changing laboratory workflows or instrumentation.
  • Robust handling of nonlinearity and small-sample calibration: Bayesian posterior estimation and pooling bypass analytical intractability of sampling distributions for nonlinear inverse problems and stabilize estimates when degrees of freedom are limited.
  • Operational feasibility: sequential updating and software (R + Stan) implementations allow practical deployment in routine labs; authors provide code and datasets for reproducibility.
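Sequential updating is easiest to see in the conjugate normal–normal case, where folding in batches one at a time reproduces the joint fit exactly. The paper's models are fit with MCMC in Stan; the Python sketch below (with hypothetical batch values) shows only the updating idea:

```python
import numpy as np

def update_normal(mu0, tau0_sq, y, sigma_sq):
    """Conjugate normal-normal update: fold a new batch of coefficient
    estimates y (known sampling variance sigma_sq) into the running
    hyper-distribution N(mu0, tau0_sq). Returns the updated (mean, variance)."""
    y = np.asarray(y, dtype=float)
    prec = 1.0 / tau0_sq + y.size / sigma_sq
    mu = (mu0 / tau0_sq + y.sum() / sigma_sq) / prec
    return mu, 1.0 / prec

# Batch-by-batch updating gives the same posterior as one joint fit.
mu, tau_sq = 0.0, 100.0                       # vague starting hyper-prior
for batch in ([1.1, 0.9], [1.0, 1.2], [0.8]):
    mu, tau_sq = update_normal(mu, tau_sq, batch, sigma_sq=0.25)

mu_joint, tau_joint = update_normal(0.0, 100.0, [1.1, 0.9, 1.0, 1.2, 0.8], 0.25)
```

Because the two routes agree, a laboratory can carry forward only the current hyper-parameters rather than re-fitting the full hierarchical model over all archived batches; this is the property that makes a lightweight browser-based tool feasible.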

Future trends and potential applications


  • Software tools and automation: development of laboratory-specific, browser-based applications (e.g., R Shiny) that implement sequential hyper-parameter updating to replace spreadsheet workflows.
  • Broader adoption in regulated environments: BHM could be integrated into QA/QC pipelines where multiple batches and archived calibration histories exist—especially valuable for assays with nonlinear calibration or limited standard points.
  • Expanded priors and hierarchical structures: incorporation of process metadata (instrument, operator, matrix type) as additional hierarchy layers or informative priors to further reduce uncertainty and uncover systematic matrix effects.
  • Extension to complex error structures: future work should address heteroscedasticity, correlated residuals, or mixed error sources to broaden applicability where IID residual assumptions fail.
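For the heteroscedastic case flagged above, weighted least squares with weights proportional to 1/variance is the standard starting point. A Python sketch under an assumed proportional-error model (all numbers illustrative, not from the paper):

```python
import numpy as np

# Hypothetical standards whose error grows with concentration, so each
# point is weighted by the reciprocal of its assumed variance.
conc = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
resp = np.array([0.52, 0.98, 2.05, 3.90, 8.20])
var = (0.05 * conc) ** 2            # assumed proportional-error model
w = 1.0 / var

# Weighted least squares via the normal equations X' W X beta = X' W y
X = np.column_stack([np.ones_like(conc), conc])
W = np.diag(w)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ resp)
b0, b1 = beta                       # intercept, slope
```

In a Bayesian formulation the same weighting arises naturally by letting the residual standard deviation scale with the mean, which is one route to the extensions the authors propose.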

Conclusions


The authors demonstrate that viewing calibration as a statistical estimation (missing-data) problem and applying Bayesian hierarchical modeling materially improves measurement accuracy and consistency across three representative calibration scenarios. Pooling information within a batch and across repeated calibration tests creates controlled shrinkage that lowers absolute estimation errors relative to classical inverse estimation, especially when calibration sample sizes are limited or matrix effects are present. Their approach is computationally tractable with MCMC, and sequential updating enables practical implementation in routine laboratories.


Content was automatically generated from an original PDF document using AI and may contain inaccuracies.
