Unbiased prediction errors for partial least squares regression models: Choosing a representative error estimator for process monitoring

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Unbiased prediction errors for partial least squares regression models: Choosing a representative error estimator for process monitoring. / Skou, Peter B; Tonolini, Margherita; Eskildsen, Carl Emil; Berg, Frans Van Den; Rasmussen, Morten Arendt.

In: Journal of Near Infrared Spectroscopy, Vol. 31, No. 4, 2023, p. 186-195.

Harvard

Skou, PB, Tonolini, M, Eskildsen, CE, Berg, FVD & Rasmussen, MA 2023, 'Unbiased prediction errors for partial least squares regression models: Choosing a representative error estimator for process monitoring', Journal of Near Infrared Spectroscopy, vol. 31, no. 4, pp. 186-195. https://doi.org/10.1177/09670335231173139

APA

Skou, P. B., Tonolini, M., Eskildsen, C. E., Berg, F. V. D., & Rasmussen, M. A. (2023). Unbiased prediction errors for partial least squares regression models: Choosing a representative error estimator for process monitoring. Journal of Near Infrared Spectroscopy, 31(4), 186-195. https://doi.org/10.1177/09670335231173139

Vancouver

Skou PB, Tonolini M, Eskildsen CE, Berg FVD, Rasmussen MA. Unbiased prediction errors for partial least squares regression models: Choosing a representative error estimator for process monitoring. Journal of Near Infrared Spectroscopy. 2023;31(4):186-195. https://doi.org/10.1177/09670335231173139

Author

Skou, Peter B; Tonolini, Margherita; Eskildsen, Carl Emil; Berg, Frans Van Den; Rasmussen, Morten Arendt. / Unbiased prediction errors for partial least squares regression models: Choosing a representative error estimator for process monitoring. In: Journal of Near Infrared Spectroscopy. 2023; Vol. 31, No. 4. pp. 186-195.

Bibtex

@article{76c5df9da75d4b2392189817d172c456,
title = "Unbiased prediction errors for partial least squares regression models: Choosing a representative error estimator for process monitoring",
abstract = "Partial least squares (PLS) regression is widely used to predict chemical analytes from spectroscopic data, thus reducing the need for expensive and time-consuming wet chemical reference analysis in industrial process monitoring. However, predictions via PLS by definition carry sample-specific errors, and estimation of these errors is essential for correct interpretation of results. To increase trust in PLS regression-based predictions, reliable prediction error estimates must be reported. This can be achieved by determining realistic sample-specific prediction errors using an unbiased mean squared prediction error estimate. This work provides a guide for estimating sample-specific prediction errors, showing the importance of choosing an appropriate error estimator prior to deploying PLS models for industrial applications. We reviewed recent and established methods for estimating the sample-specific prediction error and tested them through simulation studies. The methods were subsequently applied for estimating prediction errors in two real-life datasets from the food ingredients industry, where near-infrared spectroscopy was used to quantify i) urea in process water and ii) individual protein concentrations in ultrafiltration retentates from a protein fractionation process. Both the simulations and real data examples showed that the mean squared error of calibration is always a downward biased estimator. Although leave-one-out cross-validation performed surprisingly well in the data analysed in this work, this paper demonstrated that the appropriate choice of error estimator requires the user to make an informed, data-centered decision.",
author = "Skou, {Peter B} and Tonolini, Margherita and Eskildsen, {Carl Emil} and Berg, {Frans Van Den} and Rasmussen, {Morten Arendt}",
year = "2023",
doi = "10.1177/09670335231173139",
language = "English",
volume = "31",
pages = "186--195",
journal = "Journal of Near Infrared Spectroscopy",
issn = "0967-0335",
publisher = "N I R Publications",
number = "4",
}

RIS

TY - JOUR

T1 - Unbiased prediction errors for partial least squares regression models

T2 - Choosing a representative error estimator for process monitoring

AU - Skou, Peter B

AU - Tonolini, Margherita

AU - Eskildsen, Carl Emil

AU - Berg, Frans Van Den

AU - Rasmussen, Morten Arendt

PY - 2023

Y1 - 2023

N2 - Partial least squares (PLS) regression is widely used to predict chemical analytes from spectroscopic data, thus reducing the need for expensive and time-consuming wet chemical reference analysis in industrial process monitoring. However, predictions via PLS by definition carry sample-specific errors, and estimation of these errors is essential for correct interpretation of results. To increase trust in PLS regression-based predictions, reliable prediction error estimates must be reported. This can be achieved by determining realistic sample-specific prediction errors using an unbiased mean squared prediction error estimate. This work provides a guide for estimating sample-specific prediction errors, showing the importance of choosing an appropriate error estimator prior to deploying PLS models for industrial applications. We reviewed recent and established methods for estimating the sample-specific prediction error and tested them through simulation studies. The methods were subsequently applied for estimating prediction errors in two real-life datasets from the food ingredients industry, where near-infrared spectroscopy was used to quantify i) urea in process water and ii) individual protein concentrations in ultrafiltration retentates from a protein fractionation process. Both the simulations and real data examples showed that the mean squared error of calibration is always a downward biased estimator. Although leave-one-out cross-validation performed surprisingly well in the data analysed in this work, this paper demonstrated that the appropriate choice of error estimator requires the user to make an informed, data-centered decision.

AB - Partial least squares (PLS) regression is widely used to predict chemical analytes from spectroscopic data, thus reducing the need for expensive and time-consuming wet chemical reference analysis in industrial process monitoring. However, predictions via PLS by definition carry sample-specific errors, and estimation of these errors is essential for correct interpretation of results. To increase trust in PLS regression-based predictions, reliable prediction error estimates must be reported. This can be achieved by determining realistic sample-specific prediction errors using an unbiased mean squared prediction error estimate. This work provides a guide for estimating sample-specific prediction errors, showing the importance of choosing an appropriate error estimator prior to deploying PLS models for industrial applications. We reviewed recent and established methods for estimating the sample-specific prediction error and tested them through simulation studies. The methods were subsequently applied for estimating prediction errors in two real-life datasets from the food ingredients industry, where near-infrared spectroscopy was used to quantify i) urea in process water and ii) individual protein concentrations in ultrafiltration retentates from a protein fractionation process. Both the simulations and real data examples showed that the mean squared error of calibration is always a downward biased estimator. Although leave-one-out cross-validation performed surprisingly well in the data analysed in this work, this paper demonstrated that the appropriate choice of error estimator requires the user to make an informed, data-centered decision.

U2 - 10.1177/09670335231173139

DO - 10.1177/09670335231173139

M3 - Journal article

VL - 31

SP - 186

EP - 195

JO - Journal of Near Infrared Spectroscopy

JF - Journal of Near Infrared Spectroscopy

SN - 0967-0335

IS - 4

ER -
