A Novel PLS2-Based Algorithm for Imputing Missing Values in Foodomics/Metabolomics Studies with Multiple Response Variables

Research output: Contribution to conference › Poster › Research › peer-review



Missing values are a frequent problem in data analytics, especially when a calibration model has to be built [1]. This study introduces a novel method that uses the PLS2 algorithm [2, 3] to impute missing values in ultracentrifugation (UC) measurements of lipoprotein (LP) subfractions in human plasma. LP subfractions are essential biomarkers of food-related diseases such as obesity and cardiovascular disease. They are categorized into four main fractions based on density and size: very low-density (VLDL), intermediate-density (IDL), low-density (LDL), and high-density (HDL) LPs. The proposed algorithm builds on proton (1H) nuclear magnetic resonance (NMR) spectroscopy of human blood plasma, a promising analytical method for rapid quantification of LP subfractions [5]. The robust and reliable NMR spectral data (p = 1500 variables) serve as the input X block, while the UC measurements (p = 65 variables) serve as the response variables. The UC variables are prone to measurement errors, and erroneous measurements are occasionally treated as missing values.
The proposed imputation method is iterative and proceeds in several stages. First, the samples are stratified by the number of missing values in each sample. Next, bootstrap cross-validation is applied using PLS2 modeling. The PLS2 models, together with their associated numbers of latent variables (LVs), whose root mean squared error of cross-validation (RMSECV) falls within a confidence interval of the RMSECV distribution, i.e., [µ-σ, µ], are extracted. The weighted mean of the predicted values from the extracted PLS2 models is then calculated and used to impute the missing values. The imputation then progresses to the next stratification level until all samples are included and all missing values are imputed. Finally, the whole procedure is repeated until all imputed values converge.
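Three of the stages described above (stratification, model selection within [µ-σ, µ], and weighted averaging) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names are hypothetical, µ and σ are taken to be the mean and standard deviation of the RMSECV values, and the weights are assumed to be proportional to 1/RMSECV because the abstract does not specify the weighting scheme. The PLS2 fitting and bootstrap steps are left out, so the RMSECV values and predictions are treated as given inputs.

```python
import numpy as np

def stratify_by_missing(Y):
    """Group sample (row) indices by the number of missing (NaN) responses."""
    n_missing = np.isnan(Y).sum(axis=1)
    return {int(k): np.flatnonzero(n_missing == k) for k in np.unique(n_missing)}

def select_models(rmsecv):
    """Indices of models whose RMSECV lies in [mu - sigma, mu].

    Assumption: mu and sigma are the mean and standard deviation of the
    RMSECV values collected over the bootstrap cross-validation runs.
    """
    rmsecv = np.asarray(rmsecv, dtype=float)
    mu, sigma = rmsecv.mean(), rmsecv.std()
    selected = np.flatnonzero((rmsecv >= mu - sigma) & (rmsecv <= mu))
    if selected.size == 0:
        # Safeguard of this sketch only: fall back to the single best model.
        selected = np.array([rmsecv.argmin()])
    return selected

def weighted_prediction(predictions, rmsecv, selected):
    """Weighted mean of the selected models' predictions.

    predictions has shape (n_models, n_samples, n_responses).
    Assumption: weights are proportional to 1 / RMSECV.
    """
    weights = 1.0 / np.asarray(rmsecv, dtype=float)[selected]
    weights /= weights.sum()
    return np.tensordot(weights, np.asarray(predictions)[selected], axes=1)

# Toy usage with made-up numbers (four candidate models, two samples, three responses).
rmsecv = [0.5, 1.0, 1.1, 2.0]
predictions = np.stack([np.full((2, 3), v) for v in [10.0, 1.0, 2.0, 20.0]])
selected = select_models(rmsecv)
imputed = weighted_prediction(predictions, rmsecv, selected)
```

In a full pipeline, these helpers would sit inside a loop over stratification levels, with an outer loop repeated until the imputed values stop changing.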
Comparative analysis reveals that the proposed PLS2-based imputation method outperforms other commonly used imputation strategies such as iterative PCA [6] and missMDA [7]. The algorithm demonstrates superior performance in accurately imputing missing values within UC measurements of LP subfractions, enabling researchers to obtain more reliable and accurate results.
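For readers unfamiliar with the iterative PCA baseline named above, the following is a generic sketch of that idea: fill the gaps with column means, fit a low-rank PCA model, replace only the missing entries with the reconstruction, and repeat. It is not necessarily the variant cited in [6]. The function name, the default number of components, and the stopping rule are assumptions made for illustration.

```python
import numpy as np

def iterative_pca_impute(Y, n_components=2, max_iter=200, tol=1e-8):
    """Generic iterative PCA imputation of NaN entries in Y.

    Assumes every column has at least one observed value.
    """
    Y = np.asarray(Y, dtype=float)
    missing = np.isnan(Y)
    # Initial guess: column means of the observed values.
    filled = np.where(missing, np.nanmean(Y, axis=0), Y)
    for _ in range(max_iter):
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        # Rank-limited reconstruction from the leading components.
        recon = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mean
        # Only missing entries are updated; observed values stay untouched.
        updated = np.where(missing, recon, filled)
        change = np.max(np.abs(updated - filled))
        filled = updated
        if change < tol:
            break
    return filled
```

Unlike the PLS2-based approach, this baseline uses only the response block itself and does not exploit the NMR spectra as predictors.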
In conclusion, this study highlights the significance of addressing missing values in chemometrics studies. The proposed algorithm, which combines PLS2 modeling and NMR spectroscopy, offers a robust approach for imputing missing values within UC measurements of LP subfractions. Initial results demonstrate that the method's efficacy surpasses traditional imputation strategies and may contribute to enriching and improving lipoprotein prediction models based on NMR spectroscopy.
Original language: English
Publication date: 2023
Publication status: Published - 2023
Event: Food Analytics Conference - Copenhagen, Denmark
Duration: 15 Nov 2023 → …



ID: 375015102