Comparison of machine learning approaches for the classification of elution profiles

Research output: Contribution to journal › Journal article › Research › Peer-reviewed

Standard

Comparison of machine learning approaches for the classification of elution profiles. / Baccolo, Giacomo; Yu, Huiwen; Valsecchi, Cecile; Ballabio, Davide; Bro, Rasmus.

In: Chemometrics and Intelligent Laboratory Systems, Vol. 243, 105002, 2023.

Harvard

Baccolo, G, Yu, H, Valsecchi, C, Ballabio, D & Bro, R 2023, 'Comparison of machine learning approaches for the classification of elution profiles', Chemometrics and Intelligent Laboratory Systems, vol. 243, 105002. https://doi.org/10.1016/j.chemolab.2023.105002

APA

Baccolo, G., Yu, H., Valsecchi, C., Ballabio, D., & Bro, R. (2023). Comparison of machine learning approaches for the classification of elution profiles. Chemometrics and Intelligent Laboratory Systems, 243, Article 105002. https://doi.org/10.1016/j.chemolab.2023.105002

Vancouver

Baccolo G, Yu H, Valsecchi C, Ballabio D, Bro R. Comparison of machine learning approaches for the classification of elution profiles. Chemometrics and Intelligent Laboratory Systems. 2023;243:105002. https://doi.org/10.1016/j.chemolab.2023.105002

Author

Baccolo, Giacomo ; Yu, Huiwen ; Valsecchi, Cecile ; Ballabio, Davide ; Bro, Rasmus. / Comparison of machine learning approaches for the classification of elution profiles. In: Chemometrics and Intelligent Laboratory Systems. 2023 ; Vol. 243.

Bibtex

@article{c510d7d2019e43a4b6968c0c42c4c93e,
title = "Comparison of machine learning approaches for the classification of elution profiles",
abstract = "Hyphenated chromatography is among the most popular analytical techniques in omics related research. While great advancements have been achieved on the experimental side, the same is not true for the extraction of the relevant information from chromatographic data. Extensive signal preprocessing is required to remove the signal of the baseline, resolve the time shifts of peaks from sample to sample and to properly estimate the spectra and concentrations of co-eluting compounds. Among several available strategies, curve resolution approaches, such as PARAFAC2, ease the deconvolution and the quantification of chemicals. However, not all resolved profiles are relevant. For example, some take into account the baseline, others the chemical compounds. Thus, it is necessary to distinguish the profiles describing relevant chemistry. With the aim to assist researchers in this selection phase, we have tried three different classification algorithms (convolutional and recurrent neural networks, k-nearest neighbours) for the automatic identification of GC-MS elution profiles resolved by PARAFAC2. To this end, we have manually labelled more than 170,000 elution profiles in the following four classes: {\textquoteleft}Peak{\textquoteright}, {\textquoteleft}Cutoff peak{\textquoteright}, {\textquoteleft}Baseline{\textquoteright} and {\textquoteleft}Others{\textquoteright} in order to train, validate and test the classification models. The results highlight two main points: i) neural networks seem to be the best solution for this specific classification task confirmed by the overall quality of the classification, ii) the quality of the input data is crucial to maximize the modelling performances.",
keywords = "Automatic analysis, Chromatography, Neural networks, PARAFAC2",
author = "Giacomo Baccolo and Huiwen Yu and Cecile Valsecchi and Davide Ballabio and Rasmus Bro",
note = "Publisher Copyright: {\textcopyright} 2023 The Authors",
year = "2023",
doi = "10.1016/j.chemolab.2023.105002",
language = "English",
volume = "243",
journal = "Chemometrics and Intelligent Laboratory Systems",
issn = "0169-7439",
publisher = "Elsevier",

}

RIS

TY - JOUR

T1 - Comparison of machine learning approaches for the classification of elution profiles

AU - Baccolo, Giacomo

AU - Yu, Huiwen

AU - Valsecchi, Cecile

AU - Ballabio, Davide

AU - Bro, Rasmus

N1 - Publisher Copyright: © 2023 The Authors

PY - 2023

Y1 - 2023

N2 - Hyphenated chromatography is among the most popular analytical techniques in omics related research. While great advancements have been achieved on the experimental side, the same is not true for the extraction of the relevant information from chromatographic data. Extensive signal preprocessing is required to remove the signal of the baseline, resolve the time shifts of peaks from sample to sample and to properly estimate the spectra and concentrations of co-eluting compounds. Among several available strategies, curve resolution approaches, such as PARAFAC2, ease the deconvolution and the quantification of chemicals. However, not all resolved profiles are relevant. For example, some take into account the baseline, others the chemical compounds. Thus, it is necessary to distinguish the profiles describing relevant chemistry. With the aim to assist researchers in this selection phase, we have tried three different classification algorithms (convolutional and recurrent neural networks, k-nearest neighbours) for the automatic identification of GC-MS elution profiles resolved by PARAFAC2. To this end, we have manually labelled more than 170,000 elution profiles in the following four classes: ‘Peak’, ‘Cutoff peak’, ‘Baseline’ and ‘Others’ in order to train, validate and test the classification models. The results highlight two main points: i) neural networks seem to be the best solution for this specific classification task confirmed by the overall quality of the classification, ii) the quality of the input data is crucial to maximize the modelling performances.

AB - Hyphenated chromatography is among the most popular analytical techniques in omics related research. While great advancements have been achieved on the experimental side, the same is not true for the extraction of the relevant information from chromatographic data. Extensive signal preprocessing is required to remove the signal of the baseline, resolve the time shifts of peaks from sample to sample and to properly estimate the spectra and concentrations of co-eluting compounds. Among several available strategies, curve resolution approaches, such as PARAFAC2, ease the deconvolution and the quantification of chemicals. However, not all resolved profiles are relevant. For example, some take into account the baseline, others the chemical compounds. Thus, it is necessary to distinguish the profiles describing relevant chemistry. With the aim to assist researchers in this selection phase, we have tried three different classification algorithms (convolutional and recurrent neural networks, k-nearest neighbours) for the automatic identification of GC-MS elution profiles resolved by PARAFAC2. To this end, we have manually labelled more than 170,000 elution profiles in the following four classes: ‘Peak’, ‘Cutoff peak’, ‘Baseline’ and ‘Others’ in order to train, validate and test the classification models. The results highlight two main points: i) neural networks seem to be the best solution for this specific classification task confirmed by the overall quality of the classification, ii) the quality of the input data is crucial to maximize the modelling performances.

KW - Automatic analysis

KW - Chromatography

KW - Neural networks

KW - PARAFAC2

U2 - 10.1016/j.chemolab.2023.105002

DO - 10.1016/j.chemolab.2023.105002

M3 - Journal article

AN - SCOPUS:85175461308

VL - 243

JO - Chemometrics and Intelligent Laboratory Systems

JF - Chemometrics and Intelligent Laboratory Systems

SN - 0169-7439

M1 - 105002

ER -

ID: 372829500