All sparse PCA models are wrong, but some are useful - Food analytics and biotechnology

All sparse PCA models are wrong, but some are useful: Part II: Limitations and problems of deflation

Research output: Contribution to journal › Journal article › Research › peer-review

J. Camacho
A. K. Smilde
E. Saccenti
J. A. Westerhuis
Rasmus Bro

Sparse Principal Component Analysis (sPCA) is a popular matrix factorization approach based on Principal Component Analysis (PCA). It combines variance maximization and sparsity with the ultimate goal of improving data interpretation. A main application of sPCA is to handle high-dimensional data, for example biological omics data. In Part I of this series, we illustrated limitations of several state-of-the-art sPCA algorithms when modeling noise-free data, simulated following an exact sPCA model. In Part II, we provide a thorough analysis of the limitations of sPCA methods that use deflation for calculating subsequent, higher-order components. We show, both theoretically and numerically, that deflation can lead to problems in model interpretation, even for noise-free data. In addition, we contribute diagnostics to identify modeling problems in real-data analysis.
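To make the notion of deflation in the abstract concrete, the following is a minimal sketch of computing two components sequentially, once with dense PCA loadings and once with sparse ones. It is an illustration under stated assumptions, not the authors' method: the hard-thresholding step in `sparsify` is a hypothetical stand-in for a real sPCA algorithm, the data are random rather than the paper's simulations, and all function names are invented for this example. It shows only one generic and well-known side effect, namely that scores from successive components stay orthogonal with dense loadings but generally do not once the loadings are sparse. It does not reproduce the specific problems or diagnostics analyzed in the paper.

```python
import numpy as np


def leading_loading(X):
    """Leading right singular vector of X, i.e. the first PCA loading."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]


def sparsify(p, keep):
    """Illustrative stand-in for sPCA: keep the `keep` largest-magnitude
    entries of p, zero the rest, and rescale to unit length."""
    q = np.zeros_like(p)
    idx = np.argsort(np.abs(p))[-keep:]
    q[idx] = p[idx]
    return q / np.linalg.norm(q)


def deflate(X, p):
    """Projection deflation: subtract the part of each row of X along p."""
    return X - np.outer(X @ p, p)


def two_components(X, keep=None):
    """Compute two components sequentially; sparse if `keep` is given."""
    p1 = leading_loading(X)
    if keep is not None:
        p1 = sparsify(p1, keep)
    t1 = X @ p1                      # scores of component 1
    X1 = deflate(X, p1)              # residual after component 1
    p2 = leading_loading(X1)
    if keep is not None:
        p2 = sparsify(p2, keep)
    t2 = X1 @ p2                     # scores of component 2 on the residual
    return (p1, p2), (t1, t2)


def cosine(a, b):
    """Absolute cosine of the angle between two score vectors."""
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


# Random correlated data (assumption: purely synthetic, mean-centered).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6)) @ rng.standard_normal((6, 6))
X -= X.mean(axis=0)

dense_loadings, dense_scores = two_components(X)
sparse_loadings, sparse_scores = two_components(X, keep=3)

dense_overlap = cosine(*dense_scores)
sparse_overlap = cosine(*sparse_scores)

print("score overlap, dense loadings: ", dense_overlap)
print("score overlap, sparse loadings:", sparse_overlap)
```

With dense loadings the first loading is an eigenvector of the cross-product matrix, so the deflated data carry no information correlated with the first scores. A sparse loading is generally not an eigenvector, which is why the overlap between the two score vectors no longer vanishes and why variance attributed to each component can become harder to interpret.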

Original language	English
Article number	104212
Journal	Chemometrics and Intelligent Laboratory Systems
Volume	208
Number of pages	11
ISSN	0169-7439
DOIs	https://doi.org/10.1016/j.chemolab.2020.104212
Publication status	Published - 2021

Research areas

Artifacts, Data interpretation, Exploratory data analysis, Model interpretation, Sparse principal component analysis, Sparsity

ID: 254720978

Department of Food Science
