Making sense of multiple distance matrices through common and distinct components
Publication: Contribution to journal › Journal article › Research › peer-reviewed
Standard
Making sense of multiple distance matrices through common and distinct components. / Solberg, Lars Erik; Dahl, Tobias; Naes, Tormod.
In: Journal of Chemometrics, Vol. 35, No. 11, 3372, 2021.
RIS
TY - JOUR
T1 - Making sense of multiple distance matrices through common and distinct components
AU - Solberg, Lars Erik
AU - Dahl, Tobias
AU - Naes, Tormod
PY - 2021
Y1 - 2021
N2 - Multiblock analysis attacks the problem of how to combine data from various data sources for purposes such as prediction, classification, clustering, or visual data analysis. A key concept is the distinction between “common” and “distinct” parts, that is, what information repeats itself across the blocks and what is unique to an individual block. The statistical field of multiblock analysis holds many different approaches, which leads to different treatments both of the terms distinct and common themselves and to differences in the numerical results. In this article, we extend the discussion of distinct and common in multiblock analysis to the domain of distance matrices, that is, the situation where data point sets, so-called configurations, are analyzed via relative distances either because configurations are not available directly or because a distance representation is favorable. Situations typical for chemometrics will be highlighted and illustrated in examples. When analyzing different methods, we have focused on three key aspects. First, during the transition from the distance to configuration domains, one needs to consider how multiple distance matrices are treated. Second, when extracting common and distinct parts, one needs to manage a tradeoff between explaining variance and ensuring similarity between subspaces. Third, there is a design choice to be made as to whether the subspace containing the common parts is “shared” between blocks or if separate subspaces are associated with each individual block. The three aspects help to categorize and explain well-known methods in the field. A selection of methods was analyzed and subsequently applied to examples.
KW - common
KW - consensus
KW - distances
KW - distinct
KW - multiblock
KW - multidimensional scaling
U2 - 10.1002/cem.3372
DO - 10.1002/cem.3372
M3 - Journal article
VL - 35
JO - Journal of Chemometrics
JF - Journal of Chemometrics
SN - 0886-9383
IS - 11
M1 - 3372
ER -
ID: 285870320