Related papers: High-Dimensional Partial Least Squares: Spectral Analysis and Fundamental Limitations

URL: http://arxiv.org/abs/2512.15684v1
Date: Wed, 17 Dec 2025 18:38:01 GMT
Title: High-Dimensional Partial Least Squares: Spectral Analysis and Fundamental Limitations
Authors: Victor Léger, Florent Chatelain,
Abstract summary: Partial Least Squares (PLS) is a widely used method for data integration, designed to extract latent components shared across paired high-dimensional datasets. Despite decades of practical success, a precise theoretical understanding of its behavior in high-dimensional regimes remains limited. Our results offer a comprehensive theoretical understanding of high-dimensional PLS-SVD, clarifying both its advantages and fundamental limitations.
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Partial Least Squares (PLS) is a widely used method for data integration, designed to extract latent components shared across paired high-dimensional datasets. Despite decades of practical success, a precise theoretical understanding of its behavior in high-dimensional regimes remains limited. In this paper, we study a data integration model in which two high-dimensional data matrices share a low-rank common latent structure while also containing individual-specific components. We analyze the singular vectors of the associated cross-covariance matrix using tools from random matrix theory and derive asymptotic characterizations of the alignment between estimated and true latent directions. These results provide a quantitative explanation of the reconstruction performance of the PLS variant based on Singular Value Decomposition (PLS-SVD) and identify regimes where the method exhibits counter-intuitive or limiting behavior. Building on this analysis, we compare PLS-SVD with principal component analysis applied separately to each dataset and show its asymptotic superiority in detecting the common latent subspace. Overall, our results offer a comprehensive theoretical understanding of high-dimensional PLS-SVD, clarifying both its advantages and fundamental limitations.
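
To make the PLS-SVD versus separate-PCA comparison concrete, here is a minimal NumPy sketch. It assumes a hypothetical rank-one version of the data-integration model: a shared latent score z seen by both datasets, one individual component per dataset, and isotropic Gaussian noise. The paper's exact model, scalings, and notation may differ. PLS-SVD is taken here as the leading singular vectors of the cross-covariance X^T Y / n, PCA as the leading eigenvector of X^T X / n, and alignment as the squared cosine with the true direction. The helper names (simulate, pls_svd_direction, pca_direction, alignment) and the signal strengths are illustrative choices, not taken from the paper.

```python
import numpy as np


def orthonormal_directions(dim, k, rng):
    """Return k orthonormal unit vectors in R^dim as columns (QR of a Gaussian matrix)."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, k)))
    return q


def simulate(n, p, q, theta_common, theta_indiv, rng):
    """Hypothetical rank-one data-integration model (an assumption, not the paper's exact model):
    X = sqrt(theta_common) z a^T + sqrt(theta_indiv) w_x b^T + noise
    Y = sqrt(theta_common) z c^T + sqrt(theta_indiv) w_y d^T + noise
    where z is shared and w_x, w_y are independent individual scores."""
    a, b = orthonormal_directions(p, 2, rng).T
    c, d = orthonormal_directions(q, 2, rng).T
    z, w_x, w_y = rng.standard_normal((3, n))
    X = (np.sqrt(theta_common) * np.outer(z, a)
         + np.sqrt(theta_indiv) * np.outer(w_x, b)
         + rng.standard_normal((n, p)))
    Y = (np.sqrt(theta_common) * np.outer(z, c)
         + np.sqrt(theta_indiv) * np.outer(w_y, d)
         + rng.standard_normal((n, q)))
    return X, Y, a, c


def pls_svd_direction(X, Y):
    """Leading left/right singular vectors of the cross-covariance X^T Y / n."""
    u, _, vt = np.linalg.svd(X.T @ Y / X.shape[0])
    return u[:, 0], vt[0]


def pca_direction(X):
    """Leading eigenvector of the sample covariance X^T X / n (PCA on one dataset)."""
    _, eigvecs = np.linalg.eigh(X.T @ X / X.shape[0])
    return eigvecs[:, -1]  # eigh returns eigenvalues in ascending order


def alignment(u_hat, u_true):
    """Squared cosine between unit vectors; invariant to the sign ambiguity of SVD/eigh."""
    return float(np.dot(u_hat, u_true) ** 2)


rng = np.random.default_rng(0)
X, Y, a, c = simulate(n=2000, p=400, q=400, theta_common=4.0, theta_indiv=6.0, rng=rng)
u_pls, v_pls = pls_svd_direction(X, Y)
u_pca = pca_direction(X)
pls_align = alignment(u_pls, a)
pca_align = alignment(u_pca, a)
print(f"PLS-SVD alignment with the shared X direction: {pls_align:.3f}")
print(f"PCA (X only) alignment with the shared X direction: {pca_align:.3f}")
```

In this simulated setting the individual component of X is stronger than the shared one, so PCA on X alone tends to lock onto the individual direction, while the independent individual parts largely average out of X^T Y / n. This is a single simulated instance, not a reproduction of the paper's asymptotic alignment formulas.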

Related papers

Stacked SVD or SVD stacked? A Random Matrix Theory perspective on data integration
Modern data analysis increasingly requires identifying shared latent structure across multiple high-dimensional datasets. Two primary methods have emerged for estimating this shared structure, which vary in how they integrate information across datasets. We develop exact expressions for the performance and phase transitions of these two methods and develop optimal weighting schemes to further improve both methods.
arXiv (2025-07-29T19:03:01Z)
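
The two integration strategies in this entry can be sketched as follows, under our reading that both estimate a left singular subspace shared by several noisy matrices: Stack-SVD concatenates the matrices and takes one SVD, whereas SVD-Stack first takes each matrix's top-r left singular vectors and then takes an SVD of those stacked bases. The simulation parameters and helper names are hypothetical, and the optimal weighting schemes mentioned in the abstract are not implemented.

```python
import numpy as np


def top_left_singular_vectors(M, r):
    """Top-r left singular vectors of M, as columns."""
    u, _, _ = np.linalg.svd(M, full_matrices=False)
    return u[:, :r]


def stack_svd(mats, r):
    """Stack-SVD (as we read it): concatenate the matrices side by side, then one SVD."""
    return top_left_singular_vectors(np.hstack(mats), r)


def svd_stack(mats, r):
    """SVD-Stack (as we read it): SVD each matrix, stack the per-matrix
    top-r left singular vectors, then SVD the stacked bases."""
    bases = [top_left_singular_vectors(M, r) for M in mats]
    return top_left_singular_vectors(np.hstack(bases), r)


def subspace_alignment(U_hat, U):
    """||U_hat^T U||_F^2 / r: 1 for identical subspaces, 0 for orthogonal ones."""
    return float(np.linalg.norm(U_hat.T @ U, "fro") ** 2 / U.shape[1])


rng = np.random.default_rng(1)
d, r, n_k, K = 200, 2, 300, 3
U, _ = np.linalg.qr(rng.standard_normal((d, r)))     # shared left subspace
strengths = np.array([4.0, 3.0]) * np.sqrt(n_k)       # hypothetical signal levels
mats = []
for _ in range(K):
    V_k, _ = np.linalg.qr(rng.standard_normal((n_k, r)))  # matrix-specific right factors
    mats.append(U @ np.diag(strengths) @ V_k.T + rng.standard_normal((d, n_k)))

for name, est in [("Stack-SVD", stack_svd(mats, r)), ("SVD-Stack", svd_stack(mats, r))]:
    print(name, round(subspace_alignment(est, U), 3))
```

With equal noise levels and identical shared subspaces both estimators recover U well; the regimes where they differ (unequal signal strengths, partial sharing) are what the entries here analyze.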
Optimal Estimation of Shared Singular Subspaces across Multiple Noisy Matrices
This study focuses on estimating shared (left) singular subspaces across multiple matrices within a low-rank matrix denoising framework. We establish that Stack-SVD achieves minimax rate-optimality when the true singular subspaces of the signal matrices are identical. For various cases of partial sharing, we rigorously characterize the conditions under which Stack-SVD remains effective, achieves minimax optimality, or fails to deliver consistent estimates.
arXiv (2024-11-26T02:49:30Z)
Entropic Optimal Transport Eigenmaps for Nonlinear Alignment and Joint Embedding of High-Dimensional Datasets
We propose a principled approach for aligning and jointly embedding a pair of datasets with theoretical guarantees. Our approach leverages the leading singular vectors of the EOT plan matrix between two datasets to extract their shared underlying structure. We show that in a high-dimensional regime, the EOT plan recovers the shared manifold structure by approximating a kernel function evaluated at the locations of the latent variables.
arXiv (2024-07-01T18:48:55Z)
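
A rough sketch of the idea in this entry: compute an entropic OT (EOT) plan between two point clouds with Sinkhorn iterations, then use its singular vectors as joint coordinates. Assumptions not stated in the abstract: both datasets live in the same ambient space, the cost is squared Euclidean distance rescaled by its median, marginals are uniform, and eps is a user-chosen regularization. The plan is rescaled as sqrt(n m) P, whose leading singular value is 1 with constant singular vectors, so the sketch skips that trivial pair as in correspondence-analysis-style embeddings; whether the paper discards it in the same way is an assumption. The circle data and all helper names are hypothetical.

```python
import numpy as np


def sinkhorn_plan(C, eps, max_iter=5000, tol=1e-12):
    """Entropic OT plan between uniform marginals via Sinkhorn matrix scaling."""
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / eps)                  # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(max_iter):
        u = a / (K @ v)                   # match row marginals
        v = b / (K.T @ u)                 # match column marginals (exact after this step)
        if np.abs(u * (K @ v) - a).max() < tol:
            break
    return u[:, None] * K * v[None, :]


def eot_embedding(X, Y, k, eps):
    """Joint k-dimensional coordinates from the non-trivial singular vectors of the rescaled plan."""
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)  # squared Euclidean cost
    C = C / np.median(C)                                     # unit-free scale for eps
    P = sinkhorn_plan(C, eps)
    n, m = P.shape
    B = np.sqrt(n * m) * P   # with exact marginals, top singular pair is (1, constant, constant)
    U, s, Vt = np.linalg.svd(B)
    return U[:, 1:k + 1], Vt[1:k + 1].T, s, P


def noisy_circle(num, dim, noise, rng):
    """Hypothetical test data: a unit circle in the first two coordinates plus Gaussian noise."""
    t = rng.uniform(0.0, 2.0 * np.pi, num)
    pts = np.zeros((num, dim))
    pts[:, 0], pts[:, 1] = np.cos(t), np.sin(t)
    return pts + noise * rng.standard_normal((num, dim))


rng = np.random.default_rng(2)
n, m = 150, 120
X = noisy_circle(n, 10, 0.1, rng)
Y = noisy_circle(m, 10, 0.1, rng)
emb_x, emb_y, s, P = eot_embedding(X, Y, k=2, eps=0.2)
print("leading singular values:", np.round(s[:4], 3))
```

Smaller eps concentrates the plan on nearby pairs (sharper structure, slower Sinkhorn convergence); larger eps blurs it toward the uniform coupling.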
Kernel spectral joint embeddings for high-dimensional noisy datasets using duo-landmark integral operators
We propose a novel kernel spectral method that achieves joint embeddings of two independently observed high-dimensional noisy datasets. The obtained low-dimensional embeddings can be utilized for many downstream tasks such as simultaneous clustering, data visualization, and denoising.
arXiv (2024-05-20T18:29:36Z)
Synergistic eigenanalysis of covariance and Hessian matrices for enhanced binary classification
We present a novel approach that combines the eigenanalysis of a covariance matrix evaluated on a training set with a Hessian matrix evaluated on a deep learning model. Our method captures intricate patterns and relationships, enhancing classification performance.
arXiv (2024-02-14T16:10:42Z)
Hodge-Aware Contrastive Learning
Simplicial complexes prove effective in modeling data with multiway dependencies. We develop a contrastive self-supervised learning approach for processing simplicial data.
arXiv (2023-09-14T00:40:07Z)
coVariance Neural Networks
Graph neural networks (GNN) are an effective framework that exploits inter-relationships within graph-structured data for learning. We propose a GNN architecture, called coVariance neural network (VNN), that operates on sample covariance matrices as graphs. We show that VNN performance is indeed more stable than PCA-based statistical approaches.
arXiv (2022-05-31T15:04:43Z)
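
The core operation of a VNN, as we understand it, is a "coVariance filter": a polynomial in the sample covariance matrix C applied to a signal x, sum_k h_k C^k x, followed by a pointwise nonlinearity. The NumPy sketch below implements one such layer; the tensor layout, tanh activation, and random filter taps are illustrative assumptions, and no training loop or stability analysis is included.

```python
import numpy as np


def sample_covariance(data):
    """Unbiased sample covariance of an (n_samples, n_features) array."""
    centered = data - data.mean(axis=0)
    return centered.T @ centered / (data.shape[0] - 1)


def covariance_filter(C, x, taps):
    """Polynomial coVariance filter: sum_k taps[k] * C^k @ x, powers built by repeated products."""
    out = np.zeros(x.shape, dtype=float)
    power_x = np.asarray(x, dtype=float)      # C^0 x
    for k, h_k in enumerate(taps):
        if k > 0:
            power_x = C @ power_x             # C^k x from C^(k-1) x
        out += h_k * power_x
    return out


def vnn_layer(C, X, H, activation=np.tanh):
    """One VNN-style layer (hypothetical layout): X is (n_features, f_in),
    H is (f_in, f_out, n_taps); returns activation of the summed filter outputs,
    with shape (n_features, f_out)."""
    f_in, f_out, _ = H.shape
    out = np.zeros((X.shape[0], f_out))
    for i in range(f_in):
        for j in range(f_out):
            out[:, j] += covariance_filter(C, X[:, i], H[i, j])
    return activation(out)


rng = np.random.default_rng(3)
data = rng.standard_normal((100, 8))   # 100 samples of an 8-dimensional signal
C = sample_covariance(data)            # the covariance matrix plays the role of the graph
x = data[0][:, None]                   # one sample as a single-feature graph signal
H = 0.1 * rng.standard_normal((1, 4, 3))
out = vnn_layer(C, x, H)
print(out.shape)                       # (8, 4)
```

Because the filter only uses matrix-vector products with C, its cost grows with the number of taps rather than requiring an eigendecomposition, which is one practical contrast with PCA-based pipelines.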
Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency
We study reinforcement learning for partially observed decision processes (POMDPs) with infinite observation and state spaces. We make the first attempt at partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv (2022-04-20T21:15:38Z)
Effective Data-aware Covariance Estimator from Compressed Data
We propose a data-aware weighted sampling based covariance matrix estimator, namely DACE, which can provide an unbiased covariance matrix estimation. We conduct extensive experiments on both synthetic and real-world datasets to demonstrate the superior performance of our DACE.
arXiv (2020-10-10T10:10:28Z)
Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets. We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator. We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv (2020-04-17T12:47:04Z)
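
A generic sketch of an ensemble of randomly projected linear discriminants, assuming two Gaussian classes with equal priors. Each Gaussian random projection R maps the data to a low dimension where the pooled covariance is invertible even when p exceeds the training size; the projected LDA direction is lifted back as R^T (R Σ R^T)^{-1} R (μ1 - μ0), and the ensemble averages the resulting scores. Whether this matches the exact construction in [1] is an assumption, the data and parameters are hypothetical, and the misclassification-probability estimator proposed in this entry is not implemented.

```python
import numpy as np


def pooled_lda_params(X0, X1):
    """Class means and pooled within-class covariance (singular when p > n0 + n1 - 2)."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    centered = np.vstack([X0 - mu0, X1 - mu1])
    sigma = centered.T @ centered / (centered.shape[0] - 2)
    return mu0, mu1, sigma


def rp_lda_ensemble_scores(X0, X1, X_test, proj_dim, n_proj, rng):
    """Average projected-LDA scores over n_proj Gaussian random projections (equal priors)."""
    mu0, mu1, sigma = pooled_lda_params(X0, X1)
    midpoint = 0.5 * (mu0 + mu1)
    p = X_test.shape[1]
    scores = np.zeros(X_test.shape[0])
    for _ in range(n_proj):
        R = rng.standard_normal((proj_dim, p)) / np.sqrt(proj_dim)
        sigma_r = R @ sigma @ R.T    # generically invertible when proj_dim <= n0 + n1 - 2
        w = R.T @ np.linalg.solve(sigma_r, R @ (mu1 - mu0))
        scores += (X_test - midpoint) @ w
    return scores / n_proj


rng = np.random.default_rng(4)
p, n_train, n_test = 200, 50, 100               # p exceeds the total training size
delta = 0.5 * np.ones(p)                        # hypothetical mean shift of class 1
X0 = rng.standard_normal((n_train, p))
X1 = rng.standard_normal((n_train, p)) + delta
X_test = np.vstack([rng.standard_normal((n_test, p)),
                    rng.standard_normal((n_test, p)) + delta])
y_test = np.repeat([0, 1], n_test)
scores = rp_lda_ensemble_scores(X0, X1, X_test, proj_dim=10, n_proj=100, rng=rng)
accuracy = float(np.mean((scores > 0).astype(int) == y_test))
print(f"test accuracy: {accuracy:.3f}")
```

A positive averaged score is assigned to class 1. The projection dimension proj_dim is the tuning parameter that, per the abstract, the proposed estimator is meant to help select without cross-validation.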
D-GCCA: Decomposition-based Generalized Canonical Correlation Analysis for Multi-view High-dimensional Data
A popular model in high-dimensional multi-view data analysis decomposes each view's data matrix into a low-rank common-source matrix generated by latent factors common across all data views. We propose a novel decomposition method for this model, called decomposition-based generalized canonical correlation analysis (D-GCCA). Our D-GCCA takes one step further than generalized canonical correlation analysis by separating common and distinctive components among canonical variables.
arXiv (2020-01-09T06:35:40Z)
