Related papers: Spectral Thresholds in Correlated Spiked Models and Fundamental Limits of Partial Least Squares

URL: http://arxiv.org/abs/2510.17561v1
Date: Mon, 20 Oct 2025 14:08:58 GMT
Title: Spectral Thresholds in Correlated Spiked Models and Fundamental Limits of Partial Least Squares
Authors: Pierre Mergny, Lenka Zdeborová
Abstract summary: We identify the SNR and correlation regimes where Partial Least Squares (PLS) fails to recover any signal, despite detectability being possible in principle. These findings clarify the theoretical limits of PLS and provide guidance for the design of reliable multi-modal inference methods in high dimensions.
Score: 15.163541835643635
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We provide a rigorous random matrix theory analysis of spiked cross-covariance models where the signals across two high-dimensional data channels are partially aligned. These models are motivated by multi-modal learning and form the standard generative setting underlying Partial Least Squares (PLS), a widely used yet theoretically underdeveloped method. We show that the leading singular values of the sample cross-covariance matrix undergo a Baik-Ben Arous-Péché (BBP)-type phase transition, and we characterize the precise thresholds for the emergence of informative components. Our results yield the first sharp asymptotic description of the signal recovery capabilities of PLS in this setting, revealing a fundamental performance gap between PLS and the Bayes-optimal estimator. In particular, we identify the SNR and correlation regimes where PLS fails to recover any signal, despite detectability being possible in principle. These findings clarify the theoretical limits of PLS and provide guidance for the design of reliable multi-modal inference methods in high dimensions.
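
Illustration (not from the paper): the abstract's setting can be explored numerically with a minimal sketch like the one below. It assumes a simple rank-one spiked model in each channel with unit-variance Gaussian noise, latent factors correlated by `rho`, and signal strengths `lam_x` and `lam_y`. These names, the normalization, and the function `pls_overlap` are hypothetical choices made for this example and may differ from the paper's parameterization. The sketch takes the top singular pair of the sample cross-covariance matrix, which is the first PLS direction, and reports its overlap with the planted directions.

```python
import numpy as np


def pls_overlap(n, dx, dy, lam_x, lam_y, rho, seed=0):
    """Simulate an assumed spiked cross-covariance model and return
    (top singular value, |overlap with a|, |overlap with b|).

    Assumed model (hypothetical, for illustration only):
        x_i = sqrt(lam_x) * u_i * a + noise,   y_i = sqrt(lam_y) * v_i * b + noise
    where a, b are unit vectors and corr(u_i, v_i) = rho.
    """
    rng = np.random.default_rng(seed)

    # Planted unit-norm signal directions in each channel.
    a = rng.standard_normal(dx)
    a /= np.linalg.norm(a)
    b = rng.standard_normal(dy)
    b /= np.linalg.norm(b)

    # Latent factors with correlation rho between the two channels.
    u = rng.standard_normal(n)
    v = rho * u + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)

    # Rank-one signal plus i.i.d. standard Gaussian noise.
    X = np.sqrt(lam_x) * np.outer(u, a) + rng.standard_normal((n, dx))
    Y = np.sqrt(lam_y) * np.outer(v, b) + rng.standard_normal((n, dy))

    # Sample cross-covariance; its top singular vectors give the first PLS directions.
    S = X.T @ Y / n
    U, s, Vt = np.linalg.svd(S, full_matrices=False)

    return s[0], abs(U[:, 0] @ a), abs(Vt[0] @ b)


if __name__ == "__main__":
    # Sweep the signal strength at fixed correlation to look for a transition.
    for lam in [0.0, 0.5, 1.0, 2.0, 5.0, 25.0]:
        top, ox, oy = pls_overlap(2000, 100, 100, lam, lam, rho=0.9)
        print(f"lam={lam:5.1f}  top singular value={top:.3f}  "
              f"overlap_x={ox:.3f}  overlap_y={oy:.3f}")
```

Sweeping `lam_x`, `lam_y`, and `rho` at a fixed aspect ratio gives an empirical picture of where the top singular vectors start to carry information. The exact thresholds are the paper's result and are not reproduced by this sketch.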

Related papers

Learnable Chernoff Baselines for Inference-Time Alignment [64.81256817158851]
We introduce Learnable Chernoff Baselines as a method for efficiently and approximately sampling from exponentially tilted kernels. We establish total-variation guarantees to the ideal aligned model, and demonstrate in both continuous and discrete diffusion settings that LCB sampling closely matches ideal rejection sampling.
arXiv Detail & Related papers (2026-02-08T00:09:40Z)
Learning to Separate RF Signals Under Uncertainty: Detect-Then-Separate vs. Unified Joint Models [53.79667447811139]
We show that a single deep neural architecture learns to jointly detect and separate when applied directly to the received signal. These findings highlight UJM as a scalable and practical alternative to DTS, while opening new directions for unified separation under broader estimation.
arXiv Detail & Related papers (2026-02-04T15:25:02Z)
Theoretical Bounds for Stable In-Context Learning [0.0]
In-context learning (ICL) is flexible but its reliability is sensitive to prompt length. This paper establishes a non-asymptotic lower bound that links the minimal number of demonstrations to ICL stability. We propose a two-stage observable estimator with a one-shot calibration that produces practitioner-ready prompt-length estimates.
arXiv Detail & Related papers (2025-09-25T02:25:05Z)
Revisit CP Tensor Decomposition: Statistical Optimality and Fast Convergence [6.724750970258851]
We revisit Canonical Polyadic (CP) tensor decomposition from a statistical perspective. We provide a comprehensive theoretical analysis of Alternating Least Squares (ALS) under a signal-plus-noise model.
arXiv Detail & Related papers (2025-05-29T03:42:03Z)
Unveiling the Statistical Foundations of Chain-of-Thought Prompting Methods [59.779795063072655]
Chain-of-Thought (CoT) prompting and its variants have gained popularity as effective methods for solving multi-step reasoning problems. We analyze CoT prompting from a statistical estimation perspective, providing a comprehensive characterization of its sample complexity.
arXiv Detail & Related papers (2024-08-25T04:07:18Z)
Optimal thresholds and algorithms for a model of multi-modal learning in high dimensions [13.531678315783195]
The paper derives the approximate message passing (AMP) algorithm for this model and characterizes its performance in the high-dimensional limit via the associated state evolution. The linearization of AMP is compared numerically to the widely used partial least squares (PLS) and canonical correlation analysis (CCA) methods, which are both observed to suffer from a sub-optimal recovery threshold.
arXiv Detail & Related papers (2024-07-03T21:48:23Z)
Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference [47.460898983429374]
We introduce an ensemble Kalman filter (EnKF) into the non-mean-field (NMF) variational inference framework to approximate the posterior distribution of the latent states. This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO). We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting.
arXiv Detail & Related papers (2023-12-10T15:22:30Z)
GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making. We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation. We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z)
Contrastive and Non-Contrastive Self-Supervised Learning Recover Global and Local Spectral Embedding Methods [19.587273175563745]
Self-Supervised Learning (SSL) surmises that inputs and pairwise positive relationships are enough to learn meaningful representations. This paper proposes a unifying framework under the helm of spectral manifold learning to address those limitations.
arXiv Detail & Related papers (2022-05-23T17:59:32Z)
Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation. Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle. We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z)
Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model. We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
A Support Detection and Root Finding Approach for Learning High-dimensional Generalized Linear Models [10.103666349083165]
We develop a support detection and root finding procedure to learn the high dimensional sparse generalized linear models. We conduct simulations and real data analysis to illustrate the advantages of our proposed method over several existing methods.
arXiv Detail & Related papers (2020-01-16T14:35:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.