Related papers: Computational Thresholds in Multi-Modal Learning via the Spiked Matrix-Tensor Model

URL: http://arxiv.org/abs/2506.02664v1
Date: Tue, 03 Jun 2025 09:14:34 GMT
Title: Computational Thresholds in Multi-Modal Learning via the Spiked Matrix-Tensor Model
Authors: Hugo Tabanelli, Pierre Mergny, Lenka Zdeborova, Florent Krzakala
Abstract summary: We study the recovery of multiple high-dimensional signals from two noisy, correlated modalities: a spiked matrix and a spiked tensor. We show that a simple Sequential Curriculum Learning strategy (first recovering the matrix, then leveraging it to guide tensor recovery) resolves this bottleneck and achieves optimal weak recovery thresholds.
Score: 16.894374370635433
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the recovery of multiple high-dimensional signals from two noisy, correlated modalities: a spiked matrix and a spiked tensor sharing a common low-rank structure. This setting generalizes classical spiked matrix and tensor models, unveiling intricate interactions between inference channels and surprising algorithmic behaviors. Notably, while the spiked tensor model is typically intractable at low signal-to-noise ratios, its correlation with the matrix enables efficient recovery via Bayesian Approximate Message Passing, inducing staircase-like phase transitions reminiscent of neural network phenomena. In contrast, empirical risk minimization for joint learning fails: the tensor component obstructs effective matrix recovery, and joint optimization significantly degrades performance, highlighting the limitations of naive multi-modal learning. We show that a simple Sequential Curriculum Learning strategy (first recovering the matrix, then leveraging it to guide tensor recovery) resolves this bottleneck and achieves optimal weak recovery thresholds. This strategy, implementable with spectral methods, emphasizes the critical role of structural correlation and learning order in multi-modal high-dimensional inference.
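
Illustrative sketch: the two-stage idea in the abstract (recover the matrix first, then use that estimate to guide tensor recovery with a spectral method) can be tried on a toy problem. The code below is not the paper's model or algorithm. It assumes a simplified rank-one setting with a single shared unit vector `x`, a symmetric spiked matrix `Y`, and an order-3 spiked tensor `T`. The function names (`make_toy_data`, `leading_eigenvector`, `sequential_spectral`) and the noise scalings are hypothetical choices made for illustration.

```python
import numpy as np


def make_toy_data(n, lam, mu, rng):
    """Toy rank-one spiked matrix and order-3 spiked tensor sharing one signal.

    Assumed model (an illustration, not the paper's exact setting):
      Y = lam * x x^T + symmetric Gaussian noise with entries of order 1/sqrt(n)
      T = mu * x (x) x (x) x + Gaussian noise with entries of order 1/sqrt(n)
    """
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    G = rng.standard_normal((n, n))
    W = (G + G.T) / np.sqrt(2 * n)
    Y = lam * np.outer(x, x) + W
    Z = rng.standard_normal((n, n, n)) / np.sqrt(n)
    T = mu * np.einsum("i,j,k->ijk", x, x, x) + Z
    return x, Y, T


def leading_eigenvector(sym):
    """Unit eigenvector of a symmetric matrix with the largest |eigenvalue|."""
    vals, vecs = np.linalg.eigh(sym)
    return vecs[:, np.argmax(np.abs(vals))]


def sequential_spectral(Y, T):
    """Stage 1: spectral estimate from the matrix.
    Stage 2: contract the tensor with that estimate, then run a second
    spectral step on the resulting n x n matrix."""
    x_mat = leading_eigenvector((Y + Y.T) / 2)
    # Contracting one tensor index with x_mat turns the tensor problem into
    # a spiked-matrix problem whose spike is scaled by the overlap <x, x_mat>.
    M = np.einsum("ijk,k->ij", T, x_mat)
    # The sign of x_mat is arbitrary, so the spike in M may be negative;
    # selecting by absolute eigenvalue handles both cases.
    x_ten = leading_eigenvector((M + M.T) / 2)
    return x_mat, x_ten


rng = np.random.default_rng(0)
x, Y, T = make_toy_data(n=60, lam=3.0, mu=3.0, rng=rng)
x_mat, x_ten = sequential_spectral(Y, T)
print("matrix-stage overlap:", abs(x @ x_mat))
print("tensor-stage overlap:", abs(x @ x_ten))
```

Overlaps are reported in absolute value because eigenvectors are only defined up to sign. In this toy setting the second stage works only when the first-stage estimate is correlated with the signal, which mirrors the ordering argument in the abstract without reproducing its thresholds.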

Related papers

Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning [59.6658995479243]
We propose Perturb-and-Merge (P&M), a novel continual learning framework that integrates model merging into the CL paradigm to avoid forgetting. Through theoretical analysis, we minimize the total loss increase across all tasks and derive an analytical solution for the optimal merging coefficient. Our proposed approach achieves state-of-the-art performance on several continual learning benchmark datasets.
arXiv Detail & Related papers (2025-05-28T14:14:19Z)
In-Context Linear Regression Demystified: Training Dynamics and Mechanistic Interpretability of Multi-Head Softmax Attention [52.159541540613915]
We study how multi-head softmax attention models are trained to perform in-context learning on linear data. Our results reveal that in-context learning ability emerges from the trained transformer as an aggregated effect of its architecture and the underlying data distribution.
arXiv Detail & Related papers (2025-03-17T02:00:49Z)
Optimal thresholds and algorithms for a model of multi-modal learning in high dimensions [15.000720880773548]
The paper derives the approximate message passing (AMP) algorithm for this model and characterizes its performance in the high-dimensional limit. The linearization of AMP is compared numerically to the widely used partial least squares (PLS) and canonical correlation analysis (CCA) methods.
arXiv Detail & Related papers (2024-07-03T21:48:23Z)
Interaction Screening and Pseudolikelihood Approaches for Tensor Learning in Ising Models [7.298865011539767]
We study two well-known methods of Ising structure learning, namely the pseudolikelihood approach and the interaction screening approach. We show that both approaches retrieve the underlying hypernetwork structure using a sample size logarithmic in the number of network nodes.
arXiv Detail & Related papers (2023-10-20T02:42:32Z)
Fast and Provable Tensor Robust Principal Component Analysis via Scaled Gradient Descent [30.299284742925852]
This paper tackles tensor robust principal component analysis (RPCA), which aims to recover a low-rank tensor from its observations contaminated by sparse corruptions. We show that the proposed algorithm achieves better and more scalable performance than state-of-the-art matrix and tensor RPCA algorithms.
arXiv Detail & Related papers (2022-06-18T04:01:32Z)
Deep Equilibrium Assisted Block Sparse Coding of Inter-dependent Signals: Application to Hyperspectral Imaging [71.57324258813675]
A dataset of inter-dependent signals is defined as a matrix whose columns demonstrate strong dependencies. A neural network is employed to act as a structure prior and reveal the underlying signal interdependencies. Deep unrolling and deep equilibrium based algorithms are developed, forming highly interpretable and concise deep-learning-based architectures.
arXiv Detail & Related papers (2022-03-29T21:00:39Z)
MLCTR: A Fast Scalable Coupled Tensor Completion Based on Multi-Layer Non-Linear Matrix Factorization [3.6978630614152013]
This paper focuses on the embedding learning aspect of the tensor completion problem and proposes a new multi-layer neural network architecture for factorization and completion (MLCTR). The network architecture entails multiple advantages: a series of low-rank matrix factorization building blocks to minimize overfitting, interleaved transfer functions in each layer for non-linearity, and by-pass connections to reduce the gradient-diminishing problem and increase network depth. Our algorithm is highly efficient for imputing missing values in the EPS data.
arXiv Detail & Related papers (2021-09-04T03:08:34Z)
The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks [51.1848572349154]
Neural network models that perfectly fit noisy data can generalize well to unseen test data. We consider interpolating two-layer linear neural networks trained with gradient flow on the squared loss and derive bounds on the excess risk.
arXiv Detail & Related papers (2021-08-25T22:01:01Z)
Initialization and Regularization of Factorized Neural Layers [23.875225732697142]
We show how to initialize and regularize factorized layers in deep nets. We show how these schemes lead to improved performance on both translation and unsupervised pre-training.
arXiv Detail & Related papers (2021-05-03T17:28:07Z)
Learning Mixtures of Low-Rank Models [89.39877968115833]
We study the problem of learning mixtures of low-rank models. We develop an algorithm that is guaranteed to recover the unknown matrices with near-optimal sample complexity. In addition, the proposed algorithm is provably stable against random noise.
arXiv Detail & Related papers (2020-09-23T17:53:48Z)
Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence. This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time. Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.