Linear Recursive Feature Machines provably recover low-rank matrices
- URL: http://arxiv.org/abs/2401.04553v1
- Date: Tue, 9 Jan 2024 13:44:12 GMT
- Title: Linear Recursive Feature Machines provably recover low-rank matrices
- Authors: Adityanarayanan Radhakrishnan, Mikhail Belkin, Dmitriy Drusvyatskiy
- Abstract summary: We develop the first theoretical guarantees for how RFM performs dimensionality reduction.
We generalize the Iteratively Reweighted Least Squares (IRLS) algorithm.
Our results shed light on the connection between feature learning in neural networks and classical sparse recovery algorithms.
- Score: 17.530511273384786
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A fundamental problem in machine learning is to understand how neural
networks make accurate predictions, while seemingly bypassing the curse of
dimensionality. A possible explanation is that common training algorithms for
neural networks implicitly perform dimensionality reduction - a process called
feature learning. Recent work posited that the effects of feature learning can
be elicited from a classical statistical estimator called the average gradient
outer product (AGOP). The authors proposed Recursive Feature Machines (RFMs) as
an algorithm that explicitly performs feature learning by alternating between
(1) reweighting the feature vectors by the AGOP and (2) learning the prediction
function in the transformed space. In this work, we develop the first
theoretical guarantees for how RFM performs dimensionality reduction by
focusing on the class of overparametrized problems arising in sparse linear
regression and low-rank matrix recovery. Specifically, we show that RFM
restricted to linear models (lin-RFM) generalizes the well-studied Iteratively
Reweighted Least Squares (IRLS) algorithm. Our results shed light on the
connection between feature learning in neural networks and classical sparse
recovery algorithms. In addition, we provide an implementation of lin-RFM that
scales to matrices with millions of missing entries. Our implementation is
faster than the standard IRLS algorithm as it is SVD-free. It also outperforms
deep linear networks for sparse linear regression and low-rank matrix
completion.
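To make the alternation concrete, here is a minimal sketch of lin-RFM for the sparse linear regression case, assuming a diagonal feature-reweighting matrix, a min-norm least-squares inner solver, and a fixed power alpha; these are illustrative choices, not the authors' reference implementation.
```python
# Minimal sketch of lin-RFM on sparse linear regression (diagonal case).
# Illustrative assumptions: min-norm least-squares inner solver, fixed
# power alpha, and a small eps floor for numerical stability.
import numpy as np

def lin_rfm(X, y, n_iters=50, alpha=0.5, eps=1e-6):
    d = X.shape[1]
    m = np.ones(d)  # diagonal feature-reweighting matrix (AGOP surrogate)
    v = np.zeros(d)
    for _ in range(n_iters):
        # (1) learn the predictor in the reweighted space: min-norm
        # solution of (X diag(m)) v = y, i.e. f(x) = <m * v, x>
        v, *_ = np.linalg.lstsq(X * m, y, rcond=None)
        # (2) reweight by the AGOP: a linear model has constant gradient
        # m * v, so the AGOP diagonal is the squared coefficient vector
        grad = m * v
        m = (grad**2 + eps) ** alpha  # alpha sets the implicit sparse penalty
    return m * v

# Toy usage: recover a 3-sparse vector from 40 random measurements
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 100))
w_true = np.zeros(100)
w_true[[3, 17, 42]] = [1.0, -2.0, 0.5]
w_hat = lin_rfm(X, X @ w_true)  # approximately recovers w_true
```
Squaring and rescaling the coefficients is an IRLS-style reweighting, which is the correspondence the paper makes precise.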
Related papers
- Automatic debiasing of neural networks via moment-constrained learning [0.0]
Naively learning the regression function and taking the sample mean of the target functional yield biased estimators.
We propose moment-constrained learning as a new Riesz representer (RR) learning approach that addresses some shortcomings of automatic debiasing.
arXiv Detail & Related papers (2024-09-29T20:56:54Z)
- Learning incomplete factorization preconditioners for GMRES [1.1519724914285523]
In this paper, we develop a data-driven approach to generate incomplete LU factorizations of large-scale sparse matrices.
The learned approximate factorization is utilized as a preconditioner for the corresponding linear equation system in the GMRES method.
We replace the typically hand-engineered algorithms with a graph neural network-based approach that is trained against data.
arXiv Detail & Related papers (2024-09-12T17:55:44Z)
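For context on the entry above, the non-learned baseline it targets is a hand-engineered incomplete factorization; a minimal version using SciPy's ILU-preconditioned GMRES is sketched below (standard library calls; the paper replaces the spilu step with a trained graph neural network).
```python
# Classical baseline: incomplete LU as a preconditioner for GMRES.
# The paper above learns the factorization instead of calling spilu.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 1000
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

ilu = spla.spilu(A, drop_tol=1e-4)                 # incomplete LU factors
M = spla.LinearOperator((n, n), matvec=ilu.solve)  # preconditioner ~ A^{-1}
x, info = spla.gmres(A, b, M=M)
assert info == 0  # info == 0 signals convergence
```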
- Emergence in non-neural models: grokking modular arithmetic via average gradient outer product [16.911836722312152]
We show that grokking is not specific to neural networks nor to gradient descent-based optimization.
We show that this phenomenon occurs when learning modular arithmetic with Recursive Feature Machines.
Our results demonstrate that emergence can result purely from learning task-relevant features.
arXiv Detail & Related papers (2024-07-29T17:28:58Z)
- Spectral Entry-wise Matrix Estimation for Low-Rank Reinforcement Learning [53.445068584013896]
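Since the AGOP is central both to this entry and to the main paper, a generic finite-difference estimator may help fix ideas; this is an illustration, not the authors' implementation.
```python
# Average gradient outer product (AGOP) of a black-box predictor f,
# estimated with central finite differences (illustrative sketch).
import numpy as np

def agop(f, X, h=1e-4):
    n, d = X.shape
    M = np.zeros((d, d))
    for x in X:
        g = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                      for e in np.eye(d)])  # numerical gradient at x
        M += np.outer(g, g)
    return M / n

# A function depending only on coordinate 0 gives an AGOP concentrated
# in the (0, 0) entry, i.e. the task-relevant feature direction
f = lambda x: np.sin(x[0])
M = agop(f, np.random.default_rng(0).standard_normal((100, 5)))
```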
We study matrix estimation problems arising in reinforcement learning (RL) with low-rank structure.
In low-rank bandits, the matrix to be recovered specifies the expected arm rewards, and for low-rank Markov Decision Processes (MDPs), it may, for example, characterize the transition kernel of the MDP.
We show that simple spectral-based matrix estimation approaches efficiently recover the singular subspaces of the matrix and exhibit nearly-minimal entry-wise error.
arXiv Detail & Related papers (2023-10-10T17:06:41Z)
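A minimal sketch of such a spectral estimator, under the simplifying assumptions of uniform sampling and a known target rank:
```python
# Spectral entry-wise estimation sketch: zero-fill missing entries,
# rescale by the observation probability, and truncate the SVD.
import numpy as np

def spectral_estimate(M_obs, mask, rank):
    p = mask.mean()                      # estimated sampling probability
    U, s, Vt = np.linalg.svd(M_obs / p)  # rescaling makes the proxy unbiased
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

rng = np.random.default_rng(1)
A = rng.standard_normal((200, 2)) @ rng.standard_normal((2, 200))  # rank 2
mask = rng.random(A.shape) < 0.3          # observe ~30% of the entries
A_hat = spectral_estimate(A * mask, mask, rank=2)
err = np.max(np.abs(A_hat - A))           # entry-wise estimation error
```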
- The Decimation Scheme for Symmetric Matrix Factorization [0.0]
Matrix factorization is an inference problem that has acquired importance due to its vast range of applications.
We study this extensive-rank problem, extending the alternative 'decimation' procedure that we recently introduced.
We introduce a simple algorithm based on a ground state search that implements decimation and performs matrix factorization.
arXiv Detail & Related papers (2023-07-31T10:53:45Z)
- What learning algorithm is in-context learning? Investigations with linear models [87.91612418166464]
We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly.
We show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression.
We present preliminary evidence that in-context learners share algorithmic features with these predictors.
arXiv Detail & Related papers (2022-11-28T18:59:51Z)
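The classical side of that comparison can be written in closed form; below is a sketch of the ridge and gradient-descent predictors that an in-context learner's outputs are matched against (illustrative, not the paper's evaluation code).
```python
# Reference predictors for one in-context prompt (X, y, x_query).
import numpy as np

def ridge_predict(X, y, x_query, lam=1e-2):
    d = X.shape[1]
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    return x_query @ w  # lam -> 0 approaches exact least squares

def gd_predict(X, y, x_query, lr=0.01, steps=500):
    w = np.zeros(X.shape[1])  # fresh trajectory per prompt
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)  # full-batch gradient step
    return x_query @ w
```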
- A Recursively Recurrent Neural Network (R2N2) Architecture for Learning Iterative Algorithms [64.3064050603721]
We generalize the Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms.
We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields similar iterations to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta solvers for ordinary differential equations.
arXiv Detail & Related papers (2022-11-22T16:30:33Z)
- Memory-Efficient Backpropagation through Large Linear Layers [107.20037639738433]
In modern neural networks like Transformers, linear layers require significant memory to store activations during the backward pass.
This study proposes a memory reduction approach to perform backpropagation through linear layers.
arXiv Detail & Related papers (2022-01-31T13:02:41Z)
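One way to get such a reduction is to keep only a random sketch of the layer input rather than the full activation; the sketch below uses a plain Gaussian projection as an assumption, whereas the paper analyzes specific randomized estimators.
```python
# Approximate the weight gradient of a linear layer Y = X @ W from a
# random sketch of X, so X itself need not be stored for backprop.
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_out, k = 4096, 1024, 1024, 256

X = rng.standard_normal((n, d_in))            # activation normally kept in memory
S = rng.standard_normal((n, k)) / np.sqrt(k)  # E[S @ S.T] = I; can be
                                              # regenerated from the seed later
X_sketch = S.T @ X                            # store k x d_in instead of n x d_in

dY = rng.standard_normal((n, d_out))     # upstream gradient (backward pass)
grad_exact = X.T @ dY                    # what exact backprop computes
grad_approx = X_sketch.T @ (S.T @ dY)    # unbiased randomized estimate
rel_err = (np.linalg.norm(grad_exact - grad_approx)
           / np.linalg.norm(grad_exact))
```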
- Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for the semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers and achieved performance comparable to pure data-driven networks while using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z)
- Revisiting Recursive Least Squares for Training Deep Neural Networks [10.44340837533087]
Recursive least squares (RLS) algorithms were once widely used for training small-scale neural networks, due to their fast convergence.
Previous RLS algorithms are unsuitable for training deep neural networks (DNNs) because of their high computational complexity and numerous preconditions.
We propose three novel RLS optimization algorithms for training feedforward neural networks, convolutional neural networks and recurrent neural networks.
arXiv Detail & Related papers (2021-09-07T17:43:51Z)
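For reference, here is the textbook RLS recursion for a single linear model, the building block that such algorithms adapt layer by layer (a standard derivation, not the paper's three variants).
```python
# Textbook exponentially-weighted recursive least squares (RLS).
import numpy as np

def rls_fit(X, y, lam=0.99, delta=1e3):
    """Process samples one at a time; lam is the forgetting factor."""
    d = X.shape[1]
    w = np.zeros(d)
    P = delta * np.eye(d)            # running inverse-covariance estimate
    for x, t in zip(X, y):
        Px = P @ x
        k = Px / (lam + x @ Px)      # gain vector
        w += k * (t - x @ w)         # correct with the a priori error
        P = (P - np.outer(k, Px)) / lam
    return w
```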
- Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
The Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes a Canonical Polyadic (CP) decomposition on its parameters.
First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension.
We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
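A minimal sketch of a rank-R multilinear map in this spirit: the parameters are CP-factorized and each input mode is contracted directly, so the tensor input is never vectorized (sizes and the scalar output head are illustrative).
```python
# Rank-R multilinear projection of a 3-way input, CP-factorized weights.
import numpy as np

def rank_r_forward(X, A, B, C):
    """X: (I, J, K) input; A: (R, I), B: (R, J), C: (R, K).
    Returns sum_r <a_r (x) b_r (x) c_r, X>: R rank-1 projections of X."""
    return np.einsum("ri,rj,rk,ijk->", A, B, C, X)

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 8, 16))       # e.g. a small hyperspectral patch
R = 3
A, B, C = (rng.standard_normal((R, s)) for s in X.shape)
score = rank_r_forward(X, A, B, C)        # the FNN applies a nonlinearity next
```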
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences arising from its use.