Asymptotic Dynamics of Alternating Minimization for Bilinear Regression
- URL: http://arxiv.org/abs/2402.04751v3
- Date: Fri, 31 Jan 2025 09:42:37 GMT
- Title: Asymptotic Dynamics of Alternating Minimization for Bilinear Regression
- Authors: Koki Okajima, Takashi Takahashi
- Abstract summary: This study investigates the dynamics of alternating minimization applied to a bilinear regression task under the asymptotic system size limit.
Our results show that the dynamics can be described effectively by a two-dimensional discrete stochastic process, where each step depends on all previous time steps.
The theoretical framework developed in this work can be applied to the analysis of various iterative algorithms, extending beyond the scope of alternating minimization.
- Score: 2.992602379681373
- Abstract: This study investigates the dynamics of alternating minimization applied to a bilinear regression task with normally distributed covariates, under the asymptotic system size limit where the number of parameters and observations diverge at the same rate. This is achieved by employing the replica method to a multi-temperature glassy system which unfolds the algorithm's time evolution. Our results show that the dynamics can be described effectively by a two-dimensional discrete stochastic process, where each step depends on all previous time steps, revealing the structure of the memory dependence in the evolution of alternating minimization. The theoretical framework developed in this work can be applied to the analysis of various iterative algorithms, extending beyond the scope of alternating minimization.
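To make the setting concrete, here is a minimal sketch of alternating minimization for a bilinear regression model of the form y_mu ≈ u^T X_mu v with Gaussian covariates. The data model, dimensions, and variable names below are illustrative assumptions, not the paper's exact formulation; each step solves an exact least-squares problem in one factor while the other is held fixed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative bilinear regression setup (assumed form): y_mu = u*^T X_mu v* + noise,
# with Gaussian covariates X_mu, as a stand-in for the model analysed in the paper.
n, d1, d2 = 2000, 30, 30
u_star, v_star = rng.normal(size=d1), rng.normal(size=d2)
X = rng.normal(size=(n, d1, d2)) / np.sqrt(d1 * d2)
y = np.einsum("i,nij,j->n", u_star, X, v_star) + 0.01 * rng.normal(size=n)

def alternating_minimization(X, y, T=20, ridge=1e-8):
    """Alternate exact least-squares updates of u (with v fixed) and v (with u fixed)."""
    n, d1, d2 = X.shape
    u, v = rng.normal(size=d1), rng.normal(size=d2)
    for _ in range(T):
        # Fix v: y ~ A_u @ u is linear in u, with design rows A_u[mu] = X_mu v.
        A_u = np.einsum("nij,j->ni", X, v)
        u = np.linalg.solve(A_u.T @ A_u + ridge * np.eye(d1), A_u.T @ y)
        # Fix u: y ~ A_v @ v is linear in v, with design rows A_v[mu] = X_mu^T u.
        A_v = np.einsum("i,nij->nj", u, X)
        v = np.linalg.solve(A_v.T @ A_v + ridge * np.eye(d2), A_v.T @ y)
    return u, v

u_hat, v_hat = alternating_minimization(X, y)
# The overall sign/scale of (u, v) is not identifiable; compare the bilinear predictions instead.
print(np.mean((np.einsum("i,nij,j->n", u_hat, X, v_hat) - y) ** 2))
```

The paper's replica analysis characterizes the high-dimensional trajectory of iterations of this type, including the memory dependence of each step on all previous time steps.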
Related papers
- High-Dimensional Markov-switching Ordinary Differential Processes [23.17395115394655]
We develop a two-stage algorithm that first recovers the continuous sample path from discrete samples and then estimates the parameters of the processes.
We provide novel theoretical insights into the statistical error and linear convergence guarantee when the processes are $\beta$-mixing.
We apply this model to investigate the differences in resting-state brain networks between the ADHD group and normal controls.
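As a rough, hedged illustration of the two-stage idea only (the authors' estimator additionally handles Markov-switching dynamics and comes with statistical guarantees), one can first smooth the discrete samples into a continuous path and then estimate ODE parameters by regressing estimated derivatives on the state. The linear-ODE ground truth and all names below are assumptions made for the sketch.

```python
import numpy as np
from scipy.interpolate import CubicSpline

rng = np.random.default_rng(1)

# Stage 0 (illustrative data): noisy discrete samples of a linear ODE dx/dt = A x.
A_true = np.array([[-0.5, 1.0], [-1.0, -0.5]])
t = np.linspace(0.0, 10.0, 200)
x = np.zeros((len(t), 2))
x[0] = [1.0, 0.0]
for k in range(1, len(t)):                 # simple Euler simulation of the ground truth
    x[k] = x[k - 1] + (t[k] - t[k - 1]) * (A_true @ x[k - 1])
obs = x + 0.01 * rng.normal(size=x.shape)

# Stage 1: recover a continuous sample path from the discrete observations.
path = CubicSpline(t, obs, axis=0)
t_dense = np.linspace(0.0, 10.0, 2000)
x_dense, dx_dense = path(t_dense), path(t_dense, 1)   # smoothed states and derivatives

# Stage 2: estimate the ODE parameters by least squares on dx/dt ~ A x.
A_hat = np.linalg.lstsq(x_dense, dx_dense, rcond=None)[0].T
print(A_hat)
```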
arXiv Detail & Related papers (2024-12-30T18:41:28Z) - Stochastic gradient descent estimation of generalized matrix factorization models with application to single-cell RNA sequencing data [41.94295877935867]
Single-cell RNA sequencing allows the quantitation of gene expression at the individual cell level.
Dimensionality reduction is a common preprocessing step to simplify the visualization, clustering, and phenotypic characterization of samples.
We present a generalized matrix factorization model assuming a general exponential dispersion family distribution.
We propose a scalable adaptive stochastic gradient descent algorithm that allows us to estimate the model efficiently.
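A minimal sketch of the modelling idea, assuming a Poisson member of the exponential dispersion family with a log link and plain minibatch SGD; the scalable adaptive algorithm proposed in the paper is not reproduced here, and the dimensions and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative count matrix (cells x genes), a stand-in for single-cell RNA-seq counts.
n_cells, n_genes, rank = 500, 200, 5
U_true = rng.normal(scale=0.3, size=(n_cells, rank))
V_true = rng.normal(scale=0.3, size=(n_genes, rank))
Y = rng.poisson(np.exp(U_true @ V_true.T))

# Minibatch SGD on the Poisson negative log-likelihood with log link (constants dropped):
#   loss = sum_ij exp(u_i . v_j) - y_ij * (u_i . v_j)
U = rng.normal(scale=0.01, size=(n_cells, rank))
V = rng.normal(scale=0.01, size=(n_genes, rank))
lr, batch = 1e-3, 64
for step in range(5000):
    rows = rng.choice(n_cells, size=batch, replace=False)   # minibatch of cells
    eta = U[rows] @ V.T                                     # linear predictor on the log scale
    resid = np.exp(eta) - Y[rows]                           # d(loss)/d(eta) on the minibatch
    grad_U = resid @ V / n_genes                            # averaged gradient for the sampled rows of U
    grad_V = resid.T @ U[rows] / batch                      # averaged gradient for V
    U[rows] -= lr * grad_U
    V -= lr * grad_V
print(np.mean(np.exp(U @ V.T) - Y * (U @ V.T)))             # average Poisson loss (up to constants)
```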
arXiv Detail & Related papers (2024-12-29T16:02:15Z) - Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
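As a hedged sketch of the dual viewpoint only (not the stochastic dual descent algorithm introduced in the paper), Gaussian process regression with noise variance sigma^2 can be cast as minimizing the dual quadratic 0.5 a^T (K + sigma^2 I) a - a^T y over representer weights a. Below, a simple randomized coordinate scheme minimizes one coordinate at a time; the kernel, data, and step counts are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy GP regression with an RBF kernel (all modelling choices here are illustrative).
n = 500
X = rng.uniform(-3, 3, size=(n, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=n)
K = np.exp(-0.5 * (X - X.T) ** 2)   # squared distances in 1-D via broadcasting
sigma2 = 0.1 ** 2

# Dual objective: f(a) = 0.5 a^T (K + sigma2 I) a - a^T y, minimized at
# a* = (K + sigma2 I)^{-1} y, whose posterior mean on the training inputs is K a*.
# Each step exactly minimizes f over one randomly chosen coordinate of a.
a = np.zeros(n)
for step in range(30 * n):
    i = rng.integers(n)
    a[i] = (y[i] - K[i] @ a + K[i, i] * a[i]) / (K[i, i] + sigma2)

print("train RMSE of the fitted mean:", np.sqrt(np.mean((K @ a - y) ** 2)))
```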
arXiv Detail & Related papers (2023-10-31T16:15:13Z) - A primal-dual perspective for distributed TD-learning [7.871657629581001]
The goal of this paper is to investigate distributed temporal difference (TD) learning for a networked multi-agent Markov decision process.
The proposed approach is based on distributed optimization algorithms, which can be interpreted as primal-dual Ordinary differential equation (ODE) dynamics subject to null-space constraints.
arXiv Detail & Related papers (2023-10-01T10:38:46Z) - Structured Optimal Variational Inference for Dynamic Latent Space Models [16.531262817315696]
We consider a latent space model for dynamic networks, where our objective is to estimate the pairwise inner products plus the intercept of the latent positions.
To balance posterior inference and computational scalability, we consider a structured mean-field variational inference framework.
arXiv Detail & Related papers (2022-09-29T22:10:42Z) - Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and stochastic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
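A hedged sketch of the warm-start plus implicit-differentiation idea on a toy problem (tuning a ridge penalty on a validation set). The exact linear solve used for the inverse-Hessian-vector product below stands in for the stochastic approximations analysed in the paper, and all problem sizes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy bilevel problem (illustrative): tune a ridge penalty exp(lam) on a validation set.
# Inner: w*(lam) = argmin_w 0.5*||Xtr w - ytr||^2 + 0.5*exp(lam)*||w||^2
# Outer: minimize F(lam) = 0.5*||Xval w*(lam) - yval||^2
Xtr, Xval = rng.normal(size=(80, 20)), rng.normal(size=(40, 20))
w_true = rng.normal(size=20)
ytr = Xtr @ w_true + 0.5 * rng.normal(size=80)
yval = Xval @ w_true + 0.5 * rng.normal(size=40)

def inner_grad(w, lam):
    return Xtr.T @ (Xtr @ w - ytr) + np.exp(lam) * w

lam, w = 0.0, np.zeros(20)            # w is warm-started across outer iterations
outer_lr, inner_lr = 5e-3, 1e-3
for outer in range(200):
    # Warm-started inner solver: a few gradient steps starting from the previous solution.
    for _ in range(20):
        w -= inner_lr * inner_grad(w, lam)
    # Implicit hypergradient: dF/dlam = -(d^2 g / dlam dw)^T H^{-1} grad_w F, with
    # H = Xtr^T Xtr + exp(lam) I; the solve is exact here, approximate solvers would be used at scale.
    H = Xtr.T @ Xtr + np.exp(lam) * np.eye(20)
    v = np.linalg.solve(H, Xval.T @ (Xval @ w - yval))
    hypergrad = -np.exp(lam) * (w @ v)
    lam -= outer_lr * hypergrad
print(lam)
```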
arXiv Detail & Related papers (2021-11-29T15:10:09Z) - Fractal Structure and Generalization Properties of Stochastic Optimization Algorithms [71.62575565990502]
We prove that the generalization error of an optimization algorithm can be bounded based on the 'complexity' of the fractal structure that underlies its invariant measure.
We further specialize our results to specific problems (e.g., linear/logistic regression, one-hidden-layer neural networks) and algorithms.
arXiv Detail & Related papers (2021-06-09T08:05:36Z) - A Discrete Variational Derivation of Accelerated Methods in Optimization [68.8204255655161]
We introduce variational integrators which allow us to derive different methods for optimization.
We derive two families of optimization methods in one-to-one correspondence.
The preservation of symplecticity of autonomous systems occurs here solely on the fibers.
arXiv Detail & Related papers (2021-06-04T20:21:53Z) - Cogradient Descent for Bilinear Optimization [124.45816011848096]
We introduce a Cogradient Descent algorithm (CoGD) to address the bilinear problem.
We solve one variable by considering its coupling relationship with the other, leading to a synchronous gradient descent.
Our algorithm is applied to solve problems with one variable under the sparsity constraint.
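As a loose illustration only (the projection-based coordination of CoGD is not reproduced), the sketch below updates both coupled variables of a sparse bilinear problem within the same iteration, using a shared residual and soft-thresholding for the sparsity constraint; the dictionary-learning setup and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

# Illustrative bilinear problem: recover a dictionary D and sparse codes X from Y ~ D X.
m, k, n, sparsity = 30, 15, 200, 3
D_true = rng.normal(size=(m, k))
X_true = np.zeros((k, n))
for j in range(n):
    X_true[rng.choice(k, sparsity, replace=False), j] = rng.normal(size=sparsity)
Y = D_true @ X_true + 0.01 * rng.normal(size=(m, n))

# Synchronous coupled updates: in each iteration both variables take a gradient step
# computed from the current pair, with soft-thresholding enforcing sparsity on X.
D = rng.normal(size=(m, k))
X = np.zeros((k, n))
lr, lam = 1e-3, 0.05
for it in range(3000):
    R = D @ X - Y                       # shared residual couples the two updates
    grad_D = R @ X.T
    grad_X = D.T @ R
    D -= lr * grad_D
    X = soft_threshold(X - lr * grad_X, lr * lam)
print(np.linalg.norm(D @ X - Y) / np.linalg.norm(Y))
```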
arXiv Detail & Related papers (2020-06-16T13:41:54Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of stochasticity in its success is still unclear.
We show that multiplicative noise, which commonly arises due to variance in minibatch subsampling, leads to heavy-tailed stationary behaviour in the parameters.
A detailed analysis is conducted in which we describe how key factors, including step size, batch size, and data, affect this behaviour, with experiments on state-of-the-art neural network models.
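A small simulation, under illustrative distributional assumptions, of the mechanism described above: a recursion with a random multiplicative factor produces heavy-tailed iterates even though every noise source is light-tailed.

```python
import numpy as np

rng = np.random.default_rng(6)

# Kesten-type recursion x_{t+1} = a_t * x_t + b_t: a random multiplicative factor that is
# contracting on average but occasionally expansive yields a heavy-tailed stationary law,
# even though a_t and b_t themselves are light-tailed. One-dimensional SGD on a random
# quadratic has this form, with a_t = 1 - lr * h_t for a random minibatch curvature h_t.
lr = 0.9
x, samples = 0.0, []
for t in range(200_000):
    a = 1.0 - lr * rng.exponential(1.0)   # multiplicative (curvature) noise, illustrative law
    b = lr * rng.normal()                 # additive gradient noise
    x = a * x + b
    if t >= 1_000:                        # discard burn-in before the chain settles
        samples.append(abs(x))

samples = np.array(samples)
# Heavy tails show up as extreme upper quantiles far beyond the median.
print(np.quantile(samples, [0.5, 0.99, 0.999, 0.9999]))
```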
arXiv Detail & Related papers (2020-06-11T09:58:01Z) - Sparse Orthogonal Variational Inference for Gaussian Processes [34.476453597078894]
We introduce a new interpretation of sparse variational approximations for Gaussian processes using inducing points.
We show that this formulation recovers existing approximations and at the same time allows us to obtain tighter lower bounds on the marginal likelihood and new variational inference algorithms.
arXiv Detail & Related papers (2019-10-23T15:01:28Z)