A Simplified Analysis of SGD for Linear Regression with Weight Averaging
- URL: http://arxiv.org/abs/2506.15535v1
- Date: Wed, 18 Jun 2025 15:10:38 GMT
- Title: A Simplified Analysis of SGD for Linear Regression with Weight Averaging
- Authors: Alexandru Meterez, Depen Morwani, Costin-Andrei Oncescu, Jingfeng Wu, Cengiz Pehlevan, Sham Kakade
- Abstract summary: Recent work by Zou et al. (2021) provides sharp rates for SGD optimization in linear regression using a constant learning rate. We provide a simplified analysis recovering the same bias and variance bounds as Zou et al. (2021) based on simple linear algebra tools. We believe our work makes the analysis of SGD on linear regression very accessible and will be helpful in further analyzing mini-batching and learning rate scheduling.
- Score: 64.2393952273612
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Theoretically understanding stochastic gradient descent (SGD) in overparameterized models has led to the development of several optimization algorithms that are widely used in practice today. Recent work by~\citet{zou2021benign} provides sharp rates for SGD optimization in linear regression using constant learning rate, both with and without tail iterate averaging, based on a bias-variance decomposition of the risk. In our work, we provide a simplified analysis recovering the same bias and variance bounds provided in~\citep{zou2021benign} based on simple linear algebra tools, bypassing the requirement to manipulate operators on positive semi-definite (PSD) matrices. We believe our work makes the analysis of SGD on linear regression very accessible and will be helpful in further analyzing mini-batching and learning rate scheduling, leading to improvements in the training of realistic models.
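To make the setup concrete, below is a minimal sketch (our own illustration, not code from the paper) of constant-stepsize SGD on a synthetic linear regression problem with tail-iterate averaging. The dimension, stepsize, noise level, and tail fraction are arbitrary assumptions; the initialization error drives the bias term and the label noise drives the variance term in the decomposition discussed in the abstract.

```python
# Minimal sketch: one-pass, constant-stepsize SGD on linear regression with
# tail-iterate averaging. All problem sizes and hyperparameters are
# illustrative assumptions, not values taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, n_steps, noise_std, lr = 50, 5000, 0.1, 0.01

w_star = rng.normal(size=d) / np.sqrt(d)   # ground-truth weights
w = np.zeros(d)                            # initialization (source of the "bias" term)
tail_start = n_steps // 2                  # average only the tail iterates
w_tail_sum = np.zeros(d)

for t in range(n_steps):
    x = rng.normal(size=d)                          # fresh sample each step (one-pass SGD)
    y = x @ w_star + noise_std * rng.normal()       # label noise (source of the "variance" term)
    grad = (x @ w - y) * x                          # stochastic gradient of 0.5 * (x.w - y)^2
    w -= lr * grad                                  # constant-stepsize update
    if t >= tail_start:
        w_tail_sum += w

w_avg = w_tail_sum / (n_steps - tail_start)

# Excess risk E[(x.w - x.w*)^2] equals ||w - w*||^2 for isotropic features.
print("excess risk, last iterate :", np.sum((w - w_star) ** 2))
print("excess risk, tail average :", np.sum((w_avg - w_star) ** 2))
```

Running the sketch typically shows the tail-averaged iterate attaining a noticeably smaller excess risk than the last iterate, which is the qualitative effect of tail averaging that the bias-variance bounds quantify.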
Related papers
- Semi-supervised learning for linear extremile regression [0.8973184739267972]
This paper introduces a novel definition of linear extremile regression along with an accompanying estimation methodology. The regression coefficient estimators of this method achieve $\sqrt{n}$-consistency, which nonparametric extremile regression may not provide. We propose a semi-supervised learning approach to enhance estimation efficiency, even when the specified linear extremile regression model may be misspecified.
arXiv Detail & Related papers (2025-07-02T02:59:15Z) - Self-Boost via Optimal Retraining: An Analysis via Approximate Message Passing [58.52119063742121]
Retraining a model using its own predictions together with the original, potentially noisy labels is a well-known strategy for improving model performance. This paper addresses the question of how to optimally combine the model's predictions and the provided labels. Our main contribution is the derivation of the Bayes optimal aggregator function to combine the current model's predictions and the given labels.
arXiv Detail & Related papers (2025-05-21T07:16:44Z) - Adaptive debiased SGD in high-dimensional GLMs with streaming data [4.704144189806667]
This paper introduces a novel approach to online inference in high-dimensional generalized linear models. Our method operates in a single-pass mode, making it different from existing methods that require full dataset access or storage of large-dimensional summary statistics. The core of our methodological innovation lies in an adaptive descent algorithm tailored for dynamic objective functions, coupled with a novel online debiasing procedure.
arXiv Detail & Related papers (2024-05-28T15:36:48Z) - Meta-Learning with Generalized Ridge Regression: High-dimensional Asymptotics, Optimality and Hyper-covariance Estimation [14.194212772887699]
We consider meta-learning within the framework of high-dimensional random-effects linear models.
We show the precise behavior of the predictive risk for a new test task when the data dimension grows proportionally to the number of samples per task.
We propose and analyze an estimator of the covariance of the random regression coefficients based on data from the training tasks.
arXiv Detail & Related papers (2024-03-27T21:18:43Z) - Out of the Ordinary: Spectrally Adapting Regression for Covariate Shift [12.770658031721435]
We propose a method for adapting the weights of the last layer of a pre-trained neural regression model to perform better on input data originating from a different distribution.
We demonstrate how this lightweight spectral adaptation procedure can improve out-of-distribution performance for synthetic and real-world datasets.
arXiv Detail & Related papers (2023-12-29T04:15:58Z) - Optimal Linear Signal: An Unsupervised Machine Learning Framework to Optimize PnL with Linear Signals [0.0]
This study presents an unsupervised machine learning approach for optimizing Profit and Loss (PnL) in quantitative finance.
Our algorithm, akin to an unsupervised variant of linear regression, maximizes the Sharpe Ratio of PnL generated from signals constructed linearly from external variables.
arXiv Detail & Related papers (2023-11-22T21:10:59Z) - Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing [74.2952487120137]
It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in machine learning models.
This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem.
arXiv Detail & Related papers (2023-01-27T02:30:51Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, and for both we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central in preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z) - Nonlinear Regression Analysis Using Multi-Verse Optimizer [0.0]
In regression analysis, optimization algorithms play a significant role in searching for the coefficients of the regression model.
In this paper, nonlinear regression analysis using the recently developed meta-heuristic Multi-Verse Optimizer (MVO) is proposed.
The proposed method is applied to 10 well-known benchmark nonlinear regression problems.
arXiv Detail & Related papers (2020-04-28T15:03:52Z)