Local SGD in Overparameterized Linear Regression
- URL: http://arxiv.org/abs/2210.11562v1
- Date: Thu, 20 Oct 2022 19:58:22 GMT
- Title: Local SGD in Overparameterized Linear Regression
- Authors: Mike Nguyen, Charly Kirst, and Nicole Mücke
- Abstract summary: We consider distributed learning using constant stepsize SGD (DSGD) over several devices, each sending a final model update to a central server.
We show that the excess risk is of the order of the variance, provided the number of local nodes does not grow too large with the global sample size.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We consider distributed learning using constant stepsize SGD (DSGD) over
several devices, each sending a final model update to a central server. In a
final step, the local estimates are aggregated. We prove in the setting of
overparameterized linear regression general upper bounds with matching lower
bounds and derive learning rates for specific data generating distributions. We
show that the excess risk is of the order of the variance, provided the number of
local nodes does not grow too large with the global sample size. We further compare
the sample complexity of DSGD with the sample complexity of distributed ridge
regression (DRR) and show that the excess SGD-risk is smaller than the excess
RR-risk, while both sample complexities are of the same order.
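The scheme analyzed above can be illustrated with a minimal numerical sketch: each device runs constant-stepsize SGD on its own sample and the server averages the resulting estimates. The dimensions, stepsize, and data model below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, num_nodes = 200, 100, 5             # overparameterized: d > n per node (assumed sizes)
w_star = rng.normal(size=d) / np.sqrt(d)  # ground-truth regressor (assumed data model)

def local_sgd(X, y, stepsize=1e-3):
    """One pass of constant-stepsize SGD on the squared loss, started at zero."""
    w = np.zeros(X.shape[1])
    for i in range(X.shape[0]):
        w -= stepsize * (X[i] @ w - y[i]) * X[i]
    return w

# Each device computes a local estimate; the server aggregates by averaging.
local_estimates = []
for _ in range(num_nodes):
    X = rng.normal(size=(n, d))
    y = X @ w_star + 0.1 * rng.normal(size=n)
    local_estimates.append(local_sgd(X, y))
w_avg = np.mean(local_estimates, axis=0)

print("excess risk proxy:", np.sum((w_avg - w_star) ** 2))
```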
Related papers
- Policy Evaluation in Distributional LQR [70.63903506291383]
We provide a closed-form expression of the distribution of the random return.
We show that this distribution can be approximated by a finite number of random variables.
Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR.
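A generic two-point zeroth-order gradient estimator, of the kind such a policy gradient method builds on, can be sketched as follows; the quadratic placeholder objective stands in for the paper's risk-averse LQR cost and is purely an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

def zeroth_order_grad(obj, theta, mu=1e-2, num_dirs=20):
    """Two-point zeroth-order estimate of the gradient of obj at theta."""
    g = np.zeros_like(theta)
    for _ in range(num_dirs):
        u = rng.normal(size=theta.size)
        g += (obj(theta + mu * u) - obj(theta - mu * u)) / (2 * mu) * u
    return g / num_dirs

obj = lambda th: np.sum(th ** 2)  # placeholder for the risk-averse LQR objective
theta = rng.normal(size=4)
for _ in range(100):
    theta -= 0.05 * zeroth_order_grad(obj, theta)
print("theta after optimization:", theta)
```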
arXiv Detail & Related papers (2023-03-23T20:27:40Z)
- Transfer Learning In Differential Privacy's Hybrid-Model [10.584333748643774]
We study the problem of machine learning in the hybrid-model, where the n individuals in the curator's dataset are drawn from a different distribution.
We give a general scheme -- Subsample-Test-Reweigh -- for this transfer learning problem.
arXiv Detail & Related papers (2022-01-28T09:54:54Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
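A toy comparison of constant-stepsize SGD with iterate averaging against the minimum-norm least-squares solution, in the spirit of the setting above; all sizes and the noise level are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 50, 500                            # heavily overparameterized (assumed sizes)
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)

# Constant-stepsize SGD with iterate averaging.
w, avg = np.zeros(d), np.zeros(d)
stepsize, steps = 1e-3, 2000
for _ in range(steps):
    i = rng.integers(n)
    w -= stepsize * (X[i] @ w - y[i]) * X[i]
    avg += w / steps

# Minimum-norm ("ridgeless") least squares for comparison.
w_ols = np.linalg.pinv(X) @ y

print("SGD excess risk proxy:", np.sum((avg - w_star) ** 2))
print("OLS excess risk proxy:", np.sum((w_ols - w_star) ** 2))
```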
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
- GANs with Variational Entropy Regularizers: Applications in Mitigating the Mode-Collapse Issue [95.23775347605923]
Building on the success of deep learning, Generative Adversarial Networks (GANs) provide a modern approach to learning a probability distribution from observed samples.
GANs often suffer from the mode collapse issue where the generator fails to capture all existing modes of the input distribution.
We take an information-theoretic approach and maximize a variational lower bound on the entropy of the generated samples to increase their diversity.
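The summary does not spell out the variational bound itself; as a loose stand-in, the sketch below adds a generic batch-diversity surrogate (mean pairwise distance of generated samples) to a generator loss. Both the surrogate and the weight lam are assumptions for illustration only.

```python
import numpy as np

def diversity_surrogate(samples):
    """Mean pairwise distance of a batch; a crude proxy for sample entropy."""
    diffs = samples[:, None, :] - samples[None, :, :]
    return np.sqrt((diffs ** 2).sum(-1) + 1e-12).mean()

def generator_loss(adv_loss, samples, lam=0.1):
    # Rewarding diversity pushes the generator away from mode collapse.
    return adv_loss - lam * diversity_surrogate(samples)

batch = np.random.default_rng(3).normal(size=(8, 2))  # stand-in generated batch
print(generator_loss(adv_loss=1.0, samples=batch))
```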
arXiv Detail & Related papers (2020-09-24T19:34:37Z)
- Global Distance-distributions Separation for Unsupervised Person Re-identification [93.39253443415392]
Existing unsupervised ReID approaches often fail to correctly identify positive and negative samples through distance-based matching/ranking.
We introduce a global distance-distributions separation constraint over the two distributions to encourage the clear separation of positive and negative samples from a global view.
We show that our method leads to significant improvement over the baselines and achieves the state-of-the-art performance.
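One illustrative way to encode such a separation constraint is to penalize overlap between the two distance distributions via their means and spreads; the exact hinge form below is an assumption, not the paper's constraint.

```python
import numpy as np

def separation_loss(pos_dists, neg_dists, margin=0.5):
    """Penalize overlap between positive-pair and negative-pair distance
    distributions (illustrative hinge form, not the paper's exact constraint)."""
    gap = neg_dists.mean() - pos_dists.mean()
    spread = pos_dists.std() + neg_dists.std()
    return max(0.0, margin + spread - gap)

rng = np.random.default_rng(4)
pos = rng.normal(0.4, 0.1, size=100)  # distances between matching pairs
neg = rng.normal(1.2, 0.1, size=100)  # distances between non-matching pairs
print(separation_loss(pos, neg))      # 0.0 once the distributions are well separated
```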
arXiv Detail & Related papers (2020-06-01T07:05:39Z)
- Generalization Error for Linear Regression under Distributed Learning [0.0]
We consider the setting where the unknowns are distributed over a network of nodes.
We present an analytical characterization of the dependence of the generalization error on the partitioning of the unknowns over nodes.
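A naive instance of this setting: the unknown coefficients are partitioned over nodes, and each node fits only its own block of features, ignoring cross-block correlations. This one-round scheme is an assumed illustration; the paper characterizes how the resulting error depends on the partitioning.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, num_nodes = 200, 40, 4
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)

# Partition the unknowns (columns of X) over nodes; each node fits its block only.
blocks = np.array_split(np.arange(d), num_nodes)
w_hat = np.zeros(d)
for idx in blocks:
    w_hat[idx] = np.linalg.lstsq(X[:, idx], y, rcond=None)[0]

print("generalization-error proxy:", np.mean((X @ (w_hat - w_star)) ** 2))
```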
arXiv Detail & Related papers (2020-04-30T08:49:46Z)
- A Unified Theory of Decentralized SGD with Changing Topology and Local Updates [70.9701218475002]
We introduce a unified convergence analysis of decentralized communication methods.
We derive universal convergence rates for several applications.
Our proofs rely on weak assumptions.
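A minimal sketch of the kind of method such an analysis covers: each node takes a local gradient step and then averages with its neighbors through a doubly stochastic mixing matrix (a fixed ring here, though the framework allows changing topologies). The objectives and all constants are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
num_nodes, d = 4, 10

# Doubly stochastic mixing matrix for a ring topology (illustrative choice).
W = np.eye(num_nodes) * 0.5
for i in range(num_nodes):
    W[i, (i - 1) % num_nodes] = 0.25
    W[i, (i + 1) % num_nodes] = 0.25

targets = rng.normal(size=(num_nodes, d))  # each node's local optimum (assumed objectives)
w = np.zeros((num_nodes, d))
for _ in range(200):
    grads = w - targets                    # gradient of 0.5 * ||w_i - target_i||^2
    w = W @ (w - 0.1 * grads)              # local step, then neighbor averaging

print("consensus error:", np.linalg.norm(w - w.mean(axis=0)))
print("distance to average optimum:", np.linalg.norm(w.mean(axis=0) - targets.mean(axis=0)))
```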
arXiv Detail & Related papers (2020-03-23T17:49:15Z)
- Information Directed Sampling for Linear Partial Monitoring [112.05623123909895]
We introduce information directed sampling (IDS) for partial monitoring with a linear reward and observation structure.
IDS achieves adaptive worst-case regret rates that depend on precise observability conditions of the game.
We extend our results to the contextual and the kernelized setting, which significantly increases the range of possible applications.
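The core of IDS is to pick the action minimizing the information ratio, squared estimated regret over estimated information gain; the per-action numbers below are made up for illustration, and in the paper these estimates come from the game's observability structure.

```python
import numpy as np

def ids_action(regret_est, info_gain, eps=1e-12):
    """Select the action minimizing the information ratio:
    (estimated regret)^2 / (estimated information gain)."""
    return int(np.argmin(regret_est ** 2 / (info_gain + eps)))

# Illustrative per-action estimates (assumed numbers).
regret_est = np.array([0.5, 0.2, 0.9])
info_gain = np.array([0.05, 0.01, 0.30])
print("IDS picks action", ids_action(regret_est, info_gain))
```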
arXiv Detail & Related papers (2020-02-25T21:30:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.