Local SGD in Overparameterized Linear Regression
- URL: http://arxiv.org/abs/2210.11562v1
- Date: Thu, 20 Oct 2022 19:58:22 GMT
- Title: Local SGD in Overparameterized Linear Regression
- Authors: Mike Nguyen, Charly Kirst, and Nicole Mücke
- Abstract summary: We consider distributed learning using constant stepsize SGD (DSGD) over several devices, each sending a final model update to a central server.
We show that the excess risk is of the order of the variance, provided the number of local nodes does not grow too large with the global sample size.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We consider distributed learning using constant stepsize SGD (DSGD) over
several devices, each sending a final model update to a central server. In a
final step, the local estimates are aggregated. We prove in the setting of
overparameterized linear regression general upper bounds with matching lower
bounds and derive learning rates for specific data generating distributions. We
show that the excess risk is of the order of the variance, provided the number of
local nodes does not grow too large with the global sample size. We further compare
the sample complexity of DSGD with the sample complexity of distributed ridge
regression (DRR) and show that the excess SGD-risk is smaller than the excess
RR-risk, while both sample complexities are of the same order.
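The scheme analyzed above can be illustrated with a minimal numerical sketch: each device runs constant-stepsize SGD on its own sample and the server averages the resulting estimates. The dimensions, stepsize, and data model below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, num_nodes = 200, 100, 5             # overparameterized: d > n per node (assumed sizes)
w_star = rng.normal(size=d) / np.sqrt(d)  # ground-truth regressor (assumed data model)

def local_sgd(X, y, stepsize=1e-3):
    """One pass of constant-stepsize SGD on the squared loss, started at zero."""
    w = np.zeros(X.shape[1])
    for i in range(X.shape[0]):
        w -= stepsize * (X[i] @ w - y[i]) * X[i]
    return w

# Each device computes a local estimate; the server aggregates by averaging.
local_estimates = []
for _ in range(num_nodes):
    X = rng.normal(size=(n, d))
    y = X @ w_star + 0.1 * rng.normal(size=n)
    local_estimates.append(local_sgd(X, y))
w_avg = np.mean(local_estimates, axis=0)

print("excess risk proxy:", np.sum((w_avg - w_star) ** 2))
```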
Related papers
- Policy Evaluation in Distributional LQR [70.63903506291383]
We provide a closed-form expression of the distribution of the random return.
We show that this distribution can be approximated by a finite number of random variables.
Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR.
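A generic two-point zeroth-order gradient estimator, of the kind such a policy gradient method builds on, can be sketched as follows; the quadratic placeholder objective stands in for the paper's risk-averse LQR cost and is purely an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

def zeroth_order_grad(obj, theta, mu=1e-2, num_dirs=20):
    """Two-point zeroth-order estimate of the gradient of obj at theta."""
    g = np.zeros_like(theta)
    for _ in range(num_dirs):
        u = rng.normal(size=theta.size)
        g += (obj(theta + mu * u) - obj(theta - mu * u)) / (2 * mu) * u
    return g / num_dirs

obj = lambda th: np.sum(th ** 2)  # placeholder for the risk-averse LQR objective
theta = rng.normal(size=4)
for _ in range(100):
    theta -= 0.05 * zeroth_order_grad(obj, theta)
print("theta after optimization:", theta)
```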
arXiv Detail & Related papers (2023-03-23T20:27:40Z)
- Transfer Learning In Differential Privacy's Hybrid-Model [10.584333748643774]
We study the problem of machine learning in the hybrid-model, where the n individuals in the curator's dataset are drawn from a different distribution.
We give a general scheme -- Subsample-Test-Reweigh -- for this transfer learning problem.
arXiv Detail & Related papers (2022-01-28T09:54:54Z)
- Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting empirically.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
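A toy comparison of constant-stepsize SGD with iterate averaging against the minimum-norm least-squares solution, in the spirit of the setting above; all sizes and the noise level are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 50, 500                            # heavily overparameterized (assumed sizes)
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)

# Constant-stepsize SGD with iterate averaging.
w, avg = np.zeros(d), np.zeros(d)
stepsize, steps = 1e-3, 2000
for _ in range(steps):
    i = rng.integers(n)
    w -= stepsize * (X[i] @ w - y[i]) * X[i]
    avg += w / steps

# Minimum-norm ("ridgeless") least squares for comparison.
w_ols = np.linalg.pinv(X) @ y

print("SGD excess risk proxy:", np.sum((avg - w_star) ** 2))
print("OLS excess risk proxy:", np.sum((w_ols - w_star) ** 2))
```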
arXiv Detail & Related papers (2021-03-23T17:15:53Z)
- GANs with Variational Entropy Regularizers: Applications in Mitigating the Mode-Collapse Issue [95.23775347605923]
Building on the success of deep learning, Generative Adversarial Networks (GANs) provide a modern approach to learning a probability distribution from observed samples.
GANs often suffer from the mode collapse issue where the generator fails to capture all existing modes of the input distribution.
We take an information-theoretic approach and maximize a variational lower bound on the entropy of the generated samples to increase their diversity.
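The summary does not spell out the variational bound itself; as a loose stand-in, the sketch below adds a generic batch-diversity surrogate (mean pairwise distance of generated samples) to a generator loss. Both the surrogate and the weight lam are assumptions for illustration only.

```python
import numpy as np

def diversity_surrogate(samples):
    """Mean pairwise distance of a batch; a crude proxy for sample entropy."""
    diffs = samples[:, None, :] - samples[None, :, :]
    return np.sqrt((diffs ** 2).sum(-1) + 1e-12).mean()

def generator_loss(adv_loss, samples, lam=0.1):
    # Rewarding diversity pushes the generator away from mode collapse.
    return adv_loss - lam * diversity_surrogate(samples)

batch = np.random.default_rng(3).normal(size=(8, 2))  # stand-in generated batch
print(generator_loss(adv_loss=1.0, samples=batch))
```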
arXiv Detail & Related papers (2020-09-24T19:34:37Z)
- Global Distance-distributions Separation for Unsupervised Person Re-identification [93.39253443415392]
Existing unsupervised ReID approaches often fail to correctly identify positive and negative samples through distance-based matching/ranking.
We introduce a global distance-distributions separation constraint over the two distributions to encourage the clear separation of positive and negative samples from a global view.
We show that our method leads to significant improvement over the baselines and achieves the state-of-the-art performance.
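One illustrative way to encode such a separation constraint is to penalize overlap between the two distance distributions via their means and spreads; the exact hinge form below is an assumption, not the paper's constraint.

```python
import numpy as np

def separation_loss(pos_dists, neg_dists, margin=0.5):
    """Penalize overlap between positive-pair and negative-pair distance
    distributions (illustrative hinge form, not the paper's exact constraint)."""
    gap = neg_dists.mean() - pos_dists.mean()
    spread = pos_dists.std() + neg_dists.std()
    return max(0.0, margin + spread - gap)

rng = np.random.default_rng(4)
pos = rng.normal(0.4, 0.1, size=100)  # distances between matching pairs
neg = rng.normal(1.2, 0.1, size=100)  # distances between non-matching pairs
print(separation_loss(pos, neg))      # 0.0 once the distributions are well separated
```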
arXiv Detail & Related papers (2020-06-01T07:05:39Z)
- Generalization Error for Linear Regression under Distributed Learning [0.0]
We consider the setting where the unknowns are distributed over a network of nodes.
We present an analytical characterization of the dependence of the generalization error on the partitioning of the unknowns over nodes.
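A naive instance of this setting: the unknown coefficients are partitioned over nodes, and each node fits only its own block of features, ignoring cross-block correlations. This one-round scheme is an assumed illustration; the paper characterizes how the resulting error depends on the partitioning.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d, num_nodes = 200, 40, 4
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)

# Partition the unknowns (columns of X) over nodes; each node fits its block only.
blocks = np.array_split(np.arange(d), num_nodes)
w_hat = np.zeros(d)
for idx in blocks:
    w_hat[idx] = np.linalg.lstsq(X[:, idx], y, rcond=None)[0]

print("generalization-error proxy:", np.mean((X @ (w_hat - w_star)) ** 2))
```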
arXiv Detail & Related papers (2020-04-30T08:49:46Z)
- A Unified Theory of Decentralized SGD with Changing Topology and Local Updates [70.9701218475002]
We introduce a unified convergence analysis of decentralized communication methods.
We derive universal convergence rates for several applications.
Our proofs rely on weak assumptions.
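A minimal sketch of the kind of method such an analysis covers: each node takes a local gradient step and then averages with its neighbors through a doubly stochastic mixing matrix (a fixed ring here, though the framework allows changing topologies). The objectives and all constants are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
num_nodes, d = 4, 10

# Doubly stochastic mixing matrix for a ring topology (illustrative choice).
W = np.eye(num_nodes) * 0.5
for i in range(num_nodes):
    W[i, (i - 1) % num_nodes] = 0.25
    W[i, (i + 1) % num_nodes] = 0.25

targets = rng.normal(size=(num_nodes, d))  # each node's local optimum (assumed objectives)
w = np.zeros((num_nodes, d))
for _ in range(200):
    grads = w - targets                    # gradient of 0.5 * ||w_i - target_i||^2
    w = W @ (w - 0.1 * grads)              # local step, then neighbor averaging

print("consensus error:", np.linalg.norm(w - w.mean(axis=0)))
print("distance to average optimum:", np.linalg.norm(w.mean(axis=0) - targets.mean(axis=0)))
```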
arXiv Detail & Related papers (2020-03-23T17:49:15Z)
- Information Directed Sampling for Linear Partial Monitoring [112.05623123909895]
We introduce information directed sampling (IDS) for partial monitoring with a linear reward and observation structure.
IDS achieves adaptive worst-case regret rates that depend on precise observability conditions of the game.
We extend our results to the contextual and the kernelized setting, which significantly increases the range of possible applications.
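The core of IDS is to pick the action minimizing the information ratio, squared estimated regret over estimated information gain; the per-action numbers below are made up for illustration, and in the paper these estimates come from the game's observability structure.

```python
import numpy as np

def ids_action(regret_est, info_gain, eps=1e-12):
    """Select the action minimizing the information ratio:
    (estimated regret)^2 / (estimated information gain)."""
    return int(np.argmin(regret_est ** 2 / (info_gain + eps)))

# Illustrative per-action estimates (assumed numbers).
regret_est = np.array([0.5, 0.2, 0.9])
info_gain = np.array([0.05, 0.01, 0.30])
print("IDS picks action", ids_action(regret_est, info_gain))
```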
arXiv Detail & Related papers (2020-02-25T21:30:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.