Decentralised Learning with Random Features and Distributed Gradient Descent
- URL: http://arxiv.org/abs/2007.00360v1
- Date: Wed, 1 Jul 2020 09:55:09 GMT
- Title: Decentralised Learning with Random Features and Distributed Gradient Descent
- Authors: Dominic Richards, Patrick Rebeschini and Lorenzo Rosasco
- Abstract summary: We investigate the generalisation performance of Distributed Gradient Descent with Implicit Regularisation and Random Features in a homogeneous setting.
We establish high probability bounds on the predictive performance for each agent as a function of the step size, number of iterations, inverse spectral gap of the communication matrix and number of Random Features.
We present simulations that show how the number of Random Features, iterations and samples impact predictive performance.
- Score: 39.00450514924611
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the generalisation performance of Distributed Gradient Descent
with Implicit Regularisation and Random Features in the homogeneous setting,
where each agent in a network is given data sampled independently from the same
unknown distribution. Along with reducing the memory footprint, Random Features
are particularly convenient in this setting as they provide a common
parameterisation across agents that allows them to overcome previous difficulties in
implementing Decentralised Kernel Regression. Under standard source and
capacity assumptions, we establish high probability bounds on the predictive
performance for each agent as a function of the step size, number of
iterations, inverse spectral gap of the communication matrix and number of
Random Features. By tuning these parameters, we obtain statistical rates that
are minimax optimal with respect to the total number of samples in the network.
The algorithm provides a linear improvement over single-machine Gradient
Descent in memory cost and, when agents hold enough data with respect to the
network size and inverse spectral gap, a linear speed-up in computational
runtime for any network topology. We present simulations that show how the
number of Random Features, iterations and samples impact predictive
performance.
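
A minimal sketch of this setup, assuming random Fourier features for a Gaussian kernel, squared loss, and a ring communication topology; all names and constants below are illustrative, not taken from the paper:

```python
# A minimal sketch of decentralised GD with shared random features,
# assuming: random Fourier features for a Gaussian kernel, squared loss,
# and a ring gossip matrix. All names/constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_per_agent, d, n_features = 10, 50, 5, 100

# Shared random features: identical (omega, b) on every agent give the
# common parameterisation across the network.
omega = rng.normal(size=(d, n_features))
b = rng.uniform(0, 2 * np.pi, size=n_features)
phi = lambda X: np.sqrt(2.0 / n_features) * np.cos(X @ omega + b)

# Homogeneous setting: every agent samples from the same distribution.
f_true = lambda X: np.sin(X[:, 0])
X = [rng.normal(size=(n_per_agent, d)) for _ in range(n_agents)]
y = [f_true(Xv) + 0.1 * rng.normal(size=n_per_agent) for Xv in X]
Z = [phi(Xv) for Xv in X]

# Doubly stochastic communication matrix for a ring topology.
P = np.zeros((n_agents, n_agents))
for v in range(n_agents):
    P[v, v] = 0.5
    P[v, (v - 1) % n_agents] = 0.25
    P[v, (v + 1) % n_agents] = 0.25

W = np.zeros((n_agents, n_features))  # one iterate per agent
step, n_iters = 0.5, 200              # early stopping = implicit regularisation
for _ in range(n_iters):
    W = P @ W  # consensus (gossip) step
    grads = np.stack([Zv.T @ (Zv @ w - yv) / n_per_agent
                      for Zv, yv, w in zip(Z, y, W)])
    W -= step * grads  # local gradient step on the local squared loss

X_test = rng.normal(size=(500, d))
print("agent-0 test MSE:",
      np.mean((phi(X_test) @ W[0] - f_true(X_test)) ** 2))
```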
Related papers
- Nonuniform random feature models using derivative information [10.239175197655266]
We propose nonuniform data-driven parameter distributions for neural network initialization based on derivative data of the function to be approximated.
We address the cases of Heaviside and ReLU activation functions, and their smooth approximations (sigmoid and softplus).
We suggest simplifications of these exact densities, based on approximate derivative data at the input points, that allow for very efficient sampling and yield random feature models whose performance is close to that of optimal networks in several scenarios.
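
A hedged illustration of the general idea (not the paper's exact densities): bias ReLU random-feature breakpoints toward regions where the target's derivative is large, on a 1-D toy target.

```python
# Sketch: sample ReLU "breakpoints" with probability proportional to |f'(x)|,
# so features concentrate where the target bends most. The sampling density
# is a simplification for illustration, not the paper's exact construction.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 400)
f = np.tanh(3 * x)                      # toy target function
df = np.gradient(f, x)                  # approximate derivative data

p = np.abs(df) / np.abs(df).sum()       # data-driven sampling density
n_features = 30
breaks = rng.choice(x, size=n_features, p=p)

Phi = np.maximum(x[:, None] - breaks[None, :], 0.0)  # ReLU random features
coef, *_ = np.linalg.lstsq(Phi, f, rcond=None)       # fit linear readout
print("train MSE:", np.mean((Phi @ coef - f) ** 2))
```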
arXiv Detail & Related papers (2024-10-03T01:30:13Z)
- Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models.
We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
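
As a small numerical companion (not the paper's derivation), the sketch below measures ridge regression test risk in the proportional regime where d and n are comparable; `lam` and `sigma` are illustrative choices.

```python
# Ridge regression test risk with d/n fixed; all constants illustrative.
import numpy as np

rng = np.random.default_rng(2)
n, d, lam, sigma = 400, 200, 1e-2, 0.5
beta = rng.normal(size=d) / np.sqrt(d)   # planted signal

X = rng.normal(size=(n, d))
y = X @ beta + sigma * rng.normal(size=n)

# Ridge estimator: (X^T X + n*lam*I)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

X_test = rng.normal(size=(2000, d))
test_risk = np.mean((X_test @ (beta_hat - beta)) ** 2) + sigma ** 2
print(f"test risk at d/n={d/n:.2f}: {test_risk:.4f}")
```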
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- Regularization, early-stopping and dreaming: a Hopfield-like setup to address generalization and overfitting [0.0]
We look for optimal network parameters by applying a gradient descent over a regularized loss function.
Within this framework, the optimal neuron-interaction matrices correspond to Hebbian kernels revised by a reiterated unlearning protocol.
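
A loose sketch of such a setup, assuming a quadratic pattern-stability loss with an L2 regulariser and a Hebbian initialisation; the specific loss and constants are illustrative, not the paper's.

```python
# Start from the Hebbian interaction matrix for stored patterns and run
# gradient descent on a regularised (illustrative) stability loss.
import numpy as np

rng = np.random.default_rng(3)
n_neurons, n_patterns, lam, step = 50, 5, 0.01, 0.1
xi = rng.choice([-1.0, 1.0], size=(n_patterns, n_neurons))

J = xi.T @ xi / n_neurons  # Hebbian kernel as initialisation

for _ in range(200):
    residual = xi @ J - xi                          # patterns as fixed points
    grad = xi.T @ residual / n_patterns + lam * J   # loss grad + L2 term
    J -= step * grad
    np.fill_diagonal(J, 0.0)                        # no self-interactions

print("pattern stability:", np.mean(np.sign(xi @ J) == xi))
```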
arXiv Detail & Related papers (2023-08-01T15:04:30Z)
- Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness [86.61582747039053]
Language model training in distributed settings is limited by the communication cost of gradient exchanges.
We extend recent work using shared randomness to perform distributed fine-tuning with low bandwidth.
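
A hedged sketch of the shared-randomness idea: since every worker holds the same PRNG seed, the random perturbation direction is never transmitted, and only a one-byte scalar per step crosses the network. The toy objective and quantisation scheme are assumptions for illustration, not the paper's setup.

```python
# Zeroth-order update with a shared random direction and a one-byte payload.
import numpy as np

SCALE = 0.25  # quantisation scale for the one-byte payload (illustrative)

def to_byte(x):
    return int(np.clip(np.round(x / SCALE), -127, 127))  # the byte sent

def from_byte(b):
    return b * SCALE

rng_shared = np.random.default_rng(42)  # identical seed on every worker
d, step, eps = 100, 0.01, 1e-4
theta = np.zeros(d)
loss = lambda w: 0.5 * np.sum((w - 1.0) ** 2)  # stand-in for a local loss

for _ in range(2000):
    z = rng_shared.normal(size=d)  # regenerated locally; never transmitted
    # Directional derivative via finite differences (zeroth-order estimate).
    g = (loss(theta + eps * z) - loss(theta - eps * z)) / (2 * eps)
    payload = to_byte(g)           # one byte per gradient over the wire
    theta -= step * from_byte(payload) * z  # same update on every worker

print("final loss:", loss(theta))
```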
arXiv Detail & Related papers (2023-06-16T17:59:51Z)
- Optimization of Annealed Importance Sampling Hyperparameters [77.34726150561087]
Annealed Importance Sampling (AIS) is a popular algorithm used to estimate the intractable marginal likelihood of deep generative models.
We present a parametric AIS process with flexible intermediary distributions and optimize the bridging distributions to use fewer steps for sampling.
We assess the performance of our optimized AIS for marginal likelihood estimation of deep generative models and compare it to other estimators.
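
A compact sketch of plain AIS with geometric bridging distributions on a 1-D Gaussian example; the schedule, transition kernel, and constants are illustrative, not the optimized scheme of the paper.

```python
# AIS estimate of a log normalising-constant ratio between two Gaussians.
import numpy as np

rng = np.random.default_rng(4)

log_p0 = lambda x: -0.5 * x**2                     # base N(0,1), unnormalised
log_p1 = lambda x: -0.5 * ((x - 3.0) / 0.5) ** 2   # target N(3, 0.5^2), unnorm.
log_pb = lambda x, b: (1 - b) * log_p0(x) + b * log_p1(x)  # geometric bridge

n_chains, betas = 2000, np.linspace(0, 1, 50)
x = rng.normal(size=n_chains)          # exact samples from the base
log_w = np.zeros(n_chains)

for b_prev, b in zip(betas[:-1], betas[1:]):
    log_w += log_pb(x, b) - log_pb(x, b_prev)      # importance-weight update
    # One Metropolis step targeting the current bridge distribution.
    prop = x + 0.5 * rng.normal(size=n_chains)
    accept = np.log(rng.uniform(size=n_chains)) < log_pb(prop, b) - log_pb(x, b)
    x = np.where(accept, prop, x)

# Truth for these Gaussians: log(Z1/Z0) = log(0.5) ~ -0.693.
print(f"AIS log-ratio estimate: {np.log(np.mean(np.exp(log_w))):.3f}")
```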
arXiv Detail & Related papers (2022-09-27T07:58:25Z)
- Robust Estimation for Nonparametric Families via Generative Adversarial Networks [92.64483100338724]
We provide a framework for designing Generative Adversarial Networks (GANs) to solve high-dimensional robust statistics problems.
Our work extends these to robust mean estimation, second moment estimation, and robust linear regression.
In terms of techniques, our proposed GAN losses can be viewed as a smoothed and generalized Kolmogorov-Smirnov distance.
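
As an illustrative reading of that connection (a simplification, not the paper's exact loss), a smoothed Kolmogorov-Smirnov distance can be formed by replacing the empirical-CDF indicator with a sigmoid:

```python
# Smoothed, differentiable KS distance between two 1-D samples.
import numpy as np

def smooth_ks(x, y, temp=0.1, grid_size=200):
    grid = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), grid_size)
    sig = lambda s: 1.0 / (1.0 + np.exp(-s / temp))   # smoothed indicator
    Fx = sig(grid[:, None] - x[None, :]).mean(axis=1)  # smoothed CDF of x
    Fy = sig(grid[:, None] - y[None, :]).mean(axis=1)  # smoothed CDF of y
    return np.max(np.abs(Fx - Fy))                     # sup-norm of CDF gap

rng = np.random.default_rng(5)
real = rng.normal(0.0, 1.0, size=1000)
fake = rng.normal(0.5, 1.0, size=1000)
print("smoothed KS:", smooth_ks(real, fake))
```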
arXiv Detail & Related papers (2022-02-02T20:11:33Z)
- The Bures Metric for Generative Adversarial Networks [10.69910379275607]
Generative Adversarial Networks (GANs) are performant generative methods yielding high-quality samples.
We propose to match the real batch diversity to the fake batch diversity.
We observe that diversity matching reduces mode collapse substantially and has a positive effect on the sample quality.
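
A hedged sketch of diversity matching via the Bures metric, d_B(A,B)^2 = tr(A) + tr(B) - 2 tr((A^{1/2} B A^{1/2})^{1/2}), applied to batch feature covariances; the feature extractor and shapes are placeholders, not the paper's architecture.

```python
# Squared Bures distance between real and fake batch feature covariances.
import numpy as np
from scipy.linalg import sqrtm

def bures_sq(A, B):
    """Squared Bures distance between PSD matrices A and B."""
    rA = sqrtm(A)
    cross = sqrtm(rA @ B @ rA)
    return np.trace(A) + np.trace(B) - 2.0 * np.real(np.trace(cross))

def batch_cov(F):
    Fc = F - F.mean(axis=0, keepdims=True)
    return Fc.T @ Fc / (len(F) - 1)

rng = np.random.default_rng(6)
real_feats = rng.normal(size=(64, 16))              # diverse real batch
fake_feats = 0.3 * rng.normal(size=(64, 16)) + 1.0  # low-diversity fake batch

penalty = bures_sq(batch_cov(real_feats), batch_cov(fake_feats))
print("Bures diversity penalty:", penalty)  # would be added to the generator loss
```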
arXiv Detail & Related papers (2020-06-16T12:04:41Z)
- Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in High Dimensions [41.7567932118769]
Empirical Risk Minimization algorithms are widely used in a variety of estimation and prediction tasks.
In this paper, we characterize for the first time the fundamental limits on the statistical accuracy of convex ERM for inference.
arXiv Detail & Related papers (2020-06-16T04:27:38Z)
- Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation [72.40827239394565]
We propose to compute features only at sparsely sampled locations.
We then densely reconstruct the feature map with an efficient procedure.
The presented network is experimentally shown to save substantial computation while maintaining accuracy over a variety of computer vision tasks.
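
A minimal sketch of the idea, with a stand-in "expensive" per-pixel feature (not the paper's network): evaluate at sparse random locations, then reconstruct a dense feature map by interpolation.

```python
# Compute features at ~10% of locations, reconstruct the rest by interpolation.
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(7)
H, W, keep = 64, 64, 0.1

ys, xs = np.mgrid[0:H, 0:W]
expensive_feature = lambda y, x: np.sin(x / 7.0) * np.cos(y / 5.0)  # stand-in

mask = rng.uniform(size=(H, W)) < keep
pts = np.stack([ys[mask], xs[mask]], axis=1)
vals = expensive_feature(ys[mask], xs[mask])   # sparse evaluations only

dense = griddata(pts, vals, (ys, xs), method="nearest")  # dense reconstruction
truth = expensive_feature(ys, xs)
print("reconstruction MAE:", np.abs(dense - truth).mean())
```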
arXiv Detail & Related papers (2020-03-19T15:36:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.