On the Outsized Importance of Learning Rates in Local Update Methods
- URL: http://arxiv.org/abs/2007.00878v1
- Date: Thu, 2 Jul 2020 04:45:55 GMT
- Title: On the Outsized Importance of Learning Rates in Local Update Methods
- Authors: Zachary Charles, Jakub Konečný
- Abstract summary: We study a family of algorithms, which we refer to as local update methods, that generalize many federated learning and meta-learning algorithms.
We prove that for quadratic objectives, local update methods perform stochastic gradient descent on a surrogate loss function which we exactly characterize.
We show that the choice of client learning rate controls the condition number of that surrogate loss, as well as the distance between the minimizers of the surrogate and true loss functions.
- Score: 2.094022863940315
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a family of algorithms, which we refer to as local update methods,
that generalize many federated learning and meta-learning algorithms. We prove
that for quadratic objectives, local update methods perform stochastic gradient
descent on a surrogate loss function which we exactly characterize. We show
that the choice of client learning rate controls the condition number of that
surrogate loss, as well as the distance between the minimizers of the surrogate
and true loss functions. We use this theory to derive novel convergence rates
for federated averaging that showcase this trade-off between the condition
number of the surrogate loss and its alignment with the true loss function. We
validate our results empirically, showing that in communication-limited
settings, proper learning rate tuning is often sufficient to reach near-optimal
behavior. We also present a practical method for automatic learning rate decay
in local update methods that helps reduce the need for learning rate tuning,
and highlight its empirical performance on a variety of tasks and datasets.
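To make the surrogate-loss viewpoint concrete, here is a sketch of the single-client quadratic case, using assumed notation consistent with the abstract rather than the paper's exact statement.

```latex
% One client with quadratic loss f(x) = (1/2)(x - x^*)^T A (x - x^*), A > 0.
% K local gradient steps with client learning rate \gamma, starting from the
% server model x_t:
\[
x^{(k)} - x^\ast = (I - \gamma A)^k (x_t - x^\ast),
\qquad
\Delta_t := x_t - x^{(K)} = \bigl(I - (I - \gamma A)^K\bigr)(x_t - x^\ast).
\]
% The client update is therefore a gradient step on a surrogate quadratic:
\[
\tilde f(x) = \tfrac{1}{2}(x - x^\ast)^\top \tilde A\, (x - x^\ast),
\qquad
\tilde A = \frac{1}{\gamma K}\bigl(I - (I - \gamma A)^K\bigr),
\qquad
\Delta_t = \gamma K\, \nabla \tilde f(x_t).
\]
```

Each eigenvalue λ of A maps to (1 - (1 - γλ)^K)/(γK): as γ → 0 this tends to λ, so the surrogate matches the true loss, while larger γ compresses the spectrum and improves (or at least does not worsen) the surrogate's condition number. With heterogeneous clients the surrogate minimizer correspondingly drifts away from the true minimizer, which is the trade-off the abstract describes.

A minimal simulation of this behavior with separate client and server learning rates might look like the following; the names, constants, and synthetic setup are illustrative assumptions, not the authors' code.

```python
# Minimal FedAvg-style sketch on synthetic quadratic clients (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
d, num_clients, K, rounds = 10, 8, 20, 100
client_lr, server_lr = 0.05, 1.0

# Heterogeneous quadratic clients: f_c(x) = 0.5 * (x - x_c*)^T A_c (x - x_c*)
A = [np.diag(rng.uniform(0.5, 5.0, d)) for _ in range(num_clients)]
x_star = [rng.normal(size=d) for _ in range(num_clients)]

# Minimizer of the true average loss, for reference.
A_bar = sum(A) / num_clients
b_bar = sum(A[c] @ x_star[c] for c in range(num_clients)) / num_clients
x_opt = np.linalg.solve(A_bar, b_bar)

x = np.zeros(d)  # server model
for _ in range(rounds):
    deltas = []
    for c in range(num_clients):
        x_c = x.copy()
        for _ in range(K):  # K local steps with the *client* learning rate
            x_c -= client_lr * (A[c] @ (x_c - x_star[c]))
        deltas.append(x - x_c)  # client pseudo-gradient
    x -= server_lr * np.mean(deltas, axis=0)  # server step

# With client_lr > 0 and K > 1, the fixed point generally differs from x_opt;
# shrinking client_lr reduces this gap at the cost of slower local progress.
print("distance to true minimizer:", np.linalg.norm(x - x_opt))
```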
Related papers
- Vlearn: Off-Policy Learning with Efficient State-Value Function Estimation [22.129001951441015]
Existing off-policy reinforcement learning algorithms often rely on an explicit state-action-value function representation.
This reliance results in data inefficiency as maintaining a state-action-value function in high-dimensional action spaces is challenging.
We present an efficient approach that utilizes only a state-value function as the critic for off-policy deep reinforcement learning.
arXiv Detail & Related papers (2024-03-07T12:45:51Z)
- FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specific auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
arXiv Detail & Related papers (2023-09-18T12:35:05Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art performance.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- On Generalizing Beyond Domains in Cross-Domain Continual Learning [91.56748415975683]
Deep neural networks often suffer from catastrophic forgetting of previously learned knowledge after learning a new task.
Our proposed approach learns new tasks under domain shift with accuracy boosts up to 10% on challenging datasets such as DomainNet and OfficeHome.
arXiv Detail & Related papers (2022-03-08T09:57:48Z)
- Relational Surrogate Loss Learning [41.61184221367546]
This paper revisits surrogate loss learning, where a deep neural network is employed to approximate the evaluation metrics.
In this paper, we show that directly maintaining the relation of models between surrogate losses and metrics suffices.
Our method is much easier to optimize and enjoys significant efficiency and performance gains.
arXiv Detail & Related papers (2022-02-26T17:32:57Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- Convergence and Accuracy Trade-Offs in Federated Learning and Meta-Learning [0.0]
We study a family of algorithms, which we refer to as local update methods.
We prove that for quadratic models, local update methods are equivalent to first-order optimization on a surrogate loss.
We derive novel convergence rates showcasing these trade-offs and highlight their importance in communication-limited settings.
arXiv Detail & Related papers (2021-03-08T19:40:32Z)
- META-Learning Eligibility Traces for More Sample Efficient Temporal Difference Learning [2.0559497209595823]
We propose a meta-learning method for adjusting the eligibility trace parameter, in a state-dependent manner.
The adaptation is achieved with the help of auxiliary learners that learn distributional information about the update targets online.
We prove that, under some assumptions, the proposed method improves the overall quality of the update targets, by minimizing the overall target error.
arXiv Detail & Related papers (2020-06-16T03:41:07Z)
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction [96.90215318875859]
We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from corrective feedback.
We propose a new algorithm, DisCor, which computes an approximation to the optimal training distribution and uses it to re-weight the transitions used for training.
arXiv Detail & Related papers (2020-03-16T16:18:52Z)
- Dynamic Federated Learning [57.14673504239551]
Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments.
We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data.
Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm.
arXiv Detail & Related papers (2020-02-20T15:00:54Z)
- Adaptive Gradient Sparsification for Efficient Federated Learning: An Online Learning Approach [11.986523531539165]
Federated learning (FL) is an emerging technique for training machine learning models using geographically dispersed data.
Gradient sparsification (GS) can be applied, where only a small subset of important gradient elements is communicated instead of the full gradient (a minimal top-k sketch follows this list).
We propose a novel online learning formulation and algorithm for automatically determining the near-optimal communication trade-off.
arXiv Detail & Related papers (2020-01-14T13:09:23Z)
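As a reading aid for the gradient sparsification idea in the last entry, here is a minimal top-k sketch; the function name and setup are assumptions for illustration, and the paper's online-learning procedure for choosing the sparsity level is not shown.

```python
# Minimal top-k gradient sparsification sketch (illustrative only).
import numpy as np

def sparsify_top_k(grad, k):
    """Keep the k largest-magnitude gradient entries and zero out the rest.

    In a federated setting, only the (index, value) pairs of the kept entries
    would be communicated instead of the full gradient.
    """
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    sparse = np.zeros_like(grad)
    sparse[idx] = grad[idx]
    return sparse, idx

grad = np.random.default_rng(0).normal(size=1_000)
sparse_grad, sent_idx = sparsify_top_k(grad, k=50)  # communicate ~5% of entries
```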
This list is automatically generated from the titles and abstracts of the papers on this site.