Federated Hypergradient Descent
- URL: http://arxiv.org/abs/2211.02106v1
- Date: Thu, 3 Nov 2022 19:22:00 GMT
- Title: Federated Hypergradient Descent
- Authors: Andrew K Kan
- Abstract summary: We apply a principled approach to adaptively tuning the client learning rate, number of local steps, and batch size.
In our federated learning applications, our primary motivations are minimizing communication budget as well as local computational resources in the training pipeline.
We show our numerical results through extensive empirical experiments with the Federated EMNIST-62 (FEMNIST) and Federated Stack Overflow (FSO) datasets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we explore combining automatic hyperparameter tuning and
optimization for federated learning (FL) in an online, one-shot procedure. We
apply a principled approach to adaptively tuning the client learning rate,
number of local steps, and batch size. In our federated learning applications,
our primary motivations are minimizing communication budget as well as local
computational resources in the training pipeline. Conventionally,
hyperparameter tuning methods involve at least some degree of trial-and-error,
which is known to be sample inefficient. In order to address our motivations,
we propose FATHOM (Federated AuTomatic Hyperparameter OptiMization) as a
one-shot online procedure. We investigate the challenges and solutions of
deriving analytical gradients with respect to the hyperparameters of interest.
Our approach is inspired by the fact that, with the exception of local data, we
have full knowledge of all components involved in our training process, and our
algorithm exploits this fact to substantial effect. We show that FATHOM is
more communication efficient than Federated Averaging (FedAvg) with optimized,
statically valued hyperparameters, and is also more computationally efficient
overall. As a communication efficient, one-shot online procedure, FATHOM solves
the bottleneck of costly communication and limited local computation, by
eliminating a potentially wasteful tuning process, and by optimizing the
hyperparameters adaptively throughout the training procedure without
trial-and-error. We show our numerical results through extensive empirical
experiments with the Federated EMNIST-62 (FEMNIST) and Federated Stack Overflow
(FSO) datasets, using FedJAX as our baseline framework.
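The core idea of descending along an analytical gradient with respect to a hyperparameter can be illustrated with a minimal sketch. Note this is not FATHOM's actual derivation: the quadratic objective, the consecutive-gradient hypergradient approximation, and the hyper-learning-rate `beta` below are illustrative assumptions for a single (non-federated) optimizer.

```python
# A minimal sketch of online hypergradient descent on a scalar learning
# rate. The objective and the hyper-learning-rate `beta` are illustrative
# assumptions, not the paper's formulation.
import numpy as np

def hypergradient_descent(x0, grad_fn, eta=0.01, beta=1e-4, steps=100):
    """Tune the learning rate `eta` online while minimizing the objective.

    The hypergradient d(loss)/d(eta) is approximated by the negative dot
    product of consecutive gradients, so `eta` grows when successive
    gradients align and shrinks when they oppose each other.
    """
    x = np.asarray(x0, dtype=float)
    prev_grad = np.zeros_like(x)
    for _ in range(steps):
        g = grad_fn(x)
        # Hypergradient step on eta, floored to keep it positive.
        eta = max(eta + beta * np.dot(g, prev_grad), 1e-6)
        x = x - eta * g
        prev_grad = g
    return x, eta

# Example: minimize f(x) = ||x||^2 / 2, whose gradient is simply x.
x_final, eta_final = hypergradient_descent(
    x0=[5.0, -3.0], grad_fn=lambda x: x)
```

Because the hypergradient is computed from quantities already available during training, no trial-and-error sweep over learning rates is needed, which mirrors the one-shot motivation above.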
Related papers
- Online Parallel Multi-Task Relationship Learning via Alternating Direction Method of Multipliers [37.859185005986056]
Online multi-task learning (OMTL) enhances streaming data processing by leveraging the inherent relations among multiple tasks.
This study proposes a novel OMTL framework based on the alternating direction multiplier method (ADMM), a recent breakthrough in optimization suitable for the distributed computing environment.
arXiv Detail & Related papers (2024-11-09T10:20:13Z)
- SpaFL: Communication-Efficient Federated Learning with Sparse Models and Low Computational Overhead [75.87007729801304]
SpaFL: a communication-efficient FL framework is proposed to optimize sparse model structures with low computational overhead.
Experiments show that SpaFL improves accuracy while requiring much less communication and computing resources compared to sparse baselines.
arXiv Detail & Related papers (2024-06-01T13:10:35Z)
- Federated Learning of Large Language Models with Parameter-Efficient Prompt Tuning and Adaptive Optimization [71.87335804334616]
Federated learning (FL) is a promising paradigm to enable collaborative model training with decentralized data.
The training process of Large Language Models (LLMs) generally incurs the update of significant parameters.
This paper proposes an efficient partial prompt tuning approach to improve performance and efficiency simultaneously.
arXiv Detail & Related papers (2023-10-23T16:37:59Z)
- FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
arXiv Detail & Related papers (2023-09-18T12:35:05Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs)
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Pre-training helps Bayesian optimization too [49.28382118032923]
We seek an alternative practice for setting functional priors.
In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori.
Our results show that our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.
arXiv Detail & Related papers (2022-07-07T04:42:54Z)
- Adaptive Client Sampling in Federated Learning via Online Learning with Bandit Feedback [36.05851452151107]
Federated learning (FL) systems need to sample a subset of clients to be involved in each round of training.
Despite its importance, there is limited work on how to sample clients effectively.
We show how our sampling method can improve the convergence speed of optimization algorithms.
arXiv Detail & Related papers (2021-12-28T23:50:52Z)
- Amortized Auto-Tuning: Cost-Efficient Transfer Optimization for Hyperparameter Recommendation [83.85021205445662]
We propose amortized auto-tuning (AT2) to speed up the tuning of machine learning models.
We conduct a thorough analysis of the multi-task multi-fidelity Bayesian optimization framework, which leads to AT2 as its best instantiation.
arXiv Detail & Related papers (2021-06-17T00:01:18Z)
- Federated Hyperparameter Tuning: Challenges, Baselines, and Connections to Weight-Sharing [37.056834089598105]
We show how standard approaches may be adapted to form baselines for the federated setting.
By making a novel connection to the neural architecture search technique of weight-sharing, we introduce a new method, FedEx.
Theoretically, we show that a FedEx variant correctly tunes the on-device learning rate in the setting of online convex optimization.
arXiv Detail & Related papers (2021-06-08T16:42:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.