Federated Hypergradient Descent
- URL: http://arxiv.org/abs/2211.02106v1
- Date: Thu, 3 Nov 2022 19:22:00 GMT
- Title: Federated Hypergradient Descent
- Authors: Andrew K Kan
- Abstract summary: We apply a principled approach to a method for adaptively tuning the client learning rate, number of local steps, and batch size.
In our federated learning applications, our primary motivations are minimizing communication budget as well as local computational resources in the training pipeline.
We show our numerical results through extensive empirical experiments with the Federated EMNIST-62 (FEMNIST) and Federated Stack Overflow (FSO) datasets.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we explore combining automatic hyperparameter tuning and
optimization for federated learning (FL) in an online, one-shot procedure. We
apply a principled approach to a method for adaptively tuning the client learning rate,
number of local steps, and batch size. In our federated learning applications,
our primary motivations are minimizing communication budget as well as local
computational resources in the training pipeline. Conventionally,
hyperparameter tuning methods involve at least some degree of trial-and-error,
which is known to be sample inefficient. In order to address our motivations,
we propose FATHOM (Federated AuTomatic Hyperparameter OptiMization) as a
one-shot online procedure. We investigate the challenges and solutions of
deriving analytical gradients with respect to the hyperparameters of interest.
Our approach is inspired by the fact that, with the exception of local data, we
have full knowledge of all components involved in our training process, and
this fact can be exploited to great effect in our algorithm. We show that FATHOM is
more communication efficient than Federated Averaging (FedAvg) with optimized,
static-valued hyperparameters, and is also more computationally efficient
overall. As a communication-efficient, one-shot online procedure, FATHOM addresses
the bottleneck of costly communication and limited local computation by
eliminating a potentially wasteful tuning process and by optimizing the
hyperparameters adaptively throughout the training procedure without
trial-and-error. We show our numerical results through extensive empirical
experiments with the Federated EMNIST-62 (FEMNIST) and Federated Stack Overflow
(FSO) datasets, using FedJAX as our baseline framework.
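As a rough illustration of the hypergradient idea behind FATHOM (a minimal sketch, not the authors' implementation), the following JAX snippet differentiates a validation loss through a few unrolled local SGD steps to obtain an analytical gradient with respect to the client learning rate. The model, data, function names (loss, local_update, outer_objective), and step sizes are illustrative assumptions; the federated aggregation across clients and the adaptation of the number of local steps and batch size are omitted.
```python
# Hypothetical sketch: online hypergradient descent on a client learning rate.
import jax
import jax.numpy as jnp

def loss(params, batch):
    # Illustrative least-squares loss for a linear model.
    x, y = batch
    pred = x @ params
    return jnp.mean((pred - y) ** 2)

def local_update(params, eta, batch, num_steps=3):
    # Unroll a few local SGD steps so the result is differentiable w.r.t. eta.
    for _ in range(num_steps):
        grads = jax.grad(loss)(params, batch)
        params = params - eta * grads
    return params

def outer_objective(eta, params, train_batch, val_batch):
    # Validation loss after local training, viewed as a function of eta.
    updated = local_update(params, eta, train_batch)
    return loss(updated, val_batch)

# Analytical hypergradient: derivative of the outer objective w.r.t. eta,
# obtained by differentiating through the unrolled local updates.
hypergrad_fn = jax.grad(outer_objective, argnums=0)

# Toy data (placeholders for a client's local dataset).
key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
x = jax.random.normal(k1, (32, 5))
y = x @ jnp.arange(5.0) + 0.1 * jax.random.normal(k2, (32,))
train_batch, val_batch = (x[:16], y[:16]), (x[16:], y[16:])

params = jnp.zeros(5)
eta = 0.1
for round_idx in range(10):
    # Online, one-shot style: adjust eta alongside training, no trial runs.
    g = hypergrad_fn(eta, params, train_batch, val_batch)
    eta = jnp.clip(eta - 0.5 * g, 1e-4, 1.0)
    params = local_update(params, eta, train_batch)
```
In this toy setting the hyperparameter update reuses the same forward and backward machinery as training itself, which is what allows the tuning to run online in a single pass rather than through repeated trial-and-error runs.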
Related papers
- Bilevel ZOFO: Bridging Parameter-Efficient and Zeroth-Order Techniques for Efficient LLM Fine-Tuning and Meta-Training [44.48966200270378]
Fine-tuning pre-trained Large Language Models (LLMs) for downstream tasks using First-Order (FO) methods presents significant computational challenges.
We propose a bilevel optimization framework that complements ZO methods with PEFT to mitigate sensitivity to hard prompts.
Our Bilevel ZOFO method employs a double-loop optimization strategy, where only the gradient of the PEFT model and the forward pass of the base model are required.
arXiv Detail & Related papers (2025-02-05T20:47:44Z) - Over-the-Air Fair Federated Learning via Multi-Objective Optimization [52.295563400314094]
We propose an over-the-air fair federated learning algorithm (OTA-FFL) to train fair FL models.
Experiments demonstrate the superiority of OTA-FFL in achieving fairness and robust performance.
arXiv Detail & Related papers (2025-01-06T21:16:51Z) - ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - Online Parallel Multi-Task Relationship Learning via Alternating Direction Method of Multipliers [37.859185005986056]
Online multi-task learning (OMTL) enhances streaming data processing by leveraging the inherent relations among multiple tasks.
This study proposes a novel OMTL framework based on the alternating direction method of multipliers (ADMM), a recent breakthrough in optimization suitable for the distributed computing environment.
arXiv Detail & Related papers (2024-11-09T10:20:13Z) - Federated Learning of Large Language Models with Parameter-Efficient
Prompt Tuning and Adaptive Optimization [71.87335804334616]
Federated learning (FL) is a promising paradigm to enable collaborative model training with decentralized data.
The training process of Large Language Models (LLMs) generally requires updating a significant number of parameters.
This paper proposes an efficient partial prompt tuning approach to improve performance and efficiency simultaneously.
arXiv Detail & Related papers (2023-10-23T16:37:59Z) - FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup
for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
arXiv Detail & Related papers (2023-09-18T12:35:05Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical
Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs)
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - Pre-training helps Bayesian optimization too [49.28382118032923]
We seek an alternative practice for setting functional priors.
In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori.
Our results show that our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.
arXiv Detail & Related papers (2022-07-07T04:42:54Z) - Federated Hyperparameter Tuning: Challenges, Baselines, and Connections
to Weight-Sharing [37.056834089598105]
We show how standard approaches may be adapted to form baselines for the federated setting.
By making a novel connection to the neural architecture search technique of weight-sharing, we introduce a new method, FedEx.
Theoretically, we show that a FedEx variant correctly tunes the on-device learning rate in the setting of online convex optimization.
arXiv Detail & Related papers (2021-06-08T16:42:37Z)