Federated Transfer Learning with Dynamic Gradient Aggregation
- URL: http://arxiv.org/abs/2008.02452v1
- Date: Thu, 6 Aug 2020 04:29:01 GMT
- Title: Federated Transfer Learning with Dynamic Gradient Aggregation
- Authors: Dimitrios Dimitriadis, Kenichi Kumatani, Robert Gmyr, Yashesh Gaur and
Sefik Emre Eskimez
- Abstract summary: This paper introduces a Federated Learning (FL) simulation platform for Acoustic Model training.
The proposed FL platform can support different tasks owing to its modular design.
It is shown to outperform the gold standard of distributed training in both convergence speed and overall model performance.
- Score: 27.42998421786922
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, a Federated Learning (FL) simulation platform is introduced.
The target scenario is Acoustic Model training based on this platform. To our
knowledge, this is the first attempt to apply FL techniques to Speech
Recognition, a task whose inherent complexity has so far precluded such
approaches. The proposed FL platform can support different tasks owing to its
modular design. As part of the
platform, a novel hierarchical optimization scheme and two gradient aggregation
methods are proposed, leading to almost an order of magnitude improvement in
training convergence speed compared to other distributed or FL training
algorithms like BMUF and FedAvg. The hierarchical optimization offers
additional flexibility in the training pipeline besides the enhanced
convergence speed. On top of the hierarchical optimization, a dynamic gradient
aggregation algorithm is proposed, based on a data-driven weight inference.
This aggregation algorithm acts as a regularizer of the gradient quality.
Finally, an unsupervised training pipeline tailored to FL is presented as a
separate training scenario. The experimental validation of the proposed system
is based on two tasks. On the LibriSpeech task, the platform achieves a 7x
speed-up and a 6% Word Error Rate reduction (WERR) compared to the baseline
results. The second task, session adaptation, yields a 20% WERR improvement
over a competitive production-ready LAS model. The proposed Federated Learning
system is shown to outperform the gold standard of distributed training in
both convergence speed and overall model performance.
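The abstract describes the hierarchical optimization and the Dynamic Gradient Aggregation (DGA) only at a high level. The snippet below is a minimal, hypothetical Python sketch of the general idea: clients return pseudo-gradients from a few local SGD steps, the server infers data-driven aggregation weights (here a softmax over negative held-out losses, an assumption chosen purely for illustration), and a second, server-side optimizer (here plain momentum) applies the aggregated update. All function names, the weighting rule, and the hyperparameters are illustrative assumptions rather than the paper's exact algorithm.

    import numpy as np

    def local_update(weights, data, lr=0.1, steps=5):
        """Client-side level of the hierarchy: a few SGD steps on local data.
        Returns the pseudo-gradient (initial weights minus final weights)."""
        x, y = data
        w = weights.copy()
        for _ in range(steps):
            grad = 2.0 * x.T @ (x @ w - y) / len(y)   # toy linear-regression gradient
            w -= lr * grad
        return weights - w

    def dynamic_weights(losses, temperature=1.0):
        """Assumed form of data-driven weight inference: clients whose updates look
        more reliable (lower held-out loss) receive larger aggregation weights."""
        scores = -np.asarray(losses) / temperature
        scores -= scores.max()                        # numerical stability
        w = np.exp(scores)
        return w / w.sum()

    def server_round(weights, grads, losses, momentum, beta=0.6, server_lr=0.5):
        """Server-side level of the hierarchy: weighted aggregation of the
        pseudo-gradients followed by a momentum step of the global optimizer."""
        alphas = dynamic_weights(losses)
        agg = sum(a * g for a, g in zip(alphas, grads))
        momentum = beta * momentum + agg
        return weights - server_lr * momentum, momentum

    # Toy usage: three clients fit y = 2x; the third client has noisy labels and is
    # therefore down-weighted by the aggregation (its gradient-quality regularization).
    rng = np.random.default_rng(0)
    clients = []
    for s in [0.1, 0.1, 2.0]:
        x = rng.normal(size=(32, 1))
        clients.append((x, 2.0 * x[:, 0] + rng.normal(scale=s, size=32)))

    w_global, mom = np.zeros(1), np.zeros(1)
    for _ in range(30):
        grads = [local_update(w_global, c) for c in clients]
        losses = [np.mean((x @ w_global - y) ** 2) for x, y in clients]
        w_global, mom = server_round(w_global, grads, losses, mom)

    print("estimated slope:", w_global)               # approaches 2.0

In this sketch the weighting acts exactly as the abstract suggests: the client with unreliable data contributes little to the aggregated pseudo-gradient, while the server-side momentum step plays the role of the outer level of the two-level optimization.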
Related papers
- Efficient Stagewise Pretraining via Progressive Subnetworks [53.00045381931778]
The prevailing view suggests that stagewise dropping strategies, such as layer dropping, are ineffective when compared to stacking-based approaches.
This paper challenges this notion by demonstrating that, with proper design, dropping strategies can be competitive with, if not better than, stacking methods.
We propose an instantiation of this framework - Random Part Training (RAPTR) - that selects and trains only a random subnetwork at each step, progressively increasing the size in stages.
arXiv Detail & Related papers (2024-02-08T18:49:09Z)
- Semi-Federated Learning: Convergence Analysis and Optimization of A Hybrid Learning Framework [70.83511997272457]
We propose a semi-federated learning (SemiFL) paradigm to leverage both the base station (BS) and devices for a hybrid implementation of centralized learning (CL) and FL.
We propose a two-stage algorithm to solve this intractable problem, in which we provide closed-form solutions for the beamformers.
arXiv Detail & Related papers (2023-10-04T03:32:39Z)
- Vertical Federated Learning over Cloud-RAN: Convergence Analysis and System Optimization [82.12796238714589]
We propose a novel cloud radio access network (Cloud-RAN) based vertical FL system to enable fast and accurate model aggregation.
We characterize the convergence behavior of the vertical FL algorithm considering both uplink and downlink transmissions.
We establish a system optimization framework by joint transceiver and fronthaul quantization design, for which successive convex approximation and alternate convex search based system optimization algorithms are developed.
arXiv Detail & Related papers (2023-05-04T09:26:03Z)
- FedDA: Faster Framework of Local Adaptive Gradient Methods via Restarted Dual Averaging [104.41634756395545]
Federated learning (FL) is an emerging learning paradigm to tackle massively distributed data.
We propose FedDA, a novel framework for local adaptive gradient methods.
We show that FedDA-MVR is the first adaptive FL algorithm that achieves this rate.
arXiv Detail & Related papers (2023-02-13T05:10:30Z)
- Faster Adaptive Federated Learning [84.38913517122619]
Federated learning has attracted increasing attention with the emergence of distributed data.
In this paper, we propose an efficient adaptive algorithm (i.e., FAFED) based on a momentum-based variance-reduction technique in cross-silo FL.
arXiv Detail & Related papers (2022-12-02T05:07:50Z)
- Performance Optimization for Variable Bitwidth Federated Learning in Wireless Networks [103.22651843174471]
This paper considers improving wireless communication and computation efficiency in federated learning (FL) via model quantization.
In the proposed bitwidth FL scheme, edge devices train and transmit quantized versions of their local FL model parameters to a coordinating server, which aggregates them into a quantized global model and synchronizes the devices.
We show that the FL training process can be described as a Markov decision process and propose a model-based reinforcement learning (RL) method to optimize action selection over iterations.
arXiv Detail & Related papers (2022-09-21T08:52:51Z)
- Dynamic Gradient Aggregation for Federated Domain Adaptation [31.264050568762592]
We introduce a new learning algorithm for Federated Learning (FL).
The proposed scheme is based on a weighted gradient aggregation using two-step optimization to offer a flexible training pipeline.
We investigate the effect of our FL algorithm in supervised and unsupervised Speech Recognition (SR) scenarios.
arXiv Detail & Related papers (2021-06-14T16:34:28Z)
- Federated Learning via Intelligent Reflecting Surface [30.935389187215474]
Over-the-air computation (AirComp) based federated learning (FL) is capable of achieving fast model aggregation by exploiting the waveform superposition property of multiple-access channels.
In this paper, we propose a two-step optimization framework to achieve fast yet reliable model aggregation for AirComp-based FL.
Simulation results demonstrate that the proposed framework and the deployment of an IRS achieve a lower training loss and higher FL prediction accuracy than the baseline algorithms (a minimal sketch of the underlying over-the-air aggregation appears after this list).
arXiv Detail & Related papers (2020-11-10T11:29:57Z)
- Weighted Aggregating Stochastic Gradient Descent for Parallel Deep Learning [8.366415386275557]
The solution involves a reformulation of the objective function used to optimize neural network models.
We introduce a decentralized weighted aggregating scheme based on the performance of local workers.
To validate the new method, we benchmark our schemes against several popular algorithms.
arXiv Detail & Related papers (2020-04-07T23:38:29Z)
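Several of the related papers above (the AirComp and intelligent-reflecting-surface entries in particular) rely on over-the-air computation: devices transmit their pre-scaled updates simultaneously, and the waveform superposition of the multiple-access channel performs the summation itself. Below is a minimal numerical sketch of that superposition principle under strongly simplified assumptions (real-valued flat-fading channels known at the devices, channel-inversion power control, additive Gaussian receiver noise); it illustrates the aggregation idea only and is not the optimization framework of either paper.

    import numpy as np

    rng = np.random.default_rng(1)
    K, D = 4, 8                        # number of devices, model-update dimension
    updates = rng.normal(size=(K, D))  # local model updates to be averaged

    h = rng.uniform(0.5, 1.5, size=K)  # assumed real-valued flat-fading channel gains
    p = 1.0                            # common target receive amplitude

    # Channel-inversion pre-scaling: device k transmits b_k * x_k with b_k = sqrt(p)/h_k,
    # so every update arrives at the server with the same effective amplitude.
    b = np.sqrt(p) / h

    # The multiple-access channel superimposes the simultaneous transmissions
    # and the receiver adds noise; the sum is computed "over the air".
    noise = rng.normal(scale=0.01, size=D)
    y = (h[:, None] * b[:, None] * updates).sum(axis=0) + noise

    # The server rescales the superimposed signal to recover the average update.
    aircomp_avg = y / (K * np.sqrt(p))
    exact_avg = updates.mean(axis=0)
    print("max aggregation error:", np.max(np.abs(aircomp_avg - exact_avg)))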