Importance of Smoothness Induced by Optimizers in FL4ASR: Towards
Understanding Federated Learning for End-to-End ASR
- URL: http://arxiv.org/abs/2309.13102v1
- Date: Fri, 22 Sep 2023 17:23:01 GMT
- Title: Importance of Smoothness Induced by Optimizers in FL4ASR: Towards
Understanding Federated Learning for End-to-End ASR
- Authors: Sheikh Shams Azam, Tatiana Likhomanenko, Martin Pelikan, Jan "Honza"
Silovsky
- Abstract summary: We start by training End-to-End Automatic Speech Recognition (ASR) models using Federated Learning (FL).
We examine the fundamental considerations that can be pivotal in minimizing the performance gap in terms of word error rate between models trained using FL versus their centralized counterpart.
- Score: 12.108696564200052
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we start by training End-to-End Automatic Speech Recognition
(ASR) models using Federated Learning (FL) and examining the fundamental
considerations that can be pivotal in minimizing the performance gap in terms
of word error rate between models trained using FL versus their centralized
counterpart. Specifically, we study the effect of (i) adaptive optimizers, (ii)
loss characteristics via altering Connectionist Temporal Classification (CTC)
weight, (iii) model initialization through seed start, (iv) carrying over
modeling setup from experiences in centralized training to FL, e.g., pre-layer
or post-layer normalization, and (v) FL-specific hyperparameters, such as
number of local epochs, client sampling size, and learning rate scheduler,
specifically for ASR under heterogeneous data distribution. We shed light on
how some optimizers work better than others via inducing smoothness. We also
summarize the applicability of algorithms, trends, and propose best practices
from prior works in FL (in general) toward End-to-End ASR models.
Related papers
- Feasible Learning [78.6167929413604]
We introduce Feasible Learning (FL), a sample-centric learning paradigm where models are trained by solving a feasibility problem that bounds the loss for each training sample.
Our empirical analysis, spanning image classification, age regression, and preference optimization in large language models, demonstrates that models trained via FL can learn from data while displaying improved tail behavior compared to ERM, with only a marginal impact on average performance.
arXiv Detail & Related papers (2025-01-24T20:39:38Z)
- Over-the-Air Fair Federated Learning via Multi-Objective Optimization [52.295563400314094]
We propose an over-the-air fair federated learning algorithm (OTA-FFL) to train fair FL models.
Experiments demonstrate the superiority of OTA-FFL in achieving fairness and robust performance.
arXiv Detail & Related papers (2025-01-06T21:16:51Z)
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability in terms of zero-shot generalization of VLMs, dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the model in the few-shot image classification scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
- Rethinking the Starting Point: Collaborative Pre-Training for Federated Downstream Tasks [21.842345900168525]
CoPreFL is a model-agnostic meta-learning (MAML) procedure that tailors the global model to closely mimic heterogeneous and unseen FL scenarios.
Our MAML procedure incorporates performance variance into the meta-objective function, balancing performance across clients.
We demonstrate that CoPreFL obtains significant improvements in both average accuracy and variance across arbitrary downstream FL tasks.
arXiv Detail & Related papers (2024-02-03T17:58:43Z)
- Adaptive Model Pruning and Personalization for Federated Learning over Wireless Networks [72.59891661768177]
Federated learning (FL) enables distributed learning across edge devices while protecting data privacy.
We consider an FL framework with partial model pruning and personalization to overcome these challenges.
This framework splits the learning model into a global part with model pruning shared with all devices to learn data representations and a personalized part to be fine-tuned for a specific device.
arXiv Detail & Related papers (2023-09-04T21:10:45Z)
- Guiding The Last Layer in Federated Learning with Pre-Trained Models [18.382057374270143]
Federated Learning (FL) is an emerging paradigm that allows a model to be trained across a number of participants without sharing data.
We show that fitting a classification head using the Nearest Class Means (NCM) can be done exactly and orders of magnitude more efficiently than existing proposals.
arXiv Detail & Related papers (2023-06-06T18:02:02Z)
- Vertical Federated Learning over Cloud-RAN: Convergence Analysis and System Optimization [82.12796238714589]
We propose a novel cloud radio access network (Cloud-RAN) based vertical FL system to enable fast and accurate model aggregation.
We characterize the convergence behavior of the vertical FL algorithm considering both uplink and downlink transmissions.
We establish a system optimization framework by joint transceiver and fronthaul quantization design, for which successive convex approximation and alternate convex search based system optimization algorithms are developed.
arXiv Detail & Related papers (2023-05-04T09:26:03Z)
- Accelerating Federated Learning with a Global Biased Optimiser [16.69005478209394]
Federated Learning (FL) is a recent development in the field of machine learning that collaboratively trains models without the training data leaving client devices.
We propose a novel, generalised approach for applying adaptive optimisation techniques to FL with the Federated Global Biased Optimiser (FedGBO) algorithm.
FedGBO accelerates FL by applying a set of global biased optimiser values during the local training phase of FL, which helps to reduce 'client-drift' from non-IID data.
arXiv Detail & Related papers (2021-08-20T12:08:44Z)
- Prototype Guided Federated Learning of Visual Feature Representations [15.021124010665194]
Federated Learning (FL) is a framework which enables distributed model training using a large corpus of decentralized training data.
Existing methods aggregate models disregarding their internal representations, which are crucial for training models in vision tasks.
We introduce FedProto, which computes client deviations using margins of representations learned on distributed data.
arXiv Detail & Related papers (2021-05-19T08:29:12Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.