Related papers: Federating Dynamic Models using Early-Exit Architectures for Automatic Speech Recognition on Heterogeneous Clients

Federating Dynamic Models using Early-Exit Architectures for Automatic Speech Recognition on Heterogeneous Clients

URL: http://arxiv.org/abs/2405.17376v1
Date: Mon, 27 May 2024 17:32:37 GMT
Title: Federating Dynamic Models using Early-Exit Architectures for Automatic Speech Recognition on Heterogeneous Clients
Authors: Mohamed Nabih Ali, Alessio Brutti, Daniele Falavigna,
Abstract summary: Federated learning is a technique that collaboratively learns a shared prediction model while keeping the data local on different clients. We propose using dynamical architectures which, employing early-exit solutions, can adapt their processing depending on the input and on the operation conditions. This solution falls in the realm of partial training methods and brings two benefits: a single model is used on a variety of devices; federating the models after local training is straightforward.
Score: 12.008071873475169
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Automatic speech recognition models require large amounts of speech recordings for training. However, the collection of such data often is cumbersome and leads to privacy concerns. Federated learning has been widely used as an effective decentralized technique that collaboratively learns a shared prediction model while keeping the data local on different clients. Unfortunately, client devices often feature limited computation and communication resources leading to practical difficulties for large models. In addition, the heterogeneity that characterizes edge devices makes it sub-optimal to generate a single model that fits all of them. Differently from the recent literature, where multiple models with different architectures are used, in this work, we propose using dynamical architectures which, employing early-exit solutions, can adapt their processing (i.e. traversed layers) depending on the input and on the operation conditions. This solution falls in the realm of partial training methods and brings two benefits: a single model is used on a variety of devices; federating the models after local training is straightforward. Experiments on public datasets show that our proposed approach is effective and can be combined with basic federated learning strategies.

Related papers

OFA-MAS: One-for-All Multi-Agent System Topology Design based on Mixture-of-Experts Graph Generative Models [57.94189874119267]
Multi-Agent Systems (MAS) offer a powerful paradigm for solving complex problems.<n>Current graph learning-based design methodologies often adhere to a "one-for-one" paradigm.<n>We propose OFA-TAD, a one-for-all framework that generates adaptive collaboration graphs for any task described in natural language.
arXiv Detail & Related papers (2026-01-19T12:23:44Z)
Cross-Silo Federated Learning Across Divergent Domains with Iterative Parameter Alignment [4.95475852994362]
Federated learning is a method for training a machine learning model across remote clients. We reformulate the typical federated learning setup to learn N models optimized for a common objective. We find that the technique achieves competitive results on a variety of data partitions compared to state-of-the-art approaches.
arXiv Detail & Related papers (2023-11-08T16:42:14Z)
Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other. We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
Federated Learning of Models Pre-Trained on Different Features with Consensus Graphs [19.130197923214123]
Learning an effective global model on private and decentralized datasets has become an increasingly important challenge of machine learning. We propose a feature fusion approach that extracts local representations from local models and incorporates them into a global representation that improves the prediction performance. This paper presents solutions to these problems and demonstrates them in real-world applications on time series data such as power grids and traffic networks.
arXiv Detail & Related papers (2023-06-02T02:24:27Z)
Personalizing Federated Learning with Over-the-Air Computations [84.8089761800994]
Federated edge learning is a promising technology to deploy intelligence at the edge of wireless networks in a privacy-preserving manner. Under such a setting, multiple clients collaboratively train a global generic model under the coordination of an edge server. This paper presents a distributed training paradigm that employs analog over-the-air computation to address the communication bottleneck.
arXiv Detail & Related papers (2023-02-24T08:41:19Z)
Federated Gradient Matching Pursuit [17.695717854068715]
Traditional machine learning techniques require centralizing all training data on one server or data hub. In particular, federated learning (FL) provides such a solution to learn a shared model while keeping training data at local clients. We propose a novel algorithmic framework, federated gradient matching pursuit (FedGradMP), to solve the sparsity constrained minimization problem in the FL setting.
arXiv Detail & Related papers (2023-02-20T16:26:29Z)
Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. This creates a barrier to fusing knowledge across individual models to yield a better single model. We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
Scalable Collaborative Learning via Representation Sharing [53.047460465980144]
Federated learning (FL) and Split Learning (SL) are two frameworks that enable collaborative learning while keeping the data private (on device) In FL, each data holder trains a model locally and releases it to a central server for aggregation. In SL, the clients must release individual cut-layer activations (smashed data) to the server and wait for its response (during both inference and back propagation). In this work, we present a novel approach for privacy-preserving machine learning, where the clients collaborate via online knowledge distillation using a contrastive loss.
arXiv Detail & Related papers (2022-11-20T10:49:22Z)
Federated Pruning: Improving Neural Network Efficiency with Federated Learning [24.36174705715827]
We propose Federated Pruning to train a reduced model under the federated setting. We explore different pruning schemes and provide empirical evidence of the effectiveness of our methods.
arXiv Detail & Related papers (2022-09-14T00:48:37Z)
Multi-Branch Deep Radial Basis Function Networks for Facial Emotion Recognition [80.35852245488043]
We propose a CNN based architecture enhanced with multiple branches formed by radial basis function (RBF) units. RBF units capture local patterns shared by similar instances using an intermediate representation. We show it is the incorporation of local information what makes the proposed model competitive.
arXiv Detail & Related papers (2021-09-07T21:05:56Z)
FedKD: Communication Efficient Federated Learning via Knowledge Distillation [56.886414139084216]
Federated learning is widely used to learn intelligent models from decentralized data. In federated learning, clients need to communicate their local model updates in each iteration of model learning. We propose a communication efficient federated learning method based on knowledge distillation.
arXiv Detail & Related papers (2021-08-30T15:39:54Z)
Federated Action Recognition on Heterogeneous Embedded Devices [16.88104153104136]
In this work, we enable clients with limited computing power to perform action recognition, a computationally heavy task. We first perform model compression at the central server through knowledge distillation on a large dataset. The fine-tuning is required because limited data present in smaller datasets is not adequate for action recognition models to learn complextemporal features.
arXiv Detail & Related papers (2021-07-18T02:33:24Z)
Model-Based Deep Learning [155.063817656602]
Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques. Deep neural networks (DNNs) use generic architectures which learn to operate from data, and demonstrate excellent performance. We are interested in hybrid techniques that combine principled mathematical models with data-driven systems to benefit from the advantages of both approaches.
arXiv Detail & Related papers (2020-12-15T16:29:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.