OnDev-LCT: On-Device Lightweight Convolutional Transformers towards
federated learning
- URL: http://arxiv.org/abs/2401.11652v1
- Date: Mon, 22 Jan 2024 02:17:36 GMT
- Title: OnDev-LCT: On-Device Lightweight Convolutional Transformers towards
federated learning
- Authors: Chu Myaet Thwal, Minh N.H. Nguyen, Ye Lin Tun, Seong Tae Kim, My T.
Thai, Choong Seon Hong
- Abstract summary: Federated learning (FL) has emerged as a promising approach to collaboratively train machine learning models across multiple edge devices.
We propose OnDev-LCT: Lightweight Convolutional Transformers for On-Device vision tasks with limited training data and resources.
- Score: 29.798780069556074
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Federated learning (FL) has emerged as a promising approach to
collaboratively train machine learning models across multiple edge devices
while preserving privacy. The success of FL hinges on the efficiency of
participating models and their ability to handle the unique challenges of
distributed learning. While several variants of Vision Transformer (ViT) have
shown great potential as alternatives to modern convolutional neural networks
(CNNs) for centralized training, the unprecedented size and higher
computational demands hinder their deployment on resource-constrained edge
devices, challenging their widespread application in FL. Since client devices
in FL typically have limited computing resources and communication bandwidth,
models intended for such devices must strike a balance between model size,
computational efficiency, and the ability to adapt to the diverse and non-IID
data distributions encountered in FL. To address these challenges, we propose
OnDev-LCT: Lightweight Convolutional Transformers for On-Device vision tasks
with limited training data and resources. Our models incorporate image-specific
inductive biases through the LCT tokenizer by leveraging efficient depthwise
separable convolutions in residual linear bottleneck blocks to extract local
features, while the multi-head self-attention (MHSA) mechanism in the LCT
encoder implicitly facilitates capturing global representations of images.
Extensive experiments on benchmark image datasets indicate that our models
outperform existing lightweight vision models while having fewer parameters and
lower computational demands, making them suitable for FL scenarios with data
heterogeneity and communication bottlenecks.
Related papers
- Heterogeneous Federated Learning with Splited Language Model [22.65325348176366]
Federated Split Learning (FSL) is a promising distributed learning paradigm in practice.
In this paper, we harness Pre-trained Image Transformers (PITs) as the initial model, coined FedV, to accelerate the training process and improve model robustness.
We are the first to provide a systematic evaluation of FSL methods with PITs in real-world datasets, different partial device participations, and heterogeneous data splits.
arXiv Detail & Related papers (2024-03-24T07:33:08Z) - Efficient Language Model Architectures for Differentially Private
Federated Learning [21.280600854272716]
Cross-device federated learning (FL) is a technique that trains a model on data distributed across typically millions of edge devices without data leaving the devices.
In centralized training of language models, adaptives are preferred as they offer improved stability and performance.
We propose a scale-in Coupled Input Forget Gate (SI CIFG) recurrent network by modifying the sigmoid and tanh activations in neural recurrent cell.
arXiv Detail & Related papers (2024-03-12T22:21:48Z) - Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - AdapterFL: Adaptive Heterogeneous Federated Learning for
Resource-constrained Mobile Computing Systems [24.013937378054074]
Federated Learning (FL) enables collaborative learning of large-scale distributed clients without data sharing.
Mobile computing systems can only use small low-performance models for collaborative learning.
We use a model reassemble strategy to facilitate collaborative training of massive heterogeneous mobile devices adaptively.
arXiv Detail & Related papers (2023-11-23T14:42:43Z) - Adaptive Model Pruning and Personalization for Federated Learning over
Wireless Networks [72.59891661768177]
Federated learning (FL) enables distributed learning across edge devices while protecting data privacy.
We consider a FL framework with partial model pruning and personalization to overcome these challenges.
This framework splits the learning model into a global part with model pruning shared with all devices to learn data representations and a personalized part to be fine-tuned for a specific device.
arXiv Detail & Related papers (2023-09-04T21:10:45Z) - Analysis and Optimization of Wireless Federated Learning with Data
Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate the loss function minimization problem, under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE)
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of the learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z) - Vertical Federated Learning over Cloud-RAN: Convergence Analysis and
System Optimization [82.12796238714589]
We propose a novel cloud radio access network (Cloud-RAN) based vertical FL system to enable fast and accurate model aggregation.
We characterize the convergence behavior of the vertical FL algorithm considering both uplink and downlink transmissions.
We establish a system optimization framework by joint transceiver and fronthaul quantization design, for which successive convex approximation and alternate convex search based system optimization algorithms are developed.
arXiv Detail & Related papers (2023-05-04T09:26:03Z) - Parallel Successive Learning for Dynamic Distributed Model Training over
Heterogeneous Wireless Networks [50.68446003616802]
Federated learning (FedL) has emerged as a popular technique for distributing model training over a set of wireless devices.
We develop parallel successive learning (PSL), which expands the FedL architecture along three dimensions.
Our analysis sheds light on the notion of cold vs. warmed up models, and model inertia in distributed machine learning.
arXiv Detail & Related papers (2022-02-07T05:11:01Z) - Joint Superposition Coding and Training for Federated Learning over
Multi-Width Neural Networks [52.93232352968347]
This paper aims to integrate two synergetic technologies, federated learning (FL) and width-adjustable slimmable neural network (SNN)
FL preserves data privacy by exchanging the locally trained models of mobile devices. SNNs are however non-trivial, particularly under wireless connections with time-varying channel conditions.
We propose a communication and energy-efficient SNN-based FL (named SlimFL) that jointly utilizes superposition coding (SC) for global model aggregation and superposition training (ST) for updating local models.
arXiv Detail & Related papers (2021-12-05T11:17:17Z) - Reconfigurable Intelligent Surface Enabled Federated Learning: A Unified
Communication-Learning Design Approach [30.1988598440727]
We develop a unified communication-learning optimization problem to jointly optimize device selection, over-the-air transceiver design, and RIS configuration.
Numerical experiments show that the proposed design achieves substantial learning accuracy improvement compared with the state-of-the-art approaches.
arXiv Detail & Related papers (2020-11-20T08:54:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.