Warming Up for Zeroth-Order Federated Pre-Training with Low Resource Clients
- URL: http://arxiv.org/abs/2509.03503v1
- Date: Wed, 03 Sep 2025 17:35:51 GMT
- Title: Warming Up for Zeroth-Order Federated Pre-Training with Low Resource Clients
- Authors: Gwen Legate, Irina Rish, Eugene Belilovsky
- Abstract summary: Federated learning enables collaborative model training across numerous edge devices without requiring participants to share data. We consider a setting in which a subset of edge devices are below a critical memory or communication threshold required to conduct model updates. We are inspired by MeZO, a zeroth-order method used for memory-efficient fine-tuning. We present experiments using various datasets and model architectures to show that ZOWarmUp is a robust algorithm that can be applied under a wide variety of circumstances.
- Score: 29.247322281710115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated learning enables collaborative model training across numerous edge devices without requiring participants to share data; however, memory and communication constraints on these edge devices may preclude their participation in training. We consider a setting in which a subset of edge devices are below a critical memory or communication threshold required to conduct model updates. Under typical federated optimization algorithms, these devices are excluded from training, which renders their data inaccessible and increases system-induced bias. We are inspired by MeZO, a zeroth-order method used for memory-efficient fine-tuning. The increased variance inherent to zeroth-order gradient approximations has relegated previous zeroth-order optimizers exclusively to the domain of fine-tuning, a limitation we seek to correct. We devise a federated, memory-efficient zeroth-order optimizer, ZOWarmUp, which permits zeroth-order training from a random initialization. ZOWarmUp leverages differing client capabilities and careful variance reduction techniques to facilitate participation of under-represented, low-resource clients in model training. Like other federated zeroth-order methods, ZOWarmUp eliminates the need for edge devices to transmit their full gradients to the server and instead relies on only a small set of random seeds, rendering the up-link communication cost negligible. We present experiments using various datasets and model architectures to show that ZOWarmUp is a robust algorithm that can be applied under a wide variety of circumstances. For systems with a high proportion of edge devices that would otherwise be excluded from training, this algorithm provides access to a greater volume and diversity of data, thus improving training outcomes.
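For intuition, here is a minimal sketch of the MeZO-style seed-based update that this family of methods builds on; the function names and hyperparameters are illustrative assumptions, and ZOWarmUp's warm-up schedule and variance reduction techniques are not reproduced here.

```python
import torch

def zo_step(model, loss_fn, batch, eps=1e-3, lr=1e-4):
    """One MeZO-style zeroth-order update (illustrative sketch).

    The gradient is estimated from two forward passes along a random
    direction z regenerated from a seed, so a client only needs to
    transmit (seed, projected_grad) rather than a full gradient.
    """
    seed = torch.randint(0, 2**31 - 1, (1,)).item()

    def perturb(scale):
        torch.manual_seed(seed)  # the same z is regenerated on every call
        for p in model.parameters():
            z = torch.randn_like(p)
            p.data.add_(scale * eps * z)

    perturb(+1)                                    # theta + eps * z
    loss_plus = loss_fn(model, batch)
    perturb(-2)                                    # theta - eps * z
    loss_minus = loss_fn(model, batch)
    perturb(+1)                                    # restore theta

    projected_grad = (loss_plus - loss_minus) / (2 * eps)  # a scalar

    torch.manual_seed(seed)
    for p in model.parameters():                   # SGD step along z
        z = torch.randn_like(p)
        p.data.add_(-lr * projected_grad * z)

    return seed, projected_grad                    # all that goes up-link
```

Given only these (seed, scalar) pairs, the server can replay every client's update exactly, which is why the up-link communication cost is negligible.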
Related papers
- Elastic ViTs from Pretrained Models without Retraining [74.5386166956142]
Vision foundation models achieve remarkable performance but are only available in a limited set of pre-determined sizes. We introduce SnapViT: Single-shot network approximation for pruned Vision Transformers. Our approach efficiently combines gradient information with cross-network structure correlations, approximated via an evolutionary algorithm.
arXiv Detail & Related papers (2025-10-20T16:15:03Z)
- Optimizing Model Splitting and Device Task Assignment for Deceptive Signal Assisted Private Multi-hop Split Learning [58.620753467152376]
In our model, several edge devices jointly perform collaborative training, and some eavesdroppers aim to collect the model and data information from devices. To prevent the eavesdroppers from collecting model and data information, a subset of devices can transmit deceptive signals. We propose a soft actor-critic deep reinforcement learning framework with an intrinsic curiosity module and cross-attention.
arXiv Detail & Related papers (2025-07-09T22:53:23Z)
- Efficient Zero-Order Federated Finetuning of Language Models for Resource-Constrained Devices [11.523328603690945]
Fine-tuning Large Language Models (LLMs) on edge devices remains challenging due to high memory, communication, and computational demands. We propose Federated Split-Perturbation Zero-order Optimization (FedSPZO), which divides the network into two blocks, applying a different number of perturbations per block. Our evaluation shows a $2.5-7\times$ reduction in computation overhead compared to state-of-the-art zero-order techniques in federated learning.
arXiv Detail & Related papers (2025-02-14T15:49:02Z)
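A rough sketch of the split-perturbation idea above, assuming a two-block split and illustrative perturbation budgets (this is not the paper's exact configuration):

```python
import torch

def split_perturbation_zo(blocks, loss_fn, eps=1e-3):
    """Estimate zeroth-order gradients block by block, where each block
    gets its own perturbation budget, e.g. blocks = [(early_params, 2),
    (late_params, 8)]. More perturbations mean lower estimator variance.
    """
    grads = {}
    for params, k in blocks:
        for p in params:
            grads[p] = torch.zeros_like(p)
        for _ in range(k):
            zs = [torch.randn_like(p) for p in params]
            for p, z in zip(params, zs):
                p.data.add_(eps * z)               # theta + eps * z
            loss_plus = loss_fn()
            for p, z in zip(params, zs):
                p.data.add_(-2 * eps * z)          # theta - eps * z
            loss_minus = loss_fn()
            for p, z in zip(params, zs):
                p.data.add_(eps * z)               # restore theta
            coeff = (loss_plus - loss_minus) / (2 * eps * k)
            for p, z in zip(params, zs):
                grads[p] += coeff * z               # averaged estimate
    return grads
```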
- Communication Efficient ConFederated Learning: An Event-Triggered SAGA Approach [67.27031215756121]
Federated learning (FL) is a machine learning paradigm that targets model training without gathering the local data scattered over various data sources.
Standard FL, which employs a single server, can only support a limited number of users, leading to degraded learning capability.
In this work, we consider a multi-server FL framework, referred to as Confederated Learning (CFL), in order to accommodate a larger number of users.
arXiv Detail & Related papers (2024-02-28T03:27:10Z)
- Rendering Wireless Environments Useful for Gradient Estimators: A Zero-Order Stochastic Federated Learning Method [14.986031916712108]
Cross-device federated learning (FL) is a growing machine learning framework whereby multiple edge devices collaborate to train a model without disclosing their raw data. We show how to harness the wireless channel in the learning algorithm itself, instead of analyzing it in order to remove its impact.
arXiv Detail & Related papers (2024-01-30T21:46:09Z)
- Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized Language Model Finetuning Using Shared Randomness [86.61582747039053]
Language model training in distributed settings is limited by the communication cost of gradient exchanges.
We extend recent work using shared randomness to perform distributed fine-tuning with low bandwidth.
arXiv Detail & Related papers (2023-06-16T17:59:51Z)
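As a toy illustration of the shared-randomness idea above (the codec below is an assumption, not the paper's exact scheme): because every peer can regenerate the random perturbation direction from a pre-shared seed, only the projected-gradient scalar has to cross the network, and it can be squeezed into a single signed byte.

```python
def encode_scalar(g: float, scale: float = 0.05) -> bytes:
    """Quantize a projected-gradient scalar into one signed byte (sketch)."""
    q = max(-127, min(127, round(g / scale)))
    return int(q).to_bytes(1, "big", signed=True)

def decode_scalar(b: bytes, scale: float = 0.05) -> float:
    """Recover the scalar; the receiver applies it along the direction z
    regenerated from the shared seed, so the payload is one byte per step."""
    return int.from_bytes(b, "big", signed=True) * scale
```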
- Stochastic Coded Federated Learning: Theoretical Analysis and Incentive Mechanism Design [18.675244280002428]
We propose a novel FL framework named stochastic coded federated learning (SCFL) that leverages coded computing techniques.
In SCFL, each edge device uploads a privacy-preserving coded dataset to the server, which is generated by adding noise to the projected local dataset.
We show that SCFL learns a better model within the given time and achieves a better privacy-performance tradeoff than the baseline methods.
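A minimal sketch of what such a coded upload could look like, assuming a Gaussian random projection (the projection width m, noise level, and function name are illustrative; the paper's exact construction and privacy accounting are not reproduced):

```python
import numpy as np

def coded_upload(X, m=64, noise_std=1.0, seed=0):
    """Compress an (n, d) local dataset X with a random linear projection,
    then add Gaussian noise so the server never sees the raw samples."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((m, X.shape[0])) / np.sqrt(m)   # mixing matrix
    return G @ X + noise_std * rng.standard_normal((m, X.shape[1]))
```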
arXiv Detail & Related papers (2022-11-08T09:58:36Z)
- On-Device Training Under 256KB Memory [62.95579393237751]
We propose an algorithm-system co-design framework to make on-device training possible with only 256KB of memory.
Our framework is the first solution to enable tiny on-device training of convolutional neural networks under 256KB SRAM and 1MB Flash.
arXiv Detail & Related papers (2022-06-30T17:59:08Z)
- Boosting Resource-Constrained Federated Learning Systems with Guessed Updates [1.6053176639259055]
GEL enables constrained edge devices to perform additional learning through guessed updates on top of gradient-based steps. GEL can boost empirical convergence by up to 40% in resource-constrained networks.
arXiv Detail & Related papers (2021-10-21T21:23:04Z)
- ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training [65.68511423300812]
We propose ProgFed, a progressive training framework for efficient and effective federated learning.
ProgFed inherently reduces computation and two-way communication costs while maintaining the strong performance of the final models.
Our results show that ProgFed converges at the same rate as standard training on full models.
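Schematically, progressive federated training might look like the sketch below; the helper callables are hypothetical, and ProgFed's actual growth schedule and auxiliary heads are not reproduced here.

```python
def progressive_schedule(stages, total_rounds, build_submodel, run_fl_rounds):
    """Train successively deeper sub-models, splitting the round budget
    evenly (an assumption). Early rounds use shallow sub-models, which
    cuts both computation and two-way communication."""
    k = len(stages)
    for depth in range(1, k + 1):
        submodel = build_submodel(stages[:depth])   # grow the model
        run_fl_rounds(submodel, total_rounds // k)  # standard FL rounds
```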
arXiv Detail & Related papers (2021-10-11T14:45:00Z)
- Federated learning with class imbalance reduction [24.044750119251308]
Federated learning (FL) is a technique that enables a large number of edge computing devices to collaboratively train a global learning model.
Due to privacy concerns, the raw data on devices may not be available to a centralized server.
In this paper, an estimation scheme is designed to reveal the class distribution without access to the raw data.
arXiv Detail & Related papers (2020-11-23T08:13:43Z)