On Using Large-Batches in Federated Learning
- URL: http://arxiv.org/abs/2509.10537v1
- Date: Fri, 05 Sep 2025 17:31:50 GMT
- Title: On Using Large-Batches in Federated Learning
- Authors: Sahil Tyagi
- Abstract summary: Federated learning is crucial for training deep networks over devices with limited compute resources and bounded networks. This work proposes our vision of exploiting the trade-offs between small and large-batch training. For the same number of iterations, we observe that our proposed large-batch training technique attains about 32.33% and 3.74% higher test accuracy than small-batch training.
- Score: 1.5229257192293202
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficient Federated learning (FL) is crucial for training deep networks over devices with limited compute resources and bounded networks. With the advent of big data, devices either generate or collect multimodal data to train either generic or local-context aware networks, particularly when data privacy and locality are vital. FL algorithms generally trade off between parallel and statistical performance, improving model quality at the cost of higher communication frequency, or vice versa. Under frequent synchronization settings, FL over a large cluster of devices may perform more work per training iteration by processing a larger global batch-size, thus attaining considerable training speedup. However, this may result in poor test performance (i.e., higher test loss or lower accuracy) due to generalization degradation issues associated with large-batch training. To address these challenges with large-batches, this work proposes our vision of exploiting the trade-offs between small and large-batch training, and explores new directions to enjoy both the parallel scaling of large-batches and the good generalizability of small-batch training. For the same number of iterations, we observe that our proposed large-batch training technique attains about 32.33% and 3.74% higher test accuracy than small-batch training in ResNet50 and VGG11 models respectively.
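The abstract describes synchronous FL in which each round processes a larger global batch (more work per iteration) at the risk of the generalization degradation associated with large-batch training. The toy simulation below is a minimal sketch of that trade-off under assumptions of our own, not the paper's proposed technique: a simple FedAvg loop on synthetic linear-regression data contrasts a small per-client batch with a larger one whose learning rate is scaled linearly with the batch-size ratio (a common heuristic). All names, data, and hyperparameters are illustrative.

```python
# Minimal, illustrative sketch of synchronous FedAvg with small vs. large
# global batches. Not the paper's method; all settings are assumptions.
import numpy as np

rng = np.random.default_rng(0)
NUM_CLIENTS, DIM = 8, 10
true_w = rng.normal(size=DIM)

# Each client holds its own private data shard (data never leaves the device).
client_data = []
for _ in range(NUM_CLIENTS):
    X = rng.normal(size=(512, DIM))
    y = X @ true_w + 0.1 * rng.normal(size=512)
    client_data.append((X, y))

def local_sgd_step(w, X, y, batch_size, lr):
    """One local SGD step on a client's private data (MSE loss)."""
    idx = rng.choice(len(y), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    grad = 2.0 * Xb.T @ (Xb @ w - yb) / batch_size
    return w - lr * grad

def fedavg(per_client_batch, lr, rounds=200):
    """Frequent-synchronization FedAvg: each round, every client takes one
    local step and the server averages the resulting models."""
    w = np.zeros(DIM)
    for _ in range(rounds):
        local_models = [local_sgd_step(w, X, y, per_client_batch, lr)
                        for X, y in client_data]
        w = np.mean(local_models, axis=0)  # server-side aggregation
    return w

# Small-batch baseline: global batch = 8 clients x 8 samples = 64 per round.
w_small = fedavg(per_client_batch=8, lr=0.05)

# Large-batch run: global batch = 8 x 64 = 512, i.e., 8x more work per round,
# with the learning rate scaled linearly (an assumed heuristic) to keep
# per-round progress comparable.
w_large = fedavg(per_client_batch=64, lr=0.05 * 8)

print("small-batch parameter error:", np.linalg.norm(w_small - true_w))
print("large-batch parameter error:", np.linalg.norm(w_large - true_w))
```

On this convex toy problem the large-batch run simply converges faster per round; the generalization gap the paper targets appears with deep nonconvex models, which is why the abstract reports ResNet50 and VGG11 comparisons rather than a linear model.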
Related papers
- Prior-Fitted Networks Scale to Larger Datasets When Treated as Weak Learners [82.72552644267724]
BoostPFN can outperform standard PFNs with the same size of training samples in large datasets. High performance is maintained for up to 50x of the pre-training size of PFNs.
arXiv Detail & Related papers (2025-03-03T07:31:40Z) - AB-Training: A Communication-Efficient Approach for Distributed Low-Rank Learning [0.07227323884094951]
AB-training is a novel data-parallel method that leverages low-rank representations and independent training groups to reduce communication overhead.
Our experiments demonstrate an average reduction in network traffic of approximately 70.31% across various scaling scenarios.
arXiv Detail & Related papers (2024-05-02T07:49:28Z) - Efficient Asynchronous Federated Learning with Sparsification and Quantization [55.6801207905772]
Federated Learning (FL) is attracting increasing attention as a way to collaboratively train a machine learning model without transferring raw data.
FL generally exploits a parameter server and a large number of edge devices during the whole process of the model training.
We propose TEASQ-Fed to exploit edge devices to asynchronously participate in the training process by actively applying for tasks.
arXiv Detail & Related papers (2023-12-23T07:47:07Z) - Analysis and Optimization of Wireless Federated Learning with Data Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate the loss function minimization problem, under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE).
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of the learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z) - SemiSFL: Split Federated Learning on Unlabeled and Non-IID Data [34.49090830845118]
Federated Learning (FL) has emerged to allow multiple clients to collaboratively train machine learning models on their private data at the network edge.
We propose a novel Semi-supervised SFL system, termed SemiSFL, which incorporates clustering regularization to perform SFL with unlabeled and non-IID client data.
Our system provides a 3.8x speed-up in training time, reduces the communication cost by about 70.3% while reaching the target accuracy, and achieves up to 5.8% improvement in accuracy under non-IID scenarios.
arXiv Detail & Related papers (2023-07-29T02:35:37Z) - Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size [58.762959061522736]
We show that scaling mini-batch sizes with appropriate learning rate adjustments can speed up the training process by orders of magnitude.
We show that scaling the mini-batch size and naively adjusting the learning rate allows for (1) a reduced size of the Q-ensemble, (2) stronger penalization of out-of-distribution actions, and (3) improved convergence time.
arXiv Detail & Related papers (2022-11-20T21:48:25Z) - ZeroFL: Efficient On-Device Training for Federated Learning with Local Sparsity [15.908499928588297]
In Federated Learning (FL), nodes are orders of magnitude more constrained than traditional server-grade hardware.
We propose ZeroFL, a framework that relies on highly sparse operations to accelerate on-device training.
arXiv Detail & Related papers (2022-08-04T07:37:07Z) - Efficient Augmentation for Imbalanced Deep Learning [8.38844520504124]
We study a convolutional neural network's internal representation of imbalanced image data.
We measure the generalization gap between a model's feature embeddings in the training and test sets, showing that the gap is wider for minority classes.
This insight enables us to design an efficient three-phase CNN training framework for imbalanced data.
arXiv Detail & Related papers (2022-07-13T09:43:17Z) - Efficient Device Scheduling with Multi-Job Federated Learning [64.21733164243781]
We propose a novel multi-job Federated Learning framework to enable the parallel training process of multiple jobs.
We propose a reinforcement learning-based method and a Bayesian optimization-based method to schedule devices for multiple jobs while minimizing the cost.
Our proposed approaches significantly outperform baseline approaches in terms of training time (up to 8.67 times faster) and accuracy (up to 44.6% higher).
arXiv Detail & Related papers (2021-12-11T08:05:11Z) - Collaborative Learning over Wireless Networks: An Introductory Overview [84.09366153693361]
We will mainly focus on collaborative training across wireless devices.
Many distributed optimization algorithms have been developed over the last decades.
They provide data locality; that is, a joint model can be trained collaboratively while the data available at each participating device remains local.
arXiv Detail & Related papers (2021-12-07T20:15:39Z) - Data optimization for large batch distributed training of deep neural networks [0.19336815376402716]
Current practice for distributed training of deep neural networks faces the challenges of communication bottlenecks when operating at scale.
We propose a data optimization approach that utilizes machine learning to implicitly smooth out the loss landscape, resulting in fewer local minima.
Our approach filters out data points which are less important to feature learning, enabling us to speed up the training of models with larger batch sizes while improving accuracy.
arXiv Detail & Related papers (2020-12-16T21:22:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.