Hierarchical Federated Learning with SignSGD: A Highly Communication-Efficient Approach
- URL: http://arxiv.org/abs/2602.02355v1
- Date: Mon, 02 Feb 2026 17:18:03 GMT
- Title: Hierarchical Federated Learning with SignSGD: A Highly Communication-Efficient Approach
- Authors: Amirreza Kazemi, Seyed Mohammad Azimi-Abarghouyi, Gabor Fodor, Carlo Fischione
- Abstract summary: Hierarchical federated learning (HFL) has emerged as a key architecture for large-scale wireless and Internet of Things systems. One-bit methods such as sign-based stochastic gradient descent (SignSGD) offer an attractive solution, but existing theory and algorithms do not naturally extend to hierarchical settings. We introduce a scalable HFL algorithm, HierSignSGD, and provide the convergence analysis for SignSGD in a hierarchical setting.
- Score: 16.51305515824504
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Hierarchical federated learning (HFL) has emerged as a key architecture for large-scale wireless and Internet of Things systems, where devices communicate with nearby edge servers before reaching the cloud. In these environments, uplink bandwidth and latency impose strict communication limits, thereby making aggressive gradient compression essential. One-bit methods such as sign-based stochastic gradient descent (SignSGD) offer an attractive solution in flat federated settings, but existing theory and algorithms do not naturally extend to hierarchical settings. In particular, the interaction between majority-vote aggregation at the edge layer and model aggregation at the cloud layer, and its impact on end-to-end performance, remains unknown. To bridge this gap, we propose a highly communication-efficient sign-based HFL framework and develop its corresponding formulation for nonconvex learning, where devices send only signed stochastic gradients, edge servers combine them through majority-vote, and the cloud periodically averages the obtained edge models, while utilizing downlink quantization to broadcast the global model. We introduce the resulting scalable HFL algorithm, HierSignSGD, and provide the convergence analysis for SignSGD in a hierarchical setting. Our core technical contribution is a characterization of how biased sign compression, two-level aggregation intervals, and inter-cluster heterogeneity collectively affect convergence. Numerical experiments under homogeneous and heterogeneous data splits show that HierSignSGD, despite employing extreme compression, achieves accuracy comparable to or better than full-precision stochastic gradient descent while reducing communication cost in the process, and remains robust under aggressive downlink sparsification.
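The abstract specifies the full message flow, which can be illustrated end to end: devices uplink only gradient signs, each edge server majority-votes over its cluster and takes a sign-descent step, and the cloud periodically averages the edge models and quantizes the downlink broadcast. The Python sketch below follows that flow on synthetic gradients; the learning rate, aggregation intervals, top-k downlink quantizer, and all function names are illustrative assumptions, not the authors' reference implementation of HierSignSGD.

```python
import numpy as np

# Minimal sketch of the three-level HierSignSGD message flow described in the
# abstract; update-rule details and the downlink quantizer are assumptions.
def device_message(grad):
    """Device uplink: transmit only the sign of the stochastic gradient."""
    return np.sign(grad)

def edge_majority_vote(sign_msgs):
    """Edge server: element-wise majority vote over the cluster's sign vectors."""
    return np.sign(np.sum(sign_msgs, axis=0))

def downlink_quantize(model, k):
    """Assumed downlink compressor: keep only the top-k entries by magnitude."""
    out = np.zeros_like(model)
    idx = np.argsort(np.abs(model))[-k:]
    out[idx] = model[idx]
    return out

rng = np.random.default_rng(0)
dim, n_edges, devices_per_edge, lr = 10, 3, 5, 0.01
edge_models = [np.zeros(dim) for _ in range(n_edges)]

for cloud_round in range(4):           # cloud aggregation interval
    for edge_round in range(2):        # edge aggregation interval
        for e in range(n_edges):
            # stand-in stochastic gradients of f(x) = ||x||^2 / 2 plus noise;
            # a real run would compute these from local device data
            grads = [edge_models[e] + rng.normal(size=dim)
                     for _ in range(devices_per_edge)]
            vote = edge_majority_vote([device_message(g) for g in grads])
            edge_models[e] = edge_models[e] - lr * vote   # sign-descent step
    # cloud: average the edge models, then compress the broadcast downlink
    global_model = downlink_quantize(np.mean(edge_models, axis=0), k=5)
    edge_models = [global_model.copy() for _ in range(n_edges)]
```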
Related papers
- CoCo-Fed: A Unified Framework for Memory- and Communication-Efficient Federated Learning at the Wireless Edge [50.42067935605982]
We propose CoCo-Fed, a novel Compression and Combination-based Federated learning framework that unifies local memory efficiency and global communication reduction. CoCo-Fed significantly outperforms state-of-the-art baselines in both memory and communication efficiency while maintaining robust convergence under non-IID settings.
arXiv Detail & Related papers (2026-01-02T03:39:50Z) - Decentralized Nonconvex Composite Federated Learning with Gradient Tracking and Momentum [78.27945336558987]
Decentralized federated learning (DFL) eliminates reliance on a central server in the client-server architecture. Non-smooth regularization is often incorporated into machine learning tasks. We propose a novel DNCFL algorithm to solve these problems.
arXiv Detail & Related papers (2025-04-17T08:32:25Z) - Hierarchical Federated Learning with Multi-Timescale Gradient Correction [24.713834338757195]
In this paper, we propose a multi-timescale gradient correction (MTGC) methodology to resolve this issue. Our key idea is to introduce distinct control variables to correct the client gradient toward the group gradient, i.e., to reduce the client model drift caused by local updates based on individual datasets.
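The correction mechanism this summary alludes to, steering each client's gradient toward the group-level gradient with control variables, can be illustrated with a generic control-variate step. The sketch below is a minimal single-level illustration in Python; the variable names, the averaging-based controls, and the structure are assumptions, not MTGC's actual multi-timescale rule.

```python
import numpy as np

# Generic control-variate correction for client drift (a sketch, not MTGC's
# actual rule): the client descends along its local gradient shifted by
# (group control - local control), pulling local updates toward the group.
def corrected_client_step(model, local_grad, c_local, c_group, lr=0.1):
    return model - lr * (local_grad - c_local + c_group)

# One assumed way to form the controls: per-client gradient estimates and
# their group average.
local_grads = [np.array([1.0, -2.0]), np.array([0.2, 0.5])]
c_group = np.mean(local_grads, axis=0)
model = corrected_client_step(np.zeros(2), local_grads[0],
                              c_local=local_grads[0], c_group=c_group)
print(model)  # the step follows the group-level direction
```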
arXiv Detail & Related papers (2024-09-27T05:10:05Z) - Sequential Federated Learning in Hierarchical Architecture on Non-IID Datasets [25.010661914466354]
In a real federated learning (FL) system, the communication overhead of passing model parameters between the clients and the parameter server (PS) is often a bottleneck.
We propose sequential FL (SFL) in a hierarchical architecture for the first time, which removes the central PS and completes training only by passing data between two adjacent edge servers (ESs).
arXiv Detail & Related papers (2024-08-19T07:43:35Z) - GIFD: A Generative Gradient Inversion Method with Feature Domain Optimization [52.55628139825667]
Federated Learning (FL) has emerged as a promising distributed machine learning framework to preserve clients' privacy.
Recent studies find that an attacker can invert the shared gradients and recover sensitive data against an FL system by leveraging pre-trained generative adversarial networks (GAN) as prior knowledge.
We propose Gradient Inversion over Feature Domains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers.
arXiv Detail & Related papers (2023-08-09T04:34:21Z) - Adaptive Hierarchical SpatioTemporal Network for Traffic Forecasting [70.66710698485745]
We propose an Adaptive Hierarchical SpatioTemporal Network (AHSTN) to improve traffic forecasting.
AHSTN exploits the spatial hierarchy and models multi-scale spatial correlations.
Experiments on two real-world datasets show that AHSTN achieves better performance over several strong baselines.
arXiv Detail & Related papers (2023-06-15T14:50:27Z) - Adaptive Federated Pruning in Hierarchical Wireless Networks [69.6417645730093]
Federated Learning (FL) is a privacy-preserving distributed learning framework where a server aggregates models updated by multiple devices without accessing their private datasets.
In this paper, we introduce model pruning for HFL in wireless networks to reduce the neural network scale.
We show that our proposed HFL with model pruning achieves learning accuracy similar to HFL without model pruning while reducing communication cost by about 50 percent.
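A common way to realize such pruning is magnitude-based sparsification of the weights, sketched below in Python; the 50 percent sparsity mirrors the reported communication saving, but the criterion, schedule, and names are assumed stand-ins rather than the paper's scheme.

```python
import numpy as np

# Sketch of magnitude-based pruning: zero out the smallest-magnitude fraction
# of the weights so only the surviving entries need to be communicated.
def prune_by_magnitude(weights, sparsity=0.5):
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

w = np.random.default_rng(1).normal(size=(4, 4))
w_pruned, mask = prune_by_magnitude(w, sparsity=0.5)
print(f"kept {mask.mean():.0%} of the weights")  # roughly halves the payload
```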
arXiv Detail & Related papers (2023-05-15T22:04:49Z) - $z$-SignFedAvg: A Unified Stochastic Sign-based Compression for Federated Learning [14.363110221372274]
Federated Learning (FL) is a promising privacy-preserving distributed learning paradigm.
FL suffers from high communication cost when training large-scale machine learning models.
We propose a novel noisy perturbation scheme with a general symmetric noise distribution for sign-based compression.
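The perturbation idea can be made concrete: adding zero-mean symmetric noise before taking the sign turns the biased sign compressor into an unbiased estimator up to a known scale. The Python sketch below assumes uniform noise on [-z, z], under which E[sign(g + noise)] = g / z whenever |g| <= z; the noise family and scale are illustrative, not the paper's exact construction.

```python
import numpy as np

# Stochastic sign compression: symmetric noise before the sign removes the
# bias of plain SignSGD in expectation (uniform noise is an assumption here).
def noisy_sign(grad, rng, z=1.0):
    return np.sign(grad + rng.uniform(-z, z, size=grad.shape))

rng = np.random.default_rng(0)
g = np.array([0.3, -0.7, 0.05])
est = np.mean([noisy_sign(g, rng) for _ in range(20000)], axis=0)
print(est)  # approaches g / z = [0.3, -0.7, 0.05]
```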
arXiv Detail & Related papers (2023-02-06T06:54:49Z) - Communication-Efficient Distributed SGD with Compressed Sensing [24.33697801661053]
We consider large scale distributed optimization over a set of edge devices connected to a central server.
Inspired by recent advances in federated learning, we propose a distributed stochastic gradient descent (SGD) type algorithm that exploits the sparsity of the gradient, when possible, to reduce the communication burden.
We conduct theoretical analysis on the convergence of our algorithm in the presence of noise perturbation incurred by the communication channels, and also conduct numerical experiments to corroborate its effectiveness.
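To make the compressed-sensing step concrete: the device uplinks a short random projection of a (nearly) sparse gradient, and the server reconstructs it from the few measurements. The sketch below uses a Gaussian measurement matrix and orthogonal matching pursuit as the decoder; the dimensions, sparsity level, and choice of decoder are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, k = 200, 60, 5                      # gradient dim, measurements, sparsity
A = rng.normal(size=(m, d)) / np.sqrt(m)  # shared random measurement matrix

g = np.zeros(d)                           # a k-sparse stand-in "gradient"
g[rng.choice(d, size=k, replace=False)] = rng.normal(size=k)
y = A @ g                                 # device uplink: m << d numbers

def omp(y, A, k):
    """Orthogonal matching pursuit: greedily pick k atoms, least-squares refit."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

g_hat = omp(y, A, k)                      # server-side reconstruction
print(np.linalg.norm(g - g_hat) / np.linalg.norm(g))  # near-zero recovery error
```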
arXiv Detail & Related papers (2021-12-15T02:10:45Z) - Communication-Efficient Federated Learning via Quantized Compressed Sensing [82.10695943017907]
The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a parameter server.
Thanks to gradient sparsification and quantization, our strategy can achieve a higher compression ratio than one-bit gradient compression.
We demonstrate that the framework achieves performance almost identical to the case with no compression.
arXiv Detail & Related papers (2021-11-30T02:13:54Z) - Accelerated Gradient Descent Learning over Multiple Access Fading Channels [9.840290491547162]
We consider a distributed learning problem in a wireless network, consisting of N distributed edge devices and a parameter server (PS).
We develop a novel Accelerated Gradient-descent Multiple Access (AGMA) algorithm that uses momentum-based gradient signals over noisy fading MAC to improve the convergence rate as compared to existing methods.
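The general pattern the summary describes can be sketched: devices transmit gradient signals simultaneously over a fading multiple access channel, the server receives their superposition plus noise, and a momentum update smooths the channel noise. The fading model, channel-inversion precoding, toy quadratic objective, and step sizes below are assumptions, not AGMA's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_devices, lr, beta = 8, 10, 0.1, 0.9
model, momentum = np.zeros(dim), np.zeros(dim)

for t in range(200):
    # noisy local gradients of the toy objective ||x - 1||^2 at each device
    grads = [2 * (model - 1.0) + rng.normal(scale=0.5, size=dim)
             for _ in range(n_devices)]
    fades = rng.rayleigh(scale=1.0, size=n_devices) + 0.1  # avoid deep fades
    tx = [g / h for g, h in zip(grads, fades)]  # channel inversion (assumed CSI)
    # over-the-air superposition of the faded signals, plus receiver noise
    rx = sum(h * x for h, x in zip(fades, tx)) / n_devices \
         + rng.normal(scale=0.1, size=dim)
    momentum = beta * momentum + (1 - beta) * rx  # momentum smooths the noise
    model -= lr * momentum

print(model)  # approaches the optimum at all-ones despite the channel noise
```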
arXiv Detail & Related papers (2021-07-26T19:51:40Z)