Non-Coherent Over-the-Air Decentralized Gradient Descent
- URL: http://arxiv.org/abs/2211.10777v4
- Date: Wed, 11 Sep 2024 23:16:05 GMT
- Title: Non-Coherent Over-the-Air Decentralized Gradient Descent
- Authors: Nicolo' Michelusi,
- Abstract summary: Implementing Decentralized Gradient Descent in wireless systems is challenging due to noise, fading, and limited bandwidth.
This paper introduces a scalable DGD algorithm that eliminates the need for scheduling, topology information, or CSI.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Implementing Decentralized Gradient Descent (DGD) in wireless systems is challenging due to noise, fading, and limited bandwidth, necessitating topology awareness, transmission scheduling, and the acquisition of channel state information (CSI) to mitigate interference and maintain reliable communications. These operations may result in substantial signaling overhead and scalability challenges in large networks lacking central coordination. This paper introduces a scalable DGD algorithm that eliminates the need for scheduling, topology information, or CSI (both average and instantaneous). At its core is a Non-Coherent Over-The-Air (NCOTA) consensus scheme that exploits a noisy energy superposition property of wireless channels. Nodes encode their local optimization signals into energy levels within an OFDM frame and transmit simultaneously, without coordination. The key insight is that the received energy equals, on average, the sum of the energies of the transmitted signals, scaled by their respective average channel gains, akin to a consensus step. This property enables unbiased consensus estimation, utilizing average channel gains as mixing weights, thereby removing the need for their explicit design or for CSI. Introducing a consensus stepsize mitigates consensus estimation errors due to energy fluctuations around their expected values. For strongly-convex problems, it is shown that the expected squared distance between the local and globally optimum models vanishes at a rate of O(1/sqrt{k}) after k iterations, with suitable decreasing learning and consensus stepsizes. Extensions accommodate a broad class of fading models and frequency-selective channels. Numerical experiments on image classification demonstrate faster convergence in terms of running time compared to state-of-the-art schemes, especially in dense network scenarios.
Related papers
- Decentralized Optimization in Time-Varying Networks with Arbitrary Delays [22.40154714677385]
We consider a decentralized optimization problem for networks affected by communication delays.
Examples of such networks include collaborative machine learning, sensor networks, and multi-agent systems.
To mimic communication delays, we add virtual non-computing nodes to the network, resulting in directed graphs.
arXiv Detail & Related papers (2024-05-29T20:51:38Z) - TBSN: Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising [94.09442506816724]
Blind-spot networks (BSN) have been prevalent network architectures in self-supervised image denoising (SSID)
We present a transformer-based blind-spot network (TBSN) by analyzing and redesigning the transformer operators that meet the blind-spot requirement.
For spatial self-attention, an elaborate mask is applied to the attention matrix to restrict its receptive field, thus mimicking the dilated convolution.
For channel self-attention, we observe that it may leak the blind-spot information when the channel number is greater than spatial size in the deep layers of multi-scale architectures.
arXiv Detail & Related papers (2024-04-11T15:39:10Z) - SINR-Aware Deep Reinforcement Learning for Distributed Dynamic Channel
Allocation in Cognitive Interference Networks [10.514231683620517]
This paper focuses on real-world systems experiencing inter-carrier interference (ICI) and channel reuse by multiple large-scale networks.
We propose a novel multi-agent reinforcement learning framework for distributed DCA, named Channel Allocation RL To Overlapped Networks (CARLTON)
Our results demonstrate exceptional performance and robust generalization, showcasing superior efficiency compared to alternative state-of-the-art methods.
arXiv Detail & Related papers (2024-02-17T20:03:02Z) - Decentralized Learning over Wireless Networks with Broadcast-Based
Subgraph Sampling [36.99249604183772]
This work centers on the communication aspects of decentralized learning over wireless networks, using consensus-based decentralized descent (D-SGD)
Considering the actual communication cost or delay caused by in-network information exchange in an iterative process, our goal is to achieve fast convergence of the algorithm measured by improvement per transmission slot.
We propose BASS, an efficient communication framework for D-SGD over wireless networks with broadcast transmission and probabilistic subgraph sampling.
arXiv Detail & Related papers (2023-10-24T18:15:52Z) - Over-the-Air Federated Learning and Optimization [52.5188988624998]
We focus on Federated learning (FL) via edge-the-air computation (AirComp)
We describe the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both convex and non- convex settings.
For different types of local updates that can be transmitted by edge devices (i.e., model, gradient, model difference), we reveal that transmitting in AirFedAvg may cause an aggregation error.
In addition, we consider more practical signal processing schemes to improve the communication efficiency and extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes.
arXiv Detail & Related papers (2023-10-16T05:49:28Z) - Bandwidth-efficient distributed neural network architectures with
application to body sensor networks [73.02174868813475]
This paper describes a conceptual design methodology to design distributed neural network architectures.
We show that the proposed framework enables up to a factor 20 in bandwidth reduction with minimal loss.
While the application focus of this paper is on wearable brain-computer interfaces, the proposed methodology can be applied in other sensor network-like applications as well.
arXiv Detail & Related papers (2022-10-14T12:35:32Z) - Over-the-Air Decentralized Federated Learning [28.593149477080605]
We consider decentralized federated learning (FL) over wireless networks, where over-the-air computation (AirComp) is adopted to facilitate the local model consensus in a device-to-device (D2D) communication manner.
We propose an AirComp-based DSGD with gradient tracking and variance reduction (DSGT-VR) algorithm, where both precoding and decoding strategies are developed for D2D communication.
We prove that the proposed algorithm converges linearly and establish the optimality gap for strongly convex and smooth loss functions, taking into account the channel fading and noise.
arXiv Detail & Related papers (2021-06-15T09:42:33Z) - Energy-Efficient Model Compression and Splitting for Collaborative
Inference Over Time-Varying Channels [52.60092598312894]
We propose a technique to reduce the total energy bill at the edge device by utilizing model compression and time-varying model split between the edge and remote nodes.
Our proposed solution results in minimal energy consumption and $CO$ emission compared to the considered baselines.
arXiv Detail & Related papers (2021-06-02T07:36:27Z) - Decentralized Learning for Channel Allocation in IoT Networks over
Unlicensed Bandwidth as a Contextual Multi-player Multi-armed Bandit Game [134.88020946767404]
We study a decentralized channel allocation problem in an ad-hoc Internet of Things network underlaying on the spectrum licensed to a primary cellular network.
Our study maps this problem into a contextual multi-player, multi-armed bandit game, and proposes a purely decentralized, three-stage policy learning algorithm through trial-and-error.
arXiv Detail & Related papers (2020-03-30T10:05:35Z) - A Compressive Sensing Approach for Federated Learning over Massive MIMO
Communication Systems [82.2513703281725]
Federated learning is a privacy-preserving approach to train a global model at a central server by collaborating with wireless devices.
We present a compressive sensing approach for federated learning over massive multiple-input multiple-output communication systems.
arXiv Detail & Related papers (2020-03-18T05:56:27Z) - Decentralized SGD with Over-the-Air Computation [13.159777131162961]
We study the performance of decentralized numerically gradient descent (DSGD) in a wireless network.
We assume that transmissions are prone to additive noise and interference.
We show that the OAC-MAC scheme attains better convergence performance with a fewer communication rounds.
arXiv Detail & Related papers (2020-03-06T15:33:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.