Near Optimal Decentralized Optimization with Compression and Momentum Tracking
- URL: http://arxiv.org/abs/2405.20114v1
- Date: Thu, 30 May 2024 14:51:57 GMT
- Title: Near Optimal Decentralized Optimization with Compression and Momentum Tracking
- Authors: Rustem Islamov, Yuan Gao, Sebastian U. Stich,
- Abstract summary: We introduce MoTEF, a novel approach that integrates communication with Momentum Tracking and Error Feedback.
Our analysis demonstrates that MoTEF integrates most of the desired properties, and significantly existing methods under data.
- Score: 27.484212303346816
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Communication efficiency has garnered significant attention as it is considered the main bottleneck for large-scale decentralized Machine Learning applications in distributed and federated settings. In this regime, clients are restricted to transmitting small amounts of quantized information to their neighbors over a communication graph. Numerous endeavors have been made to address this challenging problem by developing algorithms with compressed communication for decentralized non-convex optimization problems. Despite considerable efforts, the current results suffer from various issues such as non-scalability with the number of clients, requirements for large batches, or bounded gradient assumption. In this paper, we introduce MoTEF, a novel approach that integrates communication compression with Momentum Tracking and Error Feedback. Our analysis demonstrates that MoTEF achieves most of the desired properties, and significantly outperforms existing methods under arbitrary data heterogeneity. We provide numerical experiments to validate our theoretical findings and confirm the practical superiority of MoTEF.
Related papers
- Towards Resource-Efficient Federated Learning in Industrial IoT for Multivariate Time Series Analysis [50.18156030818883]
Anomaly and missing data constitute a thorny problem in industrial applications.
Deep learning enabled anomaly detection has emerged as a critical direction.
The data collected in edge devices contain user privacy.
arXiv Detail & Related papers (2024-11-06T15:38:31Z) - Minimum Entropy Coupling with Bottleneck [22.409716686394525]
This paper investigates a novel lossy compression framework operating under logarithmic loss.
It is especially relevant for applications that require joint compression and retrieval, and in scenarios involving distributional shifts due to processing.
We show that the proposed formulation extends the classical minimum entropy coupling framework by integrating a bottleneck.
arXiv Detail & Related papers (2024-10-29T02:19:07Z) - Boosting the Performance of Decentralized Federated Learning via Catalyst Acceleration [66.43954501171292]
We introduce Catalyst Acceleration and propose an acceleration Decentralized Federated Learning algorithm called DFedCata.
DFedCata consists of two main components: the Moreau envelope function, which addresses parameter inconsistencies, and Nesterov's extrapolation step, which accelerates the aggregation phase.
Empirically, we demonstrate the advantages of the proposed algorithm in both convergence speed and generalization performance on CIFAR10/100 with various non-iid data distributions.
arXiv Detail & Related papers (2024-10-09T06:17:16Z) - Quantization Avoids Saddle Points in Distributed Optimization [1.579622195923387]
Distributed non optimization underpins key functionalities of numerous distributed systems.
The aim of this paper is to prove that it can effectively escape saddle points convergence to a second-order stationary point convergence.
With an easily adjustable quantization, the approach allows a user control to aggressively reduce communication overhead.
arXiv Detail & Related papers (2024-03-15T15:58:20Z) - Analysis and Optimization of Wireless Federated Learning with Data
Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate the loss function minimization problem, under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE)
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of the learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z) - Federated Compositional Deep AUC Maximization [58.25078060952361]
We develop a novel federated learning method for imbalanced data by directly optimizing the area under curve (AUC) score.
To the best of our knowledge, this is the first work to achieve such favorable theoretical results.
arXiv Detail & Related papers (2023-04-20T05:49:41Z) - FedSkip: Combatting Statistical Heterogeneity with Federated Skip
Aggregation [95.85026305874824]
We introduce a data-driven approach called FedSkip to improve the client optima by periodically skipping federated averaging and scattering local models to the cross devices.
We conduct extensive experiments on a range of datasets to demonstrate that FedSkip achieves much higher accuracy, better aggregation efficiency and competing communication efficiency.
arXiv Detail & Related papers (2022-12-14T13:57:01Z) - Low-Latency Federated Learning over Wireless Channels with Differential
Privacy [142.5983499872664]
In federated learning (FL), model training is distributed over clients and local models are aggregated by a central server.
In this paper, we aim to minimize FL training delay over wireless channels, constrained by overall training performance as well as each client's differential privacy (DP) requirement.
arXiv Detail & Related papers (2021-06-20T13:51:18Z) - Sample-based and Feature-based Federated Learning via Mini-batch SSCA [18.11773963976481]
This paper investigates sample-based and feature-based federated optimization.
We show that the proposed algorithms can preserve data privacy through the model aggregation mechanism.
We also show that the proposed algorithms converge to Karush-Kuhn-Tucker points of the respective federated optimization problems.
arXiv Detail & Related papers (2021-04-13T08:23:46Z) - Communication-efficient Variance-reduced Stochastic Gradient Descent [0.0]
We consider the problem of communication efficient distributed optimization.
In particular, we focus on the variance-reduced gradient and propose a novel approach to make it communication-efficient.
Comprehensive theoretical and numerical analyses on real datasets reveal that our algorithm can significantly reduce the communication complexity, by as much as 95%, with almost no noticeable penalty.
arXiv Detail & Related papers (2020-03-10T13:22:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.