Quantization Avoids Saddle Points in Distributed Optimization
- URL: http://arxiv.org/abs/2403.10423v1
- Date: Fri, 15 Mar 2024 15:58:20 GMT
- Title: Quantization Avoids Saddle Points in Distributed Optimization
- Authors: Yanan Bo, Yongqiang Wang,
- Abstract summary: Distributed non optimization underpins key functionalities of numerous distributed systems.
The aim of this paper is to prove that it can effectively escape saddle points convergence to a second-order stationary point convergence.
With an easily adjustable quantization, the approach allows a user control to aggressively reduce communication overhead.
- Score: 1.579622195923387
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Distributed nonconvex optimization underpins key functionalities of numerous distributed systems, ranging from power systems, smart buildings, cooperative robots, vehicle networks to sensor networks. Recently, it has also merged as a promising solution to handle the enormous growth in data and model sizes in deep learning. A fundamental problem in distributed nonconvex optimization is avoiding convergence to saddle points, which significantly degrade optimization accuracy. We discover that the process of quantization, which is necessary for all digital communications, can be exploited to enable saddle-point avoidance. More specifically, we propose a stochastic quantization scheme and prove that it can effectively escape saddle points and ensure convergence to a second-order stationary point in distributed nonconvex optimization. With an easily adjustable quantization granularity, the approach allows a user to control the number of bits sent per iteration and, hence, to aggressively reduce the communication overhead. Numerical experimental results using distributed optimization and learning problems on benchmark datasets confirm the effectiveness of the approach.
Related papers
- Fast networked data selection via distributed smoothed quantile estimation [6.002041236376175]
We establish a connection between selecting the most informative data and finding the top-$k$ elements of a multiset.
The top-$k$ selection in a network can be formulated as a distributed nonsmooth convex optimization problem known as quantile estimation.
We characterize the complexity required to achieve top-$k$ selection, a challenging task due to the lack of strong convexity.
arXiv Detail & Related papers (2024-06-04T03:26:15Z) - Random Aggregate Beamforming for Over-the-Air Federated Learning in Large-Scale Networks [66.18765335695414]
We consider a joint device selection and aggregate beamforming design with the objectives of minimizing the aggregate error and maximizing the number of selected devices.
To tackle the problems in a cost-effective manner, we propose a random aggregate beamforming-based scheme.
We additionally use analysis to study the obtained aggregate error and the number of the selected devices when the number of devices becomes large.
arXiv Detail & Related papers (2024-02-20T23:59:45Z) - Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Distributed Variational Inference for Online Supervised Learning [15.038649101409804]
This paper develops a scalable distributed probabilistic inference algorithm.
It applies to continuous variables, intractable posteriors and large-scale real-time data in sensor networks.
arXiv Detail & Related papers (2023-09-05T22:33:02Z) - Binarizing Sparse Convolutional Networks for Efficient Point Cloud
Analysis [93.55896765176414]
We propose binary sparse convolutional networks called BSC-Net for efficient point cloud analysis.
We employ the differentiable search strategies to discover the optimal opsitions for active site matching in the shifted sparse convolution.
Our BSC-Net achieves significant improvement upon our srtong baseline and outperforms the state-of-the-art network binarization methods.
arXiv Detail & Related papers (2023-03-27T13:47:06Z) - Quantization enabled Privacy Protection in Decentralized Stochastic
Optimization [34.24521534464185]
Decentralized optimization can be used in areas as diverse as machine learning, control, and sensor networks.
Privacy protection has emerged as a crucial need in the implementation of decentralized optimization.
We propose an algorithm that is able to guarantee provable convergence accuracy even in the presence of aggressive quantization errors.
arXiv Detail & Related papers (2022-08-07T15:17:23Z) - Tree ensemble kernels for Bayesian optimization with known constraints
over mixed-feature spaces [54.58348769621782]
Tree ensembles can be well-suited for black-box optimization tasks such as algorithm tuning and neural architecture search.
Two well-known challenges in using tree ensembles for black-box optimization are (i) effectively quantifying model uncertainty for exploration and (ii) optimizing over the piece-wise constant acquisition function.
Our framework performs as well as state-of-the-art methods for unconstrained black-box optimization over continuous/discrete features and outperforms competing methods for problems combining mixed-variable feature spaces and known input constraints.
arXiv Detail & Related papers (2022-07-02T16:59:37Z) - Distributed Sketching for Randomized Optimization: Exact
Characterization, Concentration and Lower Bounds [54.51566432934556]
We consider distributed optimization methods for problems where forming the Hessian is computationally challenging.
We leverage randomized sketches for reducing the problem dimensions as well as preserving privacy and improving straggler resilience in asynchronous distributed systems.
arXiv Detail & Related papers (2022-03-18T05:49:13Z) - Fast Distributionally Robust Learning with Variance Reduced Min-Max
Optimization [85.84019017587477]
Distributionally robust supervised learning is emerging as a key paradigm for building reliable machine learning systems for real-world applications.
Existing algorithms for solving Wasserstein DRSL involve solving complex subproblems or fail to make use of gradients.
We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable extra-gradient algorithms.
arXiv Detail & Related papers (2021-04-27T16:56:09Z) - Sample-based and Feature-based Federated Learning via Mini-batch SSCA [18.11773963976481]
This paper investigates sample-based and feature-based federated optimization.
We show that the proposed algorithms can preserve data privacy through the model aggregation mechanism.
We also show that the proposed algorithms converge to Karush-Kuhn-Tucker points of the respective federated optimization problems.
arXiv Detail & Related papers (2021-04-13T08:23:46Z) - Communication-efficient Variance-reduced Stochastic Gradient Descent [0.0]
We consider the problem of communication efficient distributed optimization.
In particular, we focus on the variance-reduced gradient and propose a novel approach to make it communication-efficient.
Comprehensive theoretical and numerical analyses on real datasets reveal that our algorithm can significantly reduce the communication complexity, by as much as 95%, with almost no noticeable penalty.
arXiv Detail & Related papers (2020-03-10T13:22:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.