Related papers: A Stochastic Newton Algorithm for Distributed Convex Optimization

A Stochastic Newton Algorithm for Distributed Convex Optimization

URL: http://arxiv.org/abs/2110.02954v1
Date: Thu, 7 Oct 2021 17:51:10 GMT
Title: A Stochastic Newton Algorithm for Distributed Convex Optimization
Authors: Brian Bullins, Kumar Kshitij Patel, Ohad Shamir, Nathan Srebro, Blake Woodworth
Abstract summary: We analyze a Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate gradients of the same population objective. We show that our method can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance.
Score: 62.20732134991661
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose and analyze a stochastic Newton algorithm for homogeneous distributed stochastic convex optimization, where each machine can calculate stochastic gradients of the same population objective, as well as stochastic Hessian-vector products (products of an independent unbiased estimator of the Hessian of the population objective with arbitrary vectors), with many such stochastic computations performed between rounds of communication. We show that our method can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance, by proving convergence guarantees for quasi-self-concordant objectives (e.g., logistic regression), alongside empirical evidence.

Related papers

Lyapunov Stability of Stochastic Vector Optimization: Theory and Numerical Implementation [0.0]
We use a drift--diffusion model for unconstrained vector optimization in which the drift is induced by a common descent direction.<n>We implement the resulting algorithm as an open-source Python framework for multi-objective optimization.<n> Empirical results on DTLZ2 with objective counts from three to fifteen indicate a consistent trade-off.
arXiv Detail & Related papers (2026-03-04T14:04:24Z)
Neural Optimal Transport Meets Multivariate Conformal Prediction [58.43397908730771]
We propose a framework for conditional vectorile regression (CVQR)<n>CVQR combines neural optimal transport with quantized optimization, and apply it to predictions.
arXiv Detail & Related papers (2025-09-29T19:50:19Z)
Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers. We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art. In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
On diffusion-based generative models and their error bounds: The log-concave case with full convergence estimates [5.13323375365494]
We provide theoretical guarantees for the convergence behaviour of diffusion-based generative models under strongly log-concave data. Our class of functions used for score estimation is made of Lipschitz continuous functions avoiding any Lipschitzness assumption on the score function. This approach yields the best known convergence rate for our sampling algorithm.
arXiv Detail & Related papers (2023-11-22T18:40:45Z)
Differentiating Metropolis-Hastings to Optimize Intractable Densities [51.16801956665228]
We develop an algorithm for automatic differentiation of Metropolis-Hastings samplers. We apply gradient-based optimization to objectives expressed as expectations over intractable target densities.
arXiv Detail & Related papers (2023-06-13T17:56:02Z)
Learning Unnormalized Statistical Models via Compositional Optimization [73.30514599338407]
Noise-contrastive estimation(NCE) has been proposed by formulating the objective as the logistic loss of the real data and the artificial noise. In this paper, we study it a direct approach for optimizing the negative log-likelihood of unnormalized models.
arXiv Detail & Related papers (2023-06-13T01:18:16Z)
Mean-Field Approximation to Gaussian-Softmax Integral with Application to Uncertainty Estimation [23.38076756988258]
We propose a new single-model based approach to quantify uncertainty in deep neural networks. We use a mean-field approximation formula to compute an analytically intractable integral. Empirically, the proposed approach performs competitively when compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-06-13T07:32:38Z)
Stochastic Saddle-Point Optimization for Wasserstein Barycenters [69.68068088508505]
We consider the populationimation barycenter problem for random probability measures supported on a finite set of points and generated by an online stream of data. We employ the structure of the problem and obtain a convex-concave saddle-point reformulation of this problem. In the setting when the distribution of random probability measures is discrete, we propose an optimization algorithm and estimate its complexity.
arXiv Detail & Related papers (2020-06-11T19:40:38Z)
Nonparametric Score Estimators [49.42469547970041]
Estimating the score from a set of samples generated by an unknown distribution is a fundamental task in inference and learning of probabilistic models. We provide a unifying view of these estimators under the framework of regularized nonparametric regression. We propose score estimators based on iterative regularization that enjoy computational benefits from curl-free kernels and fast convergence.
arXiv Detail & Related papers (2020-05-20T15:01:03Z)
Distributed Stochastic Nonconvex Optimization and Learning based on Successive Convex Approximation [26.11677569331688]
We introduce a novel framework for the distributed algorithmic minimization of the sum of the sum of the agents in a network. We show that the proposed method can be applied to distributed neural networks.
arXiv Detail & Related papers (2020-04-30T15:36:46Z)
A Hybrid-Order Distributed SGD Method for Non-Convex Optimization to Balance Communication Overhead, Computational Complexity, and Convergence Rate [28.167294398293297]
We propose a method of distributed gradient descent (SGD) with low communication load and computational complexity, and still fast. To reduce the computational complexity in each iteration, the worker nodes approximate the directional derivatives with zeroth-order gradient estimation.
arXiv Detail & Related papers (2020-03-27T14:02:15Z)

This list is automatically generated from the titles and abstracts of the papers in this site.