New Bounds For Distributed Mean Estimation and Variance Reduction
- URL: http://arxiv.org/abs/2002.09268v4
- Date: Wed, 7 Apr 2021 15:50:18 GMT
- Title: New Bounds For Distributed Mean Estimation and Variance Reduction
- Authors: Peter Davies, Vijaykrishna Gurunathan, Niusha Moshrefi, Saleh
Ashkboos, Dan Alistarh
- Abstract summary: We consider the problem of distributed mean estimation (DME) in which $n$ machines are each given a local $d$-dimensional vector $x_v \in \mathbb{R}^d$.
We show that our method yields practical improvements for common applications, relative to prior approaches.
- Score: 25.815612182815702
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of distributed mean estimation (DME), in which $n$
machines are each given a local $d$-dimensional vector $x_v \in \mathbb{R}^d$,
and must cooperate to estimate the mean of their inputs $\mu = \frac 1n\sum_{v
= 1}^n x_v$, while minimizing total communication cost.
DME is a fundamental construct in distributed machine learning, and there has
been considerable work on variants of this problem, especially in the context
of distributed variance reduction for stochastic gradients in parallel SGD.
Previous work typically assumes an upper bound on the norm of the input
vectors, and achieves an error bound in terms of this norm. However, in many
real applications, the input vectors are concentrated around the correct output
$\mu$, but $\mu$ itself has large norm. In such cases, previous output error
bounds perform poorly.
In this paper, we show that output error bounds need not depend on input
norm. We provide a method of quantization which allows distributed mean
estimation to be performed with solution quality dependent only on the distance
between inputs, not on input norm, and show an analogous result for distributed
variance reduction. The technique is based on a new connection with lattice
theory. We also provide lower bounds showing that the communication to error
trade-off of our algorithms is asymptotically optimal.
As the lattices achieving optimal bounds under $\ell_2$-norm can be
computationally impractical, we also present an extension which leverages
easy-to-use cubic lattices, and is loose only up to a logarithmic factor in
$d$. We show experimentally that our method yields practical improvements for
common applications, relative to prior approaches.
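
Illustrative sketch (not part of the paper): to make the main idea concrete, the following minimal Python/NumPy example implements a cubic-lattice quantizer in the spirit described above, under two assumptions stated explicitly in the code: a known bound r on the pairwise $\ell_\infty$ distance between inputs, and a decoder-side reference vector within that distance (for example, the decoder's own local vector). It is not the paper's algorithm; in particular the $\ell_2$-optimal lattices and the variance-reduction scheme are not reproduced, and the names encode, decode, eps, M, and r are hypothetical.

```python
import numpy as np

# Minimal sketch (not the paper's construction): quantize each input onto the
# scaled cubic lattice eps * Z^d and transmit only the integer coordinates
# modulo a small integer M.  The bit cost per coordinate then depends on an
# assumed bound r on the pairwise l_inf distance between inputs, not on the
# norm of the inputs.  The decoder resolves the modular ambiguity using any
# reference vector y known to be within distance r of the encoded input.

def encode(x, eps, M, rng):
    """Unbiased randomized rounding of x onto eps * Z^d, then reduce mod M."""
    z = np.floor(x / eps)
    z = z + (rng.random(x.shape) < (x / eps - z))   # round up w.p. frac. part
    return np.mod(z, M).astype(np.int64)            # ~log2(M) bits/coordinate

def decode(c, y, eps, M):
    """Return the unique point of eps * (c + M * Z^d) closest to the reference
    y; exact whenever ||x - y||_inf <= r and eps * M > 2 * (r + eps)."""
    k = np.round((y / eps - c) / M)
    return eps * (c + M * k)

# Toy usage: inputs share a huge common offset (large norm) but are mutually
# close, so only a few bits per coordinate are needed.
rng = np.random.default_rng(0)
d, n, r = 1000, 8, 1.0
mu_big = 1e6 * rng.standard_normal(d)               # large-norm center
xs = [mu_big + r * rng.uniform(-0.5, 0.5, d) for _ in range(n)]

eps = 0.01
M = int(np.ceil(2 * (r + eps) / eps)) + 1           # guarantees unique decoding
y = xs[0]                                           # decoder-side reference
est = np.mean([decode(encode(x, eps, M, rng), y, eps, M) for x in xs], axis=0)
print(np.max(np.abs(est - np.mean(xs, axis=0))))    # error ~ eps, not ~ ||mu||
```

The modular reduction is what removes the norm dependence in this sketch: each coordinate costs roughly $\log_2 M = O(\log(r/\mathrm{eps}))$ bits regardless of how large the inputs are, and decoding is exact whenever $\mathrm{eps} \cdot M$ exceeds twice the worst-case distance between a quantized input and the reference.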
Related papers
- Convergence of Unadjusted Langevin in High Dimensions: Delocalization of Bias [13.642712817536072]
We show that as the dimension $d$ of the problem increases, the number of iterations required to ensure convergence within a desired error increases.
A key technical challenge we address is the lack of a one-step contraction property in the $W_{2,\ell_\infty}$ metric to measure convergence.
arXiv Detail & Related papers (2024-08-20T01:24:54Z)
- Optimality in Mean Estimation: Beyond Worst-Case, Beyond Sub-Gaussian, and Beyond $1+\alpha$ Moments [10.889739958035536]
We introduce a new definitional framework to analyze the fine-grained optimality of algorithms.
We show that median-of-means is neighborhood optimal, up to constant factors.
It remains open to find a neighborhood-separated estimator without constant factor slackness.
arXiv Detail & Related papers (2023-11-21T18:50:38Z)
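
As context for the median-of-means estimator referenced in the entry above, here is a minimal generic Python sketch of that estimator (the group count k and the toy data are illustrative choices; nothing here reproduces that paper's neighborhood-optimality analysis).

```python
import numpy as np

def median_of_means(samples, k, rng=None):
    """Generic median-of-means: split the samples into k equal-sized groups
    at random, average each group, and return the median of the group means."""
    rng = np.random.default_rng() if rng is None else rng
    samples = np.asarray(samples, dtype=float)
    groups = np.array_split(samples[rng.permutation(len(samples))], k)
    return float(np.median([g.mean() for g in groups]))

# Toy usage on heavy-tailed data with true mean 0.
rng = np.random.default_rng(1)
data = rng.standard_t(df=2, size=10_000)
print(median_of_means(data, k=20, rng=rng))
```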
- Estimation of entropy-regularized optimal transport maps between non-compactly supported measures [15.857723276537248]
This paper addresses the problem of estimating entropy-regularized optimal transport maps with squared-Euclidean cost between source and target measures that are sub-Gaussian.
arXiv Detail & Related papers (2023-11-20T17:18:21Z)
- Distributed Extra-gradient with Optimal Complexity and Communication Guarantees [60.571030754252824]
We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local dual vectors.
Extra-gradient, the de facto algorithm for monotone VI problems, was not designed to be communication-efficient.
We propose a quantized generalized extra-gradient (Q-GenX), which is an unbiased and adaptive compression method tailored to solve VIs.
arXiv Detail & Related papers (2023-08-17T21:15:04Z)
- Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell_2$ and other norms.
arXiv Detail & Related papers (2023-02-03T03:19:14Z)
- General Gaussian Noise Mechanisms and Their Optimality for Unbiased Mean Estimation [58.03500081540042]
A classical approach to private mean estimation is to compute the true mean and add unbiased, but possibly correlated, Gaussian noise to it.
We show that for every input dataset, an unbiased mean estimator satisfying concentrated differential privacy introduces approximately at least as much error as the best Gaussian noise mechanism.
arXiv Detail & Related papers (2023-01-31T18:47:42Z)
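
The "classical approach" described in the entry above, releasing the exact mean plus unbiased Gaussian noise, can be sketched as follows. This is the textbook independent Gaussian mechanism under an assumed $\ell_2$ bound on each record with the standard $(\epsilon, \delta)$ calibration; the correlated-noise mechanisms and the concentrated-DP lower bound from that paper are not reproduced here.

```python
import numpy as np

def gaussian_mechanism_mean(X, l2_bound, eps, delta, rng=None):
    """Release the mean of the rows of X (each assumed to satisfy
    ||x||_2 <= l2_bound) plus i.i.d. Gaussian noise calibrated to
    (eps, delta)-DP via the standard analysis (valid for eps <= 1).
    Replacing one row changes the mean by at most 2 * l2_bound / n in l2."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    sensitivity = 2.0 * l2_bound / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return X.mean(axis=0) + rng.normal(0.0, sigma, size=d)

# Toy usage: rows are clipped to the unit l2 ball before release.
rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(1000, 5))
X = X / np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))
print(gaussian_mechanism_mean(X, l2_bound=1.0, eps=1.0, delta=1e-5, rng=rng))
```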
- Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper bounds can be extended to two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
- Stochastic regularized majorization-minimization with weakly convex and multi-convex surrogates [0.0]
We show that the first-order optimality gap of the proposed algorithm decays at the rate of the expected loss for various methods under a possibly dependent data setting, and we obtain the first convergence guarantees for these methods in this setting.
arXiv Detail & Related papers (2022-01-05T15:17:35Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Towards Tight Communication Lower Bounds for Distributed Optimisation [30.134447658488057]
We consider a standard distributed optimisation setting where $N$ machines aim to jointly minimise the sum of the functions $\sum_{i=1}^N f_i(x)$.
Our main result provides the first fully unconditional bounds on the total number of bits which need to be sent and received by the $N$ machines.
We show that $\Omega(Nd \log d / N\varepsilon)$ total bits need to be communicated between the machines to find an additive $\epsilon$-approximation to the minimum of $\sum_{i=1}^N f_i(x)$.
arXiv Detail & Related papers (2020-10-16T08:10:02Z)
- Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment [52.02794488304448]
We propose a new distribution alignment method based on a log-likelihood ratio statistic and normalizing flows.
We experimentally verify that minimizing the resulting objective results in domain alignment that preserves the local structure of input domains.
arXiv Detail & Related papers (2020-03-26T22:10:04Z)