New Bounds For Distributed Mean Estimation and Variance Reduction
- URL: http://arxiv.org/abs/2002.09268v4
- Date: Wed, 7 Apr 2021 15:50:18 GMT
- Title: New Bounds For Distributed Mean Estimation and Variance Reduction
- Authors: Peter Davies, Vijaykrishna Gurunathan, Niusha Moshrefi, Saleh
Ashkboos, Dan Alistarh
- Abstract summary: We consider the problem of distributed mean estimation (DME) in which $n$ machines are each given a local $d$-dimensional vector $x_v \in \mathbb{R}^d$.
We show that our method yields practical improvements for common applications, relative to prior approaches.
- Score: 25.815612182815702
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of distributed mean estimation (DME), in which $n$
machines are each given a local $d$-dimensional vector $x_v \in \mathbb{R}^d$,
and must cooperate to estimate the mean of their inputs $\mu = \frac 1n\sum_{v
= 1}^n x_v$, while minimizing total communication cost.
DME is a fundamental construct in distributed machine learning, and there has
been considerable work on variants of this problem, especially in the context
of distributed variance reduction for stochastic gradients in parallel SGD.
Previous work typically assumes an upper bound on the norm of the input
vectors, and achieves an error bound in terms of this norm. However, in many
real applications, the input vectors are concentrated around the correct output
$\mu$, but $\mu$ itself has large norm. In such cases, previous output error
bounds perform poorly.
In this paper, we show that output error bounds need not depend on input
norm. We provide a method of quantization which allows distributed mean
estimation to be performed with solution quality dependent only on the distance
between inputs, not on input norm, and show an analogous result for distributed
variance reduction. The technique is based on a new connection with lattice
theory. We also provide lower bounds showing that the communication to error
trade-off of our algorithms is asymptotically optimal.
As the lattices achieving optimal bounds under $\ell_2$-norm can be
computationally impractical, we also present an extension which leverages
easy-to-use cubic lattices, and is loose only up to a logarithmic factor in
$d$. We show experimentally that our method yields practical improvements for
common applications, relative to prior approaches.
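
Illustrative sketch (not part of the paper): to make the main idea concrete, the following minimal Python/NumPy example implements a cubic-lattice quantizer in the spirit described above, under two assumptions stated explicitly in the code: a known bound r on the pairwise $\ell_\infty$ distance between inputs, and a decoder-side reference vector within that distance (for example, the decoder's own local vector). It is not the paper's algorithm; in particular the $\ell_2$-optimal lattices and the variance-reduction scheme are not reproduced, and the names encode, decode, eps, M, and r are hypothetical.

```python
import numpy as np

# Minimal sketch (not the paper's construction): quantize each input onto the
# scaled cubic lattice eps * Z^d and transmit only the integer coordinates
# modulo a small integer M.  The bit cost per coordinate then depends on an
# assumed bound r on the pairwise l_inf distance between inputs, not on the
# norm of the inputs.  The decoder resolves the modular ambiguity using any
# reference vector y known to be within distance r of the encoded input.

def encode(x, eps, M, rng):
    """Unbiased randomized rounding of x onto eps * Z^d, then reduce mod M."""
    z = np.floor(x / eps)
    z = z + (rng.random(x.shape) < (x / eps - z))   # round up w.p. frac. part
    return np.mod(z, M).astype(np.int64)            # ~log2(M) bits/coordinate

def decode(c, y, eps, M):
    """Return the unique point of eps * (c + M * Z^d) closest to the reference
    y; exact whenever ||x - y||_inf <= r and eps * M > 2 * (r + eps)."""
    k = np.round((y / eps - c) / M)
    return eps * (c + M * k)

# Toy usage: inputs share a huge common offset (large norm) but are mutually
# close, so only a few bits per coordinate are needed.
rng = np.random.default_rng(0)
d, n, r = 1000, 8, 1.0
mu_big = 1e6 * rng.standard_normal(d)               # large-norm center
xs = [mu_big + r * rng.uniform(-0.5, 0.5, d) for _ in range(n)]

eps = 0.01
M = int(np.ceil(2 * (r + eps) / eps)) + 1           # guarantees unique decoding
y = xs[0]                                           # decoder-side reference
est = np.mean([decode(encode(x, eps, M, rng), y, eps, M) for x in xs], axis=0)
print(np.max(np.abs(est - np.mean(xs, axis=0))))    # error ~ eps, not ~ ||mu||
```

The modular reduction is what removes the norm dependence in this sketch: each coordinate costs roughly $\log_2 M = O(\log(r/\mathrm{eps}))$ bits regardless of how large the inputs are, and decoding is exact whenever $\mathrm{eps} \cdot M$ exceeds twice the worst-case distance between a quantized input and the reference.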
Related papers
- Convergence of Unadjusted Langevin in High Dimensions: Delocalization of Bias [13.642712817536072]
We show that as the dimension $d$ of the problem increases, the number of iterations required to ensure convergence within a desired error increases.
A key technical challenge we address is the lack of a one-step contraction property in the $W_{2,\ell_\infty}$ metric to measure convergence.
arXiv Detail & Related papers (2024-08-20T01:24:54Z)
- Optimality in Mean Estimation: Beyond Worst-Case, Beyond Sub-Gaussian, and Beyond $1+\alpha$ Moments [10.889739958035536]
We introduce a new definitional framework to analyze the fine-grained optimality of algorithms.
We show that median-of-means is neighborhood optimal, up to constant factors.
It remains open to find a neighborhood-separated estimator without constant factor slackness.
arXiv Detail & Related papers (2023-11-21T18:50:38Z)
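
As context for the median-of-means estimator referenced in the entry above, here is a minimal generic Python sketch of that estimator (the group count k and the toy data are illustrative choices; nothing here reproduces that paper's neighborhood-optimality analysis).

```python
import numpy as np

def median_of_means(samples, k, rng=None):
    """Generic median-of-means: split the samples into k equal-sized groups
    at random, average each group, and return the median of the group means."""
    rng = np.random.default_rng() if rng is None else rng
    samples = np.asarray(samples, dtype=float)
    groups = np.array_split(samples[rng.permutation(len(samples))], k)
    return float(np.median([g.mean() for g in groups]))

# Toy usage on heavy-tailed data with true mean 0.
rng = np.random.default_rng(1)
data = rng.standard_t(df=2, size=10_000)
print(median_of_means(data, k=20, rng=rng))
```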
- Estimation of entropy-regularized optimal transport maps between non-compactly supported measures [15.857723276537248]
This paper addresses the problem of estimating entropy-regularized optimal transport maps with squared-Euclidean cost between source and target measures that are sub-Gaussian.
arXiv Detail & Related papers (2023-11-20T17:18:21Z)
- Distributed Extra-gradient with Optimal Complexity and Communication Guarantees [60.571030754252824]
We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local dual vectors.
Extra-gradient, the de facto algorithm for monotone VI problems, was not designed to be communication-efficient.
We propose a quantized generalized extra-gradient (Q-GenX), which is an unbiased and adaptive compression method tailored to solve VIs.
arXiv Detail & Related papers (2023-08-17T21:15:04Z)
- Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell_2$ and other norms.
arXiv Detail & Related papers (2023-02-03T03:19:14Z)
- General Gaussian Noise Mechanisms and Their Optimality for Unbiased Mean Estimation [58.03500081540042]
A classical approach to private mean estimation is to compute the true mean and add unbiased, but possibly correlated, Gaussian noise to it.
We show that for every input dataset, an unbiased mean estimator satisfying concentrated differential privacy introduces approximately at least as much error as the best Gaussian noise mechanism.
arXiv Detail & Related papers (2023-01-31T18:47:42Z)
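
The "classical approach" described in the entry above, releasing the exact mean plus unbiased Gaussian noise, can be sketched as follows. This is the textbook independent Gaussian mechanism under an assumed $\ell_2$ bound on each record with the standard $(\epsilon, \delta)$ calibration; the correlated-noise mechanisms and the concentrated-DP lower bound from that paper are not reproduced here.

```python
import numpy as np

def gaussian_mechanism_mean(X, l2_bound, eps, delta, rng=None):
    """Release the mean of the rows of X (each assumed to satisfy
    ||x||_2 <= l2_bound) plus i.i.d. Gaussian noise calibrated to
    (eps, delta)-DP via the standard analysis (valid for eps <= 1).
    Replacing one row changes the mean by at most 2 * l2_bound / n in l2."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    sensitivity = 2.0 * l2_bound / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return X.mean(axis=0) + rng.normal(0.0, sigma, size=d)

# Toy usage: rows are clipped to the unit l2 ball before release.
rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(1000, 5))
X = X / np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))
print(gaussian_mechanism_mean(X, l2_bound=1.0, eps=1.0, delta=1e-5, rng=rng))
```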
- Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper bounds can be extended to two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
- Stochastic regularized majorization-minimization with weakly convex and multi-convex surrogates [0.0]
We show that the first-order optimality gap of the proposed algorithm decays at the rate of the expected loss for various methods under a possibly dependent data setting, and we obtain the first convergence guarantees for these methods in this setting.
arXiv Detail & Related papers (2022-01-05T15:17:35Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Towards Tight Communication Lower Bounds for Distributed Optimisation [30.134447658488057]
We consider a standard distributed optimisation setting where $N$ machines aim to jointly minimise the sum of the functions $\sum_{i=1}^N f_i(x)$.
Our main result provides the first fully unconditional bounds on the total number of bits which need to be sent and received by the $N$ machines.
We show that $\Omega(Nd \log d / N\varepsilon)$ total bits need to be communicated between the machines to find an additive $\epsilon$-approximation to the minimum of $\sum_{i=1}^N f_i(x)$.
arXiv Detail & Related papers (2020-10-16T08:10:02Z)
- Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment [52.02794488304448]
We propose a new distribution alignment method based on a log-likelihood ratio statistic and normalizing flows.
We experimentally verify that minimizing the resulting objective results in domain alignment that preserves the local structure of input domains.
arXiv Detail & Related papers (2020-03-26T22:10:04Z)