Distribution Compression in Near-linear Time
- URL: http://arxiv.org/abs/2111.07941v2
- Date: Wed, 17 Nov 2021 01:49:21 GMT
- Title: Distribution Compression in Near-linear Time
- Authors: Abhishek Shetty, Raaz Dwivedi, Lester Mackey
- Abstract summary: We introduce Compress++, a simple meta-procedure for speeding up any thinning algorithm.
It delivers $\sqrt{n}$ points with $\mathcal{O}(\sqrt{\log n/n})$ integration error and better-than-Monte-Carlo maximum mean discrepancy.
- Score: 27.18971095426405
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In distribution compression, one aims to accurately summarize a probability
distribution $\mathbb{P}$ using a small number of representative points.
Near-optimal thinning procedures achieve this goal by sampling $n$ points from
a Markov chain and identifying $\sqrt{n}$ points with
$\widetilde{\mathcal{O}}(1/\sqrt{n})$ discrepancy to $\mathbb{P}$.
Unfortunately, these algorithms suffer from quadratic or super-quadratic
runtime in the sample size $n$. To address this deficiency, we introduce
Compress++, a simple meta-procedure for speeding up any thinning algorithm
while suffering at most a factor of $4$ in error. When combined with the
quadratic-time kernel halving and kernel thinning algorithms of Dwivedi and
Mackey (2021), Compress++ delivers $\sqrt{n}$ points with
$\mathcal{O}(\sqrt{\log n/n})$ integration error and better-than-Monte-Carlo
maximum mean discrepancy in $\mathcal{O}(n \log^3 n)$ time and $\mathcal{O}(
\sqrt{n} \log^2 n )$ space. Moreover, Compress++ enjoys the same near-linear
runtime given any quadratic-time input and reduces the runtime of
super-quadratic algorithms by a square-root factor. In our benchmarks with
high-dimensional Monte Carlo samples and Markov chains targeting challenging
differential equation posteriors, Compress++ matches or nearly matches the
accuracy of its input algorithm in orders of magnitude less time.
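For intuition, here is a minimal Python sketch of the divide-and-halve recursion behind Compress and of how Compress++ wraps it. The function names, the `random_halve` stand-in for kernel halving, and the unspecified `thin` routine are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np


def compress(points, halve, g=0):
    """Compress meta-procedure (sketch): thin n points down to about 2^g * sqrt(n).

    Assumes len(points) is a power of 4 that is at least 4^g, and that `halve`
    maps an array of 2m points to an array of m points (e.g., kernel halving).
    """
    n = len(points)
    if n <= 4 ** g:  # base case: 4^g points already equals 2^g * sqrt(4^g)
        return points
    # Recurse on four consecutive blocks of n/4 points each; each block returns
    # 2^(g-1) * sqrt(n) points, so the concatenation has 2^(g+1) * sqrt(n).
    blocks = np.array_split(points, 4)
    combined = np.concatenate([compress(b, halve, g) for b in blocks])
    # One halving round brings the size back down to 2^g * sqrt(n).
    return halve(combined)


def random_halve(points, rng=None):
    """Placeholder halving routine: keep one point from each consecutive pair.

    Stands in for the quadratic-time kernel halving of Dwivedi and Mackey (2021);
    it only demonstrates the required interface (2m points in, m points out).
    """
    rng = np.random.default_rng() if rng is None else rng
    pairs = points.reshape(-1, 2, points.shape[-1])
    keep = rng.integers(0, 2, size=len(pairs))
    return pairs[np.arange(len(pairs)), keep]


def compress_plus_plus(points, halve, thin, g):
    """Compress++ (sketch): Compress with oversampling g, then thin to sqrt(n) points."""
    coreset = compress(points, halve, g)
    return thin(coreset, int(round(np.sqrt(len(points)))))


# Illustrative run: 4^6 = 4096 standard Gaussian points in 2 dimensions.
rng = np.random.default_rng(1)
x = rng.standard_normal((4 ** 6, 2))
coreset = compress(x, lambda p: random_halve(p, rng), g=2)  # ~2^2 * sqrt(4096) = 256 points
print(coreset.shape)
```

With a quadratic-time halving routine and $2^g$ roughly on the order of $\log n$, each of the $O(\log n)$ recursion levels performs about $4^g n$ work, which is how the overall cost lands near the $\mathcal{O}(n \log^3 n)$ runtime quoted in the abstract.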
Related papers
- Implicit High-Order Moment Tensor Estimation and Learning Latent Variable Models [39.33814194788341]
We study the task of learning latent-variable models.
Motivated by such applications, we develop a general efficient algorithm for implicit moment computation.
By leveraging our general algorithm, we obtain the first polynomial-time learners for the following models.
arXiv Detail & Related papers (2024-11-23T23:13:24Z) - A Sub-Quadratic Time Algorithm for Robust Sparse Mean Estimation [6.853165736531941]
We study the algorithmic problem of sparse mean estimation in the presence of adversarial outliers.
Our main contribution is an algorithm for robust sparse mean estimation which runs in subquadratic time using $\mathrm{poly}(k,\log d,1/\epsilon)$ samples.
arXiv Detail & Related papers (2024-03-07T18:23:51Z) - Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
arXiv Detail & Related papers (2024-02-06T15:39:09Z) - Fast Minimization of Expected Logarithmic Loss via Stochastic Dual
Averaging [8.990961435218544]
We propose a first-order algorithm named $B$-sample dual averaging with the logarithmic barrier.
For the Poisson inverse problem, our algorithm attains an $\varepsilon$-optimal solution in $\tilde{O}(d^3/\varepsilon^2)$ time.
When computing the maximum-likelihood estimate for quantum state tomography, our algorithm yields an $\varepsilon$-optimal solution in $\tilde{O}(d^3/\varepsilon^2)$ time.
arXiv Detail & Related papers (2023-11-05T03:33:44Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - An Oblivious Stochastic Composite Optimization Algorithm for Eigenvalue
Optimization Problems [76.2042837251496]
We introduce two oblivious mirror descent algorithms based on a complementary composite setting.
Remarkably, both algorithms work without prior knowledge of the Lipschitz constant or smoothness of the objective function.
We show how to extend our framework to scale and demonstrate the efficiency and robustness of our methods on large-scale semidefinite programs.
arXiv Detail & Related papers (2023-06-30T08:34:29Z) - Sparse PCA Beyond Covariance Thresholding [2.311583680973075]
We show that for every $t \ll k$ there exists an algorithm running in time $n \cdot d^{O(t)}$ that solves this problem as long as $\lambda \gtrsim \frac{k}{\sqrt{nt}}\sqrt{\ln(2 + td/k^2)}$.
We also provide polynomial-time algorithms for the sparse planted vector problem that have better guarantees than the state of the art in some regimes.
arXiv Detail & Related papers (2023-02-20T18:45:24Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Fast Graph Sampling for Short Video Summarization using Gershgorin Disc
Alignment [52.577757919003844]
We study the problem of efficiently summarizing a short video into several keyframes, leveraging recent progress in fast graph sampling.
Experimental results show that our algorithm achieves comparable video summarization as state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2021-10-21T18:43:00Z) - The Complexity of Sparse Tensor PCA [1.90365714903665]
For any $1 \leq t \leq k$, our algorithms recover the sparse vector for signal-to-noise ratio $\lambda \geq \tilde{\mathcal{O}}(\sqrt{t} \cdot (k/t)^{p/2})$ in time.
Even in the restricted case of PCA, known algorithms only recover the sparse vectors for $\lambda \geq \tilde{\mathcal{O}}(k \cdot r)$, while our algorithms require $\lambda \geq$
arXiv Detail & Related papers (2021-06-11T10:57:00Z) - List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z)
This list is automatically generated from the titles and abstracts of the papers listed on this site.