The Sparse Hausdorff Moment Problem, with Application to Topic Models
- URL: http://arxiv.org/abs/2007.08101v3
- Date: Mon, 7 Sep 2020 17:24:41 GMT
- Title: The Sparse Hausdorff Moment Problem, with Application to Topic Models
- Authors: Spencer Gordon, Bijan Mazaheri, Leonard J. Schulman, Yuval Rabani
- Abstract summary: We give an algorithm for identifying a $k$-mixture using samples of $m=2k$ iid binary random variables.
It suffices to know the moments to additive accuracy $w_{\min}\cdot\zeta^{O(k)}$.
- Score: 5.151973524974052
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of identifying, from its first $m$ noisy moments, a
probability distribution on $[0,1]$ of support $k<\infty$. This is equivalent
to the problem of learning a distribution on $m$ observable binary random
variables $X_1,X_2,\dots,X_m$ that are iid conditional on a hidden random
variable $U$ taking values in $\{1,2,\dots,k\}$. Our focus is on accomplishing
this with $m=2k$, which is the minimum $m$ for which verifying that the source
is a $k$-mixture is possible (even with exact statistics). This problem, so
simply stated, is quite useful: e.g., by a known reduction, any algorithm for
it lifts to an algorithm for learning pure topic models.
We give an algorithm that identifies a $k$-mixture from samples of $m=2k$
iid binary random variables, using a sample of size $\left(1/w_{\min}\right)^2
\cdot\left(1/\zeta\right)^{O(k)}$ and post-sampling runtime of only
$O(k^{2+o(1)})$ arithmetic operations. Here $w_{\min}$ is the minimum
probability of an outcome of $U$, and $\zeta$ is the minimum separation between
the distinct success probabilities of the $X_i$s. Stated in terms of the moment
problem, it suffices to know the moments to additive accuracy
$w_{\min}\cdot\zeta^{O(k)}$. It is known that the sample complexity of any
solution to the identification problem must be at least exponential in $k$.
Previous results demonstrated either worse sample complexity and worse $O(k^c)$
runtime for some $c$ substantially larger than $2$, or similar sample
complexity and much worse $k^{O(k^2)}$ runtime.
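As a hedged illustration of the moment formulation (not the authors' $O(k^{2+o(1)})$ procedure): since the $X_i$ are iid given $U$, the $j$-th moment satisfies $\mathbb{E}[X_1 X_2 \cdots X_j] = \sum_u w_u p_u^{\,j}$, so averaging products of $j$ of the variables estimates the moments of the mixing distribution, and a $k$-atomic distribution on $[0,1]$ can then be read off its first $2k$ moments by the classical Prony / matrix-pencil method. The sketch below uses that textbook method; all function names and the synthetic parameters are assumptions for illustration.
```python
# A minimal sketch, assuming the classical Prony / matrix-pencil recovery
# (a textbook method, not the paper's O(k^{2+o(1)}) algorithm).
import numpy as np

def moments_from_binary_samples(samples):
    """Estimate m_j = E[X_1 * ... * X_j], j = 0..m-1, from iid rows of
    m = 2k binary variables that are iid conditional on the hidden U."""
    m = samples.shape[1]
    return np.array([samples[:, :j].prod(axis=1).mean() for j in range(m)])

def recover_mixture(mom, k):
    """mom[j] ~ sum_i w_i * alpha_i**j for j = 0..2k-1 (mom[0] = 1)."""
    H0 = np.array([[mom[i + j] for j in range(k)] for i in range(k)])
    H1 = np.array([[mom[i + j + 1] for j in range(k)] for i in range(k)])
    # Atoms: generalized eigenvalues of the Hankel pencil (H1, H0).
    alphas = np.real(np.linalg.eigvals(np.linalg.solve(H0, H1)))
    # Weights: solve the Vandermonde system V w = (m_0, ..., m_{k-1}).
    V = np.vander(alphas, k, increasing=True).T
    w = np.linalg.solve(V, mom[:k])
    return np.clip(alphas, 0.0, 1.0), w

# Synthetic usage (hypothetical parameters): k = 3 atoms, zeta = 0.3, w_min = 0.3.
rng = np.random.default_rng(0)
k, n = 3, 200_000
alphas_true = np.array([0.2, 0.5, 0.8])
w_true = np.array([0.3, 0.3, 0.4])
U = rng.choice(k, size=n, p=w_true)
X = (rng.random((n, 2 * k)) < alphas_true[U, None]).astype(int)
print(recover_mixture(moments_from_binary_samples(X), k))
```
The matrix-pencil step is where the separation $\zeta$ and weight floor $w_{\min}$ matter: they control the conditioning of the Hankel and Vandermonde systems, which is why the moments must be known to accuracy $w_{\min}\cdot\zeta^{O(k)}$.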
Related papers
- Simple and Nearly-Optimal Sampling for Rank-1 Tensor Completion via Gauss-Jordan [49.1574468325115]
We revisit the sample and computational complexity of completing a rank-1 tensor in $\otimes_{i=1}^{N}\mathbb{R}^{d}$.
We present a characterization of the problem which admits an algorithm amounting to Gauss-Jordan on a pair of random linear systems (a toy linearization sketch appears after this list).
arXiv Detail & Related papers (2024-08-10T04:26:19Z) - Outlier Robust Multivariate Polynomial Regression [27.03423421704806]
We are given a set of random samples $(\mathbf{x}_i, y_i) \in [-1,1]^n \times \mathbb{R}$ that are noisy versions of $(\mathbf{x}_i, p(\mathbf{x}_i))$.
The goal is to output a polynomial $\hat{p}$ within an $\ell_\infty$-distance of at most $O(\sigma)$ from $p$.
arXiv Detail & Related papers (2024-03-14T15:04:45Z) - Stochastic Approximation Approaches to Group Distributionally Robust
Optimization [96.26317627118912]
Group distributionally robust optimization (GDRO) is addressed via stochastic approximation.
Online learning techniques reduce the number of samples required in each round from $m$ to $1$ while keeping the same sample complexity.
A novel formulation of weighted GDRO is proposed, which allows us to derive distribution-dependent convergence rates.
arXiv Detail & Related papers (2023-02-18T09:24:15Z) - TURF: A Two-factor, Universal, Robust, Fast Distribution Learning
Algorithm [64.13217062232874]
One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ polynomial.
We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation.
arXiv Detail & Related papers (2022-02-15T03:49:28Z) - Improved quantum data analysis [1.8416014644193066]
We give a quantum "Threshold Search" algorithm that requires only $O((\log^2 m)/\epsilon^2)$ samples of a $d$-dimensional state.
We also give an alternative Hypothesis Selection method using $\tilde{O}((\log^3 m)/\epsilon^2)$ samples.
arXiv Detail & Related papers (2020-11-22T01:22:37Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that for both problems, for dimensions of $\frac{1}{\lambda\epsilon}$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z) - Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample
Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we show an improved algorithm with lower sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z) - Learning Mixtures of Spherical Gaussians via Fourier Analysis [0.5381004207943596]
We find that a bound on the sample and computational complexity was previously unknown when $\omega(1) \leq d \leq O(\log k)$.
These authors also bound the sample complexity of learning a random mixture of Gaussians in a ball of radius $\Theta(\sqrt{d})$ in $d$ dimensions, when $d$ is at least $\mathrm{poly}(k, \frac{1}{\delta})$.
arXiv Detail & Related papers (2020-04-13T08:06:29Z) - Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to that of the best such distribution.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z) - Sample Amplification: Increasing Dataset Size even when Learning is Impossible [15.864702679819544]
Given data drawn from an unknown distribution, $D$, to what extent is it possible to "amplify" this dataset and output an even larger set of samples that appear to have been drawn from $D$?
We formalize this question as follows: an $\left(n, n + \Theta(\frac{n}{\sqrt{k}})\right)$ amplifier exists, even though learning the distribution to small constant total variation distance requires $\Theta(d)$ samples.
arXiv Detail & Related papers (2019-04-26T21:42:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.