Data Structures for Density Estimation
- URL: http://arxiv.org/abs/2306.11312v1
- Date: Tue, 20 Jun 2023 06:13:56 GMT
- Title: Data Structures for Density Estimation
- Authors: Anders Aamand, Alexandr Andoni, Justin Y. Chen, Piotr Indyk, Shyam
Narayanan, Sandeep Silwal
- Abstract summary: Given a sublinear (in $n$) number of samples from $p$, our main result is the first data structure that identifies $v_i$ in time sublinear in $k$.
We also give an improved version of the algorithm of Acharya et al. that reports $v_i$ in time linear in $k$.
- Score: 66.36971978162461
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study statistical/computational tradeoffs for the following density
estimation problem: given $k$ distributions $v_1, \ldots, v_k$ over a discrete
domain of size $n$, and sampling access to a distribution $p$, identify $v_i$
that is "close" to $p$. Our main result is the first data structure that, given
a sublinear (in $n$) number of samples from $p$, identifies $v_i$ in time
sublinear in $k$. We also give an improved version of the algorithm of Acharya
et al. (2018) that reports $v_i$ in time linear in $k$. The experimental
evaluation of the latter algorithm shows that it achieves a significant
reduction in the number of operations needed to achieve a given accuracy
compared to prior work.
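As a point of intuition for the problem setup, the following sketch shows the naive linear-in-$k$ baseline that the paper's data structure improves on: form the empirical distribution from the samples and scan all $k$ candidates for the one minimizing total-variation distance. This is a hypothetical illustration only, not the paper's algorithm or the Acharya et al. method.

```python
import random
from collections import Counter

def select_closest(candidates, samples, domain_size):
    """Return the index of the candidate distribution closest (in
    total-variation distance) to the empirical distribution of samples.

    Illustrative baseline only: a linear scan over all k candidates,
    not the sublinear-time data structure from the paper.
    """
    counts = Counter(samples)
    m = len(samples)
    empirical = [counts.get(x, 0) / m for x in range(domain_size)]
    best_i, best_tv = None, float("inf")
    for i, v in enumerate(candidates):
        # TV distance = half the L1 distance between the distributions.
        tv = 0.5 * sum(abs(v[x] - empirical[x]) for x in range(domain_size))
        if tv < best_tv:
            best_i, best_tv = i, tv
    return best_i

# Example: sample from a distribution close to the first candidate.
random.seed(0)
candidates = [[0.9, 0.1], [0.1, 0.9]]
samples = [0 if random.random() < 0.9 else 1 for _ in range(1000)]
```

Both the sample complexity (building a good empirical estimate) and the $O(k)$ scan are what the statistical/computational trade-off in the paper is about.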
Related papers
- Statistical-Computational Trade-offs for Density Estimation [60.81548752871115]
We show that for a broad class of data structures their bounds cannot be significantly improved.
This is a novel *statistical-computational* trade-off for density estimation.
arXiv Detail & Related papers (2024-10-30T15:03:33Z)
- Online Robust Mean Estimation [37.746091744197656]
We study the problem of high-dimensional robust mean estimation in an online setting.
We prove two main results about online robust mean estimation in this model.
arXiv Detail & Related papers (2023-10-24T15:28:43Z)
- PLAN: Variance-Aware Private Mean Estimation [12.452663855242344]
Differentially private mean estimation is an important building block in privacy-preserving algorithms for data analysis and machine learning.
We show how to exploit skew in the vector $\boldsymbol{\sigma}$, obtaining a (zero-concentrated) differentially private mean estimate with $\ell_2$ error.
To verify the effectiveness of PLAN, we empirically evaluate accuracy on both synthetic and real world data.
arXiv Detail & Related papers (2023-06-14T21:04:50Z)
- Improved Learning-augmented Algorithms for k-means and k-medians Clustering [8.04779839951237]
We consider the problem of clustering in the learning-augmented setting, where we are given a data set in $d$-dimensional Euclidean space.
We propose a deterministic $k$-means algorithm that produces centers with improved bound on clustering cost.
Our algorithm works even when the predictions are not very accurate, i.e., our bound holds for $\alpha$ up to $1/2$, an improvement over $\alpha$ being at most $1/7$ in the previous work.
arXiv Detail & Related papers (2022-10-31T03:00:11Z)
- TURF: A Two-factor, Universal, Robust, Fast Distribution Learning Algorithm [64.13217062232874]
One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ polynomial approximation.
We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation.
arXiv Detail & Related papers (2022-02-15T03:49:28Z)
- Clustering Mixture Models in Almost-Linear Time via List-Decodable Mean Estimation [58.24280149662003]
We study the problem of list-decodable mean estimation, where an adversary can corrupt a majority of the dataset.
We develop new algorithms for list-decodable mean estimation, achieving nearly-optimal statistical guarantees.
arXiv Detail & Related papers (2021-06-16T03:34:14Z)
- List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z)
- Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
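To make the robust linear regression setting above concrete, the sketch below fits a slope under a Huber loss, whose bounded-influence gradient tolerates gross outliers in $\epsilon$. This is a generic robust-regression baseline for illustration only; the paper's estimators for the hypercontractive and sub-Gaussian settings are different and come with nearly-linear-time guarantees.

```python
import random

def huber_fit(xs, ys, delta=1.0, lr=0.05, steps=2000):
    """Fit y ~ w*x by gradient descent on the Huber loss.

    Generic robust baseline (illustration only), not the estimators
    from the paper: the Huber gradient is quadratic for small
    residuals and has bounded influence for large (outlier) residuals.
    """
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        grad = 0.0
        for x, y in zip(xs, ys):
            r = w * x - y  # residual
            # Clip the residual's contribution at +/- delta.
            g = r if abs(r) <= delta else (delta if r > 0 else -delta)
            grad += g * x
        w -= lr * grad / n
    return w

# Example: true slope 2, small noise, plus a few gross outliers.
random.seed(1)
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [2 * x + random.gauss(0, 0.1) for x in xs]
for i in range(10):
    ys[i] += 50  # corrupt 5% of the responses
```

Because each corrupted point's gradient contribution is clipped, the recovered slope stays near the true value despite the corruptions, which ordinary least squares would not guarantee.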
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.