CountSketches, Feature Hashing and the Median of Three
- URL: http://arxiv.org/abs/2102.02193v1
- Date: Wed, 3 Feb 2021 18:45:21 GMT
- Title: CountSketches, Feature Hashing and the Median of Three
- Authors: Kasper Green Larsen, Rasmus Pagh, Jakub Tětek
- Abstract summary: We revisit the classic CountSketch method, which is a sparse, random projection that transforms a (high-dimensional) Euclidean vector $v$ to a vector of dimension $(2t-1) s$.
Our main contribution is a new analysis of CountSketch, showing an improvement in variance to $O(\min\{\|v\|_1^2/s^2,\|v\|_2^2/s\})$ when $t > 1$.
This finding suggests that the Feature Hashing method, which is essentially identical to CountSketch but does not make use of the median, can be made more reliable at a small cost.
- Score: 18.85876632959721
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we revisit the classic CountSketch method, which is a sparse,
random projection that transforms a (high-dimensional) Euclidean vector $v$ to
a vector of dimension $(2t-1) s$, where $t, s > 0$ are integer parameters. It
is known that even for $t=1$, a CountSketch allows estimating coordinates of
$v$ with variance bounded by $\|v\|_2^2/s$. For $t > 1$, the estimator takes
the median of $2t-1$ independent estimates, and the probability that the
estimate is off by more than $2 \|v\|_2/\sqrt{s}$ is exponentially small in
$t$. This suggests choosing $t$ to be logarithmic in a desired inverse failure
probability. However, implementations of CountSketch often use a small,
constant $t$. Previous work only predicts a constant factor improvement in this
setting.
Our main contribution is a new analysis of CountSketch, showing an
improvement in variance to $O(\min\{\|v\|_1^2/s^2,\|v\|_2^2/s\})$ when $t > 1$.
That is, the variance decreases proportionally to $s^{-2}$, asymptotically for
large enough $s$. We also study the variance in the setting where an inner
product is to be estimated from two CountSketches. This finding suggests that
the Feature Hashing method, which is essentially identical to CountSketch but
does not make use of the median estimator, can be made more reliable at a small
cost in settings where using a median estimator is possible.
We confirm our theoretical findings in experiments and thereby help justify
why a small constant number of estimates often suffice in practice. Our
improved variance bounds are based on new general theorems about the variance
and higher moments of the median of i.i.d. random variables that may be of
independent interest.
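To make the construction concrete, below is a minimal Python sketch (illustrative only, not the authors' implementation) of a CountSketch with $2t-1$ rows of $s$ buckets and a median-of-$(2t-1)$ coordinate estimator. With $t = 2$ this is the median of three from the title, and with $t = 1$ it reduces to the single-row estimate used by Feature Hashing. For simplicity, the pairwise-independent hash and sign functions are simulated with random lookup tables.

```python
import numpy as np

def count_sketch(v, s, t, rng):
    """Sketch v into 2t-1 rows of s buckets each.

    Row r uses a bucket map h[r]: [d] -> [s] and signs g[r]: [d] -> {-1, +1}.
    Real implementations use pairwise-independent hash functions; random
    lookup tables are used here purely for illustration.
    """
    d = len(v)
    rows = 2 * t - 1
    h = rng.integers(0, s, size=(rows, d))        # bucket assignments
    g = rng.choice([-1.0, 1.0], size=(rows, d))   # random signs
    C = np.zeros((rows, s))
    for r in range(rows):
        np.add.at(C[r], h[r], g[r] * v)           # C[r, h[r, i]] += g[r, i] * v[i]
    return C, h, g

def estimate_coordinate(C, h, g, i):
    """Estimate v[i] as the median of the 2t-1 per-row estimates."""
    rows = C.shape[0]
    return np.median([g[r, i] * C[r, h[r, i]] for r in range(rows)])

# Demo: a heavy coordinate hidden in noise, recovered via the median of three.
rng = np.random.default_rng(0)
d, s, t = 10_000, 200, 2
v = rng.standard_normal(d)
v[42] = 50.0
C, h, g = count_sketch(v, s, t, rng)
print(estimate_coordinate(C, h, g, 42))  # close to v[42] = 50
```

Repeating the demo many times and comparing the spread of the median estimates against single-row ($t = 1$) estimates as $s$ grows is a simple way to observe the $s^{-2}$ versus $s^{-1}$ variance behavior that the paper analyzes.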
Related papers
- Near-Optimal Mean Estimation with Unknown, Heteroskedastic Variances [15.990720051907864]
The Subset-of-Signals model serves as a benchmark for heteroskedastic mean estimation.
Our algorithm resolves this open question up to logarithmic factors.
Even for $d=2$, our techniques enable rates comparable to knowing the variance of each sample.
arXiv Detail & Related papers (2023-12-05T01:13:10Z) - Data Structures for Density Estimation [66.36971978162461]
Given a sublinear (in $n$) number of samples from $p$, our main result is the first data structure that identifies $v_i$ in time sublinear in $k$.
We also give an improved version of the algorithm of Acharya et al. that reports $v_i$ in time linear in $k$.
arXiv Detail & Related papers (2023-06-20T06:13:56Z) - List-Decodable Sparse Mean Estimation [7.594050968868919]
A surge of recent research interest has focused on the list-decodable setting where $\alpha \in (0, \frac{1}{2}]$, and the goal is to output a finite number of estimates among which at least one approximates the target mean.
In this paper, we consider the setting where the underlying target mean is $k$-sparse, and the first contribution is an algorithm with sample complexity $O\big(\mathrm{poly}(k, \log d)\big)$, i.e. poly-logarithmic in the dimension $d$.
arXiv Detail & Related papers (2022-05-28T05:13:45Z) - Variance-Aware Sparse Linear Bandits [64.70681598741417]
Worst-case minimax regret for sparse linear bandits is $\widetilde{\Theta}\left(\sqrt{dT}\right)$.
In the benign setting where there is no noise and the action set is the unit sphere, one can use divide-and-conquer to achieve an $\widetilde{\mathcal{O}}(1)$ regret.
We develop a general framework that converts any variance-aware linear bandit algorithm to a variance-aware algorithm for sparse linear bandits.
arXiv Detail & Related papers (2022-05-26T15:55:44Z) - Weighted-average quantile regression [1.0742675209112622]
We introduce the weighted-average quantile regression framework, $\int_0^1 q_{Y|X}(u)\,\psi(u)\,du = X'\beta$, where $Y$ is a dependent variable.
We develop an estimator of the vector of parameters $\beta$, where $T$ is the size of the available sample.
arXiv Detail & Related papers (2022-03-06T19:06:53Z) - TURF: A Two-factor, Universal, Robust, Fast Distribution Learning Algorithm [64.13217062232874]
One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d$ polynomial.
We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation.
arXiv Detail & Related papers (2022-02-15T03:49:28Z) - Variance-Aware Confidence Set: Variance-Dependent Bound for Linear Bandits and Horizon-Free Bound for Linear Mixture MDP [76.94328400919836]
We show how to construct variance-aware confidence sets for linear bandits and linear mixture Markov Decision Processes (MDPs).
For linear bandits, we obtain an $\widetilde{O}\big(\mathrm{poly}(d)\sqrt{1 + \sum_{i=1}^{K}\sigma_i^2}\big)$ regret bound, where $d$ is the feature dimension.
For linear mixture MDPs, we obtain an $\widetilde{O}\big(\mathrm{poly}(d)\sqrt{K}\big)$ regret bound, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-29T18:57:52Z) - Sparse sketches with small inversion bias [79.77110958547695]
Inversion bias arises when averaging estimates of quantities that depend on the inverse covariance.
We develop a framework for analyzing inversion bias, based on our proposed concept of an $(\epsilon,\delta)$-unbiased estimator for random matrices.
We show that when the sketching matrix $S$ is dense and has i.i.d. sub-Gaussian entries, the estimator is $(\epsilon,\delta)$-unbiased for $(A^\top A)^{-1}$ with a sketch of size $m = O(d + \sqrt{d}/\epsilon)$.
arXiv Detail & Related papers (2020-11-21T01:33:15Z) - List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z) - Optimal Sub-Gaussian Mean Estimation in $\mathbb{R}$ [5.457150493905064]
We present a novel estimator with sub-Gaussian convergence.
Our estimator does not require prior knowledge of the variance.
Our estimator construction and analysis gives a framework generalizable to other problems.
arXiv Detail & Related papers (2020-11-17T02:47:24Z) - Multivariate mean estimation with direction-dependent accuracy [8.147652597876862]
We consider the problem of estimating the mean of a random vector based on $N$ independent, identically distributed observations.
We present an estimator that has near-optimal error in all directions in which the variance of the one-dimensional marginal of the random vector is not too small.
arXiv Detail & Related papers (2020-10-22T17:52:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.