WOR and $p$'s: Sketches for $\ell_p$-Sampling Without Replacement
- URL: http://arxiv.org/abs/2007.06744v3
- Date: Sat, 15 Aug 2020 06:12:25 GMT
- Title: WOR and $p$'s: Sketches for $\ell_p$-Sampling Without Replacement
- Authors: Edith Cohen, Rasmus Pagh, David P. Woodruff
- Abstract summary: We design novel composable sketches for WOR $\ell_p$ sampling.
Our sketches have size that grows only linearly with the sample size.
Our method is the first to provide WOR sampling in the important regime of $p>1$ and the first to handle signed updates for $p>0$.
- Score: 75.12782480740822
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weighted sampling is a fundamental tool in data analysis and machine learning
pipelines. Samples are used for efficient estimation of statistics or as sparse
representations of the data. When weight distributions are skewed, as is often
the case in practice, without-replacement (WOR) sampling is much more effective
than with-replacement (WR) sampling: it provides a broader representation and
higher accuracy for the same number of samples. We design novel composable
sketches for WOR $\ell_p$ sampling, weighted sampling of keys according to a
power $p\in[0,2]$ of their frequency (or for signed data, sum of updates). Our
sketches have size that grows only linearly with the sample size. Our design is
simple and practical, despite intricate analysis, and based on off-the-shelf
use of widely implemented heavy hitters sketches such as CountSketch. Our
method is the first to provide WOR sampling in the important regime of $p>1$
and the first to handle signed updates for $p>0$.
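To make the sampling objective concrete, here is a minimal sketch of the classical exponential-score (bottom-$k$ / ppswor) technique for weighted sampling without replacement, which underlies this line of work. This is not the paper's composable streaming sketch: it assumes the full frequency vector is available in memory, and the function and variable names are illustrative only.

```python
import heapq
import math
import random

def lp_sample_wor(weights, k, p=1.0, seed=0):
    """Draw a without-replacement sample of k keys, where key i is
    prioritized according to the p-th power of its weight (frequency).

    Uses the exponential-race trick: each key gets an independent
    score distributed Exp(w_i^p), and the k keys with the smallest
    scores form the sample. This realizes sampling with probability
    proportional to size, without replacement, over the transformed
    weights w_i^p.
    """
    rng = random.Random(seed)
    scored = []
    for key, w in weights.items():
        if w <= 0:
            continue
        # Exp(rate) variate via inverse transform: -ln(U) / rate.
        # A heavier key gets a stochastically smaller score.
        score = -math.log(rng.random()) / (w ** p)
        scored.append((score, key))
    return [key for _, key in heapq.nsmallest(k, scored)]

# Skewed weights: "a" dominates, so WOR sampling should almost
# always include it, and the remaining slots cover other keys.
freqs = {"a": 1000.0, "b": 10.0, "c": 5.0, "d": 1.0}
sample = lp_sample_wor(freqs, k=2, p=1.0)
```

Setting `p=0` reduces this to uniform sampling over distinct keys, and `p=2` weights keys by squared frequency, matching the $p \in [0, 2]$ regime discussed in the abstract. The paper's contribution is achieving this WOR distribution over a stream of (possibly signed) updates with small composable sketches, rather than from the materialized weights as done here.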
Related papers
- Mini-batch Submodular Maximization [5.439020425819001]
We present the first mini-batch algorithm for maximizing a monotone decomposable submodular function, $F=\sum_{i=1}^{N} f_i$, under a set of constraints.
We consider two sampling approaches: uniform and weighted.
Surprisingly, our experimental results show that uniform sampling is superior to weighted sampling.
arXiv Detail & Related papers (2024-01-23T04:16:58Z)
- Massively Parallel Reweighted Wake-Sleep [29.436464740855598]
Re-weighted wake-sleep (RWS) is a machine learning method for performing Bayesian inference in a very general class of models.
Recent work indicates that the number of samples required for effective importance weighting is exponential in the number of latent variables.
We show considerable improvements over standard "global" RWS, which draws $K$ samples from the full joint.
arXiv Detail & Related papers (2023-05-18T15:03:56Z)
- Reinforcement Learning Enhanced Weighted Sampling for Accurate Subgraph Counting on Fully Dynamic Graph Streams [35.943447765433774]
We propose a weighted sampling algorithm called WSD for estimating the subgraph count in a fully dynamic graph stream.
We determine the weights of edges in a data-driven fashion, using a novel method based on reinforcement learning.
arXiv Detail & Related papers (2022-11-13T03:01:34Z)
- Bias Mimicking: A Simple Sampling Approach for Bias Mitigation [57.17709477668213]
We introduce a new class-conditioned sampling method: Bias Mimicking.
Bias Mimicking improves the accuracy of sampling methods on underrepresented groups by 3% across four benchmarks.
arXiv Detail & Related papers (2022-09-30T17:33:00Z)
- Adaptive Sketches for Robust Regression with Importance Sampling [64.75899469557272]
We introduce data structures for solving robust regression through stochastic gradient descent (SGD).
Our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data.
arXiv Detail & Related papers (2022-07-16T03:09:30Z)
- Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling [64.31011847952006]
We revisit the problem of finding an approximately stationary point of the average of $n$ smooth and possibly nonconvex functions.
We generalize the method so that it can provably work with virtually any sampling mechanism.
We provide the most general and most accurate analysis of optimal bounds in the smooth nonconvex regime.
arXiv Detail & Related papers (2022-06-05T21:32:33Z)
- Meta-Sampler: Almost-Universal yet Task-Oriented Sampling for Point Clouds [46.33828400918886]
We show how we can train an almost-universal meta-sampler across multiple tasks.
This meta-sampler can then be rapidly fine-tuned when applied to different datasets, networks, or even different tasks.
arXiv Detail & Related papers (2022-03-30T02:21:34Z)
- Oblivious sketching for logistic regression [72.42202783677811]
We present the first data oblivious sketch for logistic regression.
Our sketches are fast, simple, easy to implement, and our experiments demonstrate their practicality.
arXiv Detail & Related papers (2021-07-14T11:29:26Z)
- Learning a Unified Sample Weighting Network for Object Detection [113.98404690619982]
Region sampling or weighting is critical to the success of modern region-based object detectors.
We argue that sample weighting should be data-dependent and task-dependent.
We propose a unified sample weighting network to predict a sample's task weights.
arXiv Detail & Related papers (2020-06-11T16:19:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.