Daisy Bloom Filters
- URL: http://arxiv.org/abs/2205.14894v2
- Date: Sun, 16 Jun 2024 15:29:00 GMT
- Title: Daisy Bloom Filters
- Authors: Ioana O. Bercea, Jakob Bæk Tejs Houen, Rasmus Pagh
- Abstract summary: A filter is a widely used data structure for storing an approximation of a given set $S$ of elements from some universe $U$ (a countable set).
The advantage of using a Bloom filter, when some false positives are acceptable, is that the space usage becomes smaller than what is required to store $S$ exactly.
We present a Bloom filter alternative, which we call the $\textit{Daisy Bloom filter}$, that executes operations faster and uses significantly less space than the standard Bloom filter.
- Score: 10.428888893980739
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A filter is a widely used data structure for storing an approximation of a given set $S$ of elements from some universe $U$ (a countable set). It represents a superset $S' \supseteq S$ that is "close to $S$" in the sense that for $x \not\in S$, the probability that $x \in S'$ is bounded by some $\varepsilon > 0$. The advantage of using a Bloom filter, when some false positives are acceptable, is that the space usage becomes smaller than what is required to store $S$ exactly. Though filters are well understood from a worst-case perspective, it is clear that state-of-the-art constructions may not be close to optimal for particular distributions of data and queries. Suppose, for instance, that some elements are in $S$ with probability close to 1. Then it would make sense to always include them in $S'$, saving space by not having to represent these elements in the filter. Questions like this have been raised in the context of Weighted Bloom filters (Bruck, Gao and Jiang, ISIT 2006) and Bloom filter implementations that make use of access to learned components (Vaidya, Knorr, Mitzenmacher, and Kraska, ICLR 2021). In this paper, we present a lower bound for the expected space that such a filter requires. We also show that the lower bound is asymptotically tight by exhibiting a filter construction that executes queries and insertions in worst-case constant time, and has a false positive rate at most $\varepsilon$ with high probability over input sets drawn from a product distribution. We also present a Bloom filter alternative, which we call the $\textit{Daisy Bloom filter}$, that executes operations faster and uses significantly less space than the standard Bloom filter.
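As a rough illustration of the two ideas in the abstract (always accepting near-certain members, and varying the per-element number of hash probes with the membership probability), here is a minimal Python sketch. It is not the paper's construction: the class name `DaisyishFilter`, the probe-count heuristic, and the $1-\varepsilon$ cutoff are assumptions made for illustration.

```python
import hashlib
import math

class DaisyishFilter:
    """Toy filter in the spirit of the abstract: near-certain members are
    accepted without touching the bit array, and per-element probe counts
    shrink as the membership probability grows. Illustrative only."""

    def __init__(self, m_bits, eps, membership_prob):
        self.m = m_bits                # size of the bit array
        self.eps = eps                 # target false-positive rate
        self.p = membership_prob       # x -> Pr[x in S], assumed known
        self.bits = bytearray(m_bits)
        self.always_in = set()         # elements represented "for free"

    def _num_probes(self, x):
        # Heuristic: unlikely elements get close to the classic
        # log2(1/eps) probes, likely ones get fewer (cheaper, leakier).
        k_max = max(1, round(math.log2(1 / self.eps)))
        return max(1, round(k_max * (1 - self.p(x))))

    def _positions(self, x):
        for i in range(self._num_probes(x)):
            h = hashlib.blake2b(f"{i}:{x}".encode(), digest_size=8)
            yield int.from_bytes(h.digest(), "big") % self.m

    def insert(self, x):
        if self.p(x) >= 1 - self.eps:  # near-certain member: keep it outside
            self.always_in.add(x)
            return
        for j in self._positions(x):
            self.bits[j] = 1

    def query(self, x):
        if x in self.always_in:
            return True
        return all(self.bits[j] for j in self._positions(x))
```

Because the probe count is a deterministic function of $x$, insertions and queries agree on which positions to check; the paper's actual construction tunes these per-element parameters to match its space lower bound.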
Related papers
- Enjoying Non-linearity in Multinomial Logistic Bandits [66.36004256710824]
We consider the multinomial logistic bandit problem, a variant of generalized linear bandits where a learner interacts with an environment.
We extend the definition of $\kappa_*$ to the multinomial setting and propose an efficient algorithm that leverages the problem's non-linearity.
Our method yields a problem-dependent regret bound of order $\smash{\widetilde{\mathcal{O}}(Kd\sqrt{T}/\kappa_*)}$, where $K$ is the number of actions and $\kappa_*$ is the non-linearity parameter defined above.
arXiv Detail & Related papers (2025-07-07T08:18:25Z)
- Differentially Private Space-Efficient Algorithms for Counting Distinct Elements in the Turnstile Model [61.40326886123332]
We give the first sublinear-space differentially private algorithms for the fundamental problem of counting distinct elements in the turnstile streaming model.
Our result significantly improves upon the space requirements of the state-of-the-art algorithms for this problem, which are linear.
When a bound $W$ on the number of times an item appears in the stream is known, our algorithm provides $\tilde{O}_\eta(\sqrt{W})$ additive error using $\tilde{O}_\eta(\sqrt{W})$ space.
arXiv Detail & Related papers (2025-05-29T17:21:20Z)
- An Uncertainty Principle for Linear Recurrent Neural Networks [54.13281679205581]
We build a linear filter of order $S$ that approximates the filter that looks $K$ time steps in the past.
We fully characterize the problem by providing lower bounds of approximation, as well as explicit filters that achieve this lower bound up to constants.
The optimal performance highlights an uncertainty principle: the filter has to average values around the $K$-th time step in the past with a range (width) proportional to $K/S$.
arXiv Detail & Related papers (2025-02-13T13:01:46Z)
- Adversarially Robust Bloom Filters: Privacy, Reductions, and Open Problems [0.0]
A Bloom filter is a space-efficient probabilistic data structure that represents a set $S$ of elements from a larger universe $U$.
We investigate the adversarial robustness and privacy of Bloom filters.
We prove that PRF-backed Bloom filters fail the NOY model's stronger BP-test.
We introduce the first private Bloom filters with differential privacy guarantees.
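For reference, here is a hedged sketch of the kind of PRF-backed Bloom filter discussed above: probe positions are derived from a keyed PRF (HMAC-SHA256 here), so an adversary without the key cannot predict which bits an item sets. Note the summary above states that such keying alone still fails the stronger BP-test; the paper's robustness models and private constructions are not reproduced here.

```python
import hmac
import hashlib
import secrets

class KeyedBloomFilter:
    """Bloom filter whose k probe positions come from a keyed PRF.
    Illustrative baseline only: per the summary above, PRF-backing
    alone fails the stronger BP-test."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)
        self.key = secrets.token_bytes(32)  # secret PRF key

    def _positions(self, item: bytes):
        for i in range(self.k):
            tag = hmac.new(self.key, bytes([i]) + item, hashlib.sha256).digest()
            yield int.from_bytes(tag[:8], "big") % self.m

    def insert(self, item: bytes):
        for j in self._positions(item):
            self.bits[j] = 1

    def query(self, item: bytes) -> bool:
        return all(self.bits[j] for j in self._positions(item))
```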
arXiv Detail & Related papers (2025-01-27T03:35:25Z)
- Optimal Oblivious Subspace Embeddings with Near-optimal Sparsity [3.9657575162895196]
An oblivious subspace embedding is a random $m \times n$ matrix $\Pi$ that, for any $d$-dimensional subspace, preserves the norms of all vectors in that subspace within a $1 \pm \epsilon$ factor.
We give an oblivious subspace embedding with the optimal dimension $m = \Theta(d/\epsilon^2)$ that has a near-optimal sparsity of $\tilde{O}(1/\epsilon)$ non-zero entries per column of $\Pi$.
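For intuition, the snippet below builds a generic sparse sketch in the CountSketch/OSNAP family, with $s$ nonzero entries of value $\pm 1/\sqrt{s}$ per column. The paper's contribution is showing that sparsity $\tilde{O}(1/\epsilon)$ suffices at the optimal dimension $m = \Theta(d/\epsilon^2)$; this sketch only demonstrates the shape of such a construction, and the demo constants are ad hoc.

```python
import numpy as np

def sparse_ose(m, n, s, seed=None):
    """Random m x n sketch with s nonzeros per column, each +-1/sqrt(s).
    Generic OSNAP-style construction, not the paper's specific one."""
    rng = np.random.default_rng(seed)
    Pi = np.zeros((m, n))
    for col in range(n):
        rows = rng.choice(m, size=s, replace=False)  # s distinct rows
        Pi[rows, col] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)
    return Pi

# Demo: norms of vectors from a random d-dimensional subspace of R^n
# should be preserved up to roughly 1 +- eps.
n, d, eps = 2000, 10, 0.5
m, s = int(4 * d / eps**2), 8            # ad hoc constants
U, _ = np.linalg.qr(np.random.default_rng(0).standard_normal((n, d)))
x = U @ np.random.default_rng(1).standard_normal(d)
Pi = sparse_ose(m, n, s, seed=2)
print(np.linalg.norm(Pi @ x) / np.linalg.norm(x))  # close to 1
```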
arXiv Detail & Related papers (2024-11-13T16:58:51Z)
- Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error [25.777423855881878]
We give a novel elimination-based algorithm showing that one can obtain an $O(H\epsilon)$-optimal policy.
We complement our upper bound with an $\widetilde{\Omega}(H\epsilon)$-optimality lower bound, giving a complete picture of this problem.
arXiv Detail & Related papers (2024-07-18T15:58:04Z)
- Group-invariant max filtering [4.396860522241306]
We construct a family of $G$-invariant real-valued functions on $V$ that we call max filters.
In the case where $V = \mathbb{R}^d$ and $G$ is finite, a suitable max filter bank separates orbits, and is even bilipschitz in the quotient metric.
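The definition transcribes directly into code: the max filter of an input $x$ against a template $z$ is $\max_{g \in G} \langle g \cdot x, z \rangle$, which is $G$-invariant because the maximum runs over the whole orbit. The choice of the cyclic shift group below is mine, standing in for a generic finite $G$.

```python
import numpy as np

def max_filter(x, z, orbit):
    """Max filter: Phi(x) = max over g in G of <g . x, z>."""
    return max(float(np.dot(gx, z)) for gx in orbit(x))

def cyclic_shifts(x):
    """Orbit of x under the cyclic shift group acting on R^d."""
    for s in range(len(x)):
        yield np.roll(x, s)

# Invariance check: shifting the input does not change the filter value.
x = np.array([1.0, 2.0, -1.0, 0.5])
z = np.array([0.3, -0.2, 1.0, 0.0])
print(max_filter(x, z, cyclic_shifts))
print(max_filter(np.roll(x, 2), z, cyclic_shifts))  # identical
```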
arXiv Detail & Related papers (2022-05-27T15:18:08Z)
- One-Sided Box Filter for Edge Preserving Image Smoothing [9.67565473617028]
We present a one-sided box filter that smooths the signal while keeping its discontinuous features.
More specifically, we perform box filtering on eight one-sided windows, leading to a one-sided box filter that preserves corners and edges.
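A one-dimensional toy version conveys the mechanism: average over a left-sided and a right-sided window at each sample and keep the side whose values vary least, so the average never straddles a discontinuity. The paper works in 2-D with eight one-sided windows and its selection rule may differ; the variance criterion here is an assumption.

```python
import numpy as np

def one_sided_box_1d(signal, r):
    """At each index, average over the left-sided and right-sided windows
    of radius r and keep the smoother side, preserving step edges."""
    n = len(signal)
    out = np.empty(n)
    for i in range(n):
        left = signal[max(0, i - r): i + 1]
        right = signal[i: min(n, i + r + 1)]
        out[i] = left.mean() if left.var() <= right.var() else right.mean()
    return out

# A noisy step: the edge at index 10 stays sharp after smoothing.
rng = np.random.default_rng(0)
step = np.concatenate([np.zeros(10), np.ones(10)]) + 0.05 * rng.standard_normal(20)
print(one_sided_box_1d(step, r=4).round(2))
```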
arXiv Detail & Related papers (2021-08-11T04:22:38Z)
- Optimal Spectral Recovery of a Planted Vector in a Subspace [80.02218763267992]
We study efficient estimation and detection of a planted vector $v$ whose $\ell_4$ norm differs from that of a Gaussian vector with the same $\ell_2$ norm.
We show that in the regime $n\rho \gg \sqrt{N}$, any spectral method from a large class (and more generally, any low-degree polynomial of the input) fails to detect the planted vector.
arXiv Detail & Related papers (2021-05-31T16:10:49Z)
- Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
We consider the problem of learning a latent $k$-vertex simplex $K \subset \mathbb{R}^d$, given access to $A \in \mathbb{R}^{d \times n}$.
We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$.
arXiv Detail & Related papers (2021-05-17T16:40:48Z)
- Stochastic Linear Bandits with Protected Subspace [51.43660657268171]
We study a variant of the linear bandit problem wherein we optimize a linear objective function but rewards are accrued only to an unknown subspace.
In particular, at each round, the learner must choose whether to query the objective or the protected subspace alongside choosing an action.
Our algorithm, derived from the OFUL principle, uses some of the queries to get an estimate of the protected space.
arXiv Detail & Related papers (2020-11-02T14:59:39Z)
- Data Agnostic Filter Gating for Efficient Deep Networks [72.4615632234314]
Current filter pruning methods mainly leverage feature maps to generate importance scores for filters and prune those with smaller scores.
In this paper, we propose a data agnostic filter pruning method that uses an auxiliary network named Dagger module to induce pruning.
In addition, to help prune filters with certain FLOPs constraints, we leverage an explicit FLOPs-aware regularization to directly promote pruning filters toward target FLOPs.
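The FLOPs-aware regularization admits a simple reading: treat each filter's gate as a soft keep-probability and penalize the gap between the expected FLOPs of the gated network and a budget. The sketch below shows only the penalty's shape; the gate parameterization `gates` and the weight `lam` are assumptions, not the paper's exact formulation.

```python
import numpy as np

def flops_regularizer(gates, flops_per_filter, target_flops, lam=1e-2):
    """Penalty pushing the expected FLOPs of kept filters toward a budget.
    `gates` holds soft scores in [0, 1], one per filter; in the paper an
    auxiliary (Dagger) module produces such scores."""
    expected_flops = float(np.dot(gates, flops_per_filter))
    return lam * abs(expected_flops - target_flops)

gates = np.array([0.9, 0.1, 0.7, 0.4])   # soft keep-probabilities
flops = np.array([4e6, 4e6, 2e6, 2e6])   # per-filter cost
print(flops_regularizer(gates, flops, target_flops=5e6))
```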
arXiv Detail & Related papers (2020-10-28T15:26:40Z)
- Pruning Filter in Filter [38.6403556260338]
Pruning has become a very powerful and effective technique to compress and accelerate modern neural networks.
We propose to prune the filter in the filter, a scheme we call stripe-wise pruning (SWP), to achieve finer granularity than traditional filter pruning methods.
We demonstrate that SWP is more effective than previous FP-based methods and achieves the state-of-the-art pruning ratio on the CIFAR-10 and ImageNet datasets.
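To see what "filter in filter" granularity means, the sketch below treats each spatial position of a convolutional filter as a stripe and zeroes out low-importance stripes rather than whole filters. Scoring stripes by their L1 norm and thresholding at a global quantile are assumptions for illustration, not the paper's learned selection.

```python
import numpy as np

def stripe_prune(weights, keep_ratio):
    """Zero low-importance stripes of conv weights shaped (out_c, in_c, k, k).
    A stripe is one (filter, kh, kw) position: finer than whole-filter pruning."""
    stripe_norm = np.abs(weights).sum(axis=1)           # (out_c, k, k)
    thresh = np.quantile(stripe_norm, 1 - keep_ratio)
    mask = (stripe_norm >= thresh)[:, None, :, :]       # broadcast over in_c
    return weights * mask

w = np.random.default_rng(0).standard_normal((8, 4, 3, 3))
pruned = stripe_prune(w, keep_ratio=0.5)
print((np.abs(pruned).sum(axis=1) > 0).mean())  # about half the stripes remain
```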
arXiv Detail & Related papers (2020-09-30T03:35:16Z)
- Filter Grafting for Deep Neural Networks: Reason, Method, and Cultivation [86.91324735966766]
Filters are the key components of modern convolutional neural networks (CNNs).
In this paper, we introduce filter grafting (\textbf{Method}) to improve the information carried by filters.
We develop a novel criterion to measure the information of filters and an adaptive weighting strategy to balance the grafted information among networks.
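A miniature of the grafting step, under two loudly flagged assumptions: the information criterion is taken to be the entropy of a filter's weight histogram, and the adaptive weighting is a convex combination driven by relative information. The paper's exact criterion and weighting differ in detail.

```python
import numpy as np

def filter_info(f, bins=10):
    """Proxy for a filter's information: entropy of its weight histogram.
    (Assumption for illustration; the paper defines its own criterion.)"""
    hist, _ = np.histogram(f, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def graft(f_self, f_peer):
    """Blend a peer network's filter into ours, weighted by relative
    information, so low-information filters get re-activated."""
    a, b = filter_info(f_self), filter_info(f_peer)
    w = a / (a + b + 1e-12)
    return w * f_self + (1 - w) * f_peer

dead = np.zeros((3, 3))                                  # an 'invalid' filter
rich = np.random.default_rng(0).standard_normal((3, 3))  # informative peer
print(graft(dead, rich))  # dominated by the peer's weights
```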
arXiv Detail & Related papers (2020-04-26T08:36:26Z)
- Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity [94.37110094442136]
We study the problem of agnostic $Q$-learning with function approximation in deterministic systems.
We show that if $\delta = O(\rho/\sqrt{\dim_E})$, then one can find the optimal policy using $O(\dim_E)$ trajectories.
arXiv Detail & Related papers (2020-02-17T18:41:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.