Partitioned Learned Bloom Filter
- URL: http://arxiv.org/abs/2006.03176v2
- Date: Sun, 4 Oct 2020 15:15:17 GMT
- Title: Partitioned Learned Bloom Filter
- Authors: Kapil Vaidya, Eric Knorr, Tim Kraska, Michael Mitzenmacher
- Abstract summary: We show how to frame the problem of optimal model utilization as an optimization problem.
We derive algorithms that can achieve near-optimal performance in many cases.
- Score: 31.748077944821315
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bloom filters are space-efficient probabilistic data structures that are used
to test whether an element is a member of a set, and may return false
positives. Recently, variations referred to as learned Bloom filters were
developed that can provide improved performance in terms of the rate of false
positives, by using a learned model for the represented set. However, previous
methods for learned Bloom filters do not take full advantage of the learned
model. Here we show how to frame the problem of optimal model utilization as an
optimization problem, and using our framework derive algorithms that can
achieve near-optimal performance in many cases. Experimental results from both
simulated and real-world datasets show significant performance improvements
from our optimization approach over both the original learned Bloom filter
constructions and previously proposed heuristic improvements.
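Below is a minimal Python sketch of the partitioned construction the abstract describes: a learned score function routes every key and query to one of several score regions, each backed by its own Bloom filter with its own false-positive rate. The thresholds and per-region rates are hand-picked for illustration; the paper's contribution is choosing them by solving the stated optimization problem.

```python
import hashlib
import math

class BloomFilter:
    """Standard Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, n_items, fpr):
        # Classic sizing: m = -n*ln(p)/ln(2)^2 bits, k = (m/n)*ln(2) hashes.
        self.m = max(8, int(-n_items * math.log(fpr) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(item))

class PartitionedLBF:
    """Partition the model's score range; give each region its own backup filter."""

    def __init__(self, keys, score, thresholds, region_fprs):
        assert len(region_fprs) == len(thresholds) + 1
        self.score, self.thresholds = score, thresholds
        buckets = [[] for _ in region_fprs]
        for key in keys:
            buckets[self._region(score(key))].append(key)
        self.filters = [BloomFilter(len(b), f) if b else None
                        for b, f in zip(buckets, region_fprs)]
        for bucket, filt in zip(buckets, self.filters):
            for key in bucket:
                filt.add(key)

    def _region(self, s):
        return sum(s >= t for t in self.thresholds)

    def __contains__(self, key):
        filt = self.filters[self._region(self.score(key))]
        return filt is not None and key in filt

# Illustrative usage; `score` stands in for a trained model mapping keys to [0, 1).
keys = [f"key-{i}" for i in range(1000)]
score = lambda x: (hash(x) % 1000) / 1000.0
plbf = PartitionedLBF(keys, score, thresholds=[0.5, 0.9],
                      region_fprs=[0.001, 0.01, 0.1])
assert all(k in plbf for k in keys)
```

Because every key is stored in exactly one region's filter and a query routes to that same region, false negatives remain impossible; the optimization problem in the paper decides where to place the thresholds and how many bits to spend in each region.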
Related papers
- Permutation Invariant Learning with High-Dimensional Particle Filters [8.878254892409005]
Sequential learning in deep models often suffers from challenges such as catastrophic forgetting and loss of plasticity.
We introduce a novel permutation-invariant learning framework based on high-dimensional particle filters.
arXiv Detail & Related papers (2024-10-30T05:06:55Z)
- Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z)
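The "adaptive blend" above can be pictured as a gated mixture of the two losses. A minimal sketch follows, assuming a sigmoid gate on the scaled preference margin; the gate form and the temperature `tau` are illustrative assumptions, not the exact discovered DiscoPOP loss from the paper.

```python
import math

def logistic_loss(margin):
    # DPO-style sigmoid loss: -log(sigmoid(margin)); not numerically hardened.
    return math.log(1.0 + math.exp(-margin))

def exponential_loss(margin):
    return math.exp(-margin)

def blended_loss(margin, tau=0.05):
    # margin = beta * (chosen log-ratio minus rejected log-ratio).
    # A sigmoid gate interpolates between the two losses as the margin moves.
    gate = 1.0 / (1.0 + math.exp(-margin / tau))
    return (1.0 - gate) * logistic_loss(margin) + gate * exponential_loss(margin)
```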
- Functional Graphical Models: Structure Enables Offline Data-Driven Optimization [111.28605744661638]
We show how structure can enable sample-efficient data-driven optimization.
We also present a data-driven optimization algorithm that infers the FGM structure itself.
arXiv Detail & Related papers (2024-01-08T22:33:14Z)
- Data-driven Prior Learning for Bayesian Optimisation [5.199765487172328]
We show that PLeBO and prior transfer find good inputs in fewer evaluations.
We validate the learned priors and compare to a breadth of transfer learning approaches.
arXiv Detail & Related papers (2023-11-24T18:37:52Z)
- A Critical Analysis of Classifier Selection in Learned Bloom Filters [0.3359875577705538]
The "complexity" of the data used to build the filter can heavily impact its performance.
We propose a novel methodology, supported by software, for designing, analyzing and implementing Learned Bloom Filters.
Experiments show that the proposed methodology and the supporting software are valid and useful.
arXiv Detail & Related papers (2022-11-28T17:17:18Z)
- Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the search spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
- Compressing (Multidimensional) Learned Bloom Filters [7.6058140480517356]
A Bloom filter reports either that an element is definitely not in the underlying set, or that it is in the set, subject to a certain false-positive rate.
Deep learning models are used to solve this membership testing problem.
We show that the benefits of learned Bloom filters are apparent only when considering a vast amount of data.
arXiv Detail & Related papers (2022-08-05T07:54:48Z)
- Optimizer Amalgamation [124.33523126363728]
We are motivated to study a new problem named Optimizer Amalgamation: how can we best combine a pool of "teacher" optimizers into a single "student" optimizer that can have stronger problem-specific performance?
First, we define three differentiable mechanisms to amalgamate a pool of analytical optimizers by gradient descent.
To reduce the variance of the amalgamation process, we also explore methods to stabilize it by perturbing the target.
arXiv Detail & Related papers (2022-03-12T16:07:57Z)
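One natural differentiable mechanism is a softmax-weighted average of the teachers' proposed update directions, trained through the blend weights. The following is a minimal numpy sketch under that assumption; the teacher pool, toy objective, and untrained weights are illustrative, not the paper's setup.

```python
import numpy as np

def teacher_updates(grad, state):
    """Update directions proposed by a small pool of analytical 'teacher' optimizers."""
    state["m"] = 0.9 * state["m"] + grad               # momentum buffer
    state["v"] = 0.99 * state["v"] + 0.01 * grad ** 2  # RMSProp-style buffer
    return np.stack([
        grad,                                 # plain gradient-descent direction
        state["m"],                           # momentum direction
        grad / (np.sqrt(state["v"]) + 1e-8),  # adaptive direction
    ])

def amalgamated_step(params, grad, state, logits, lr=0.01):
    # Softmax-weighted average of teacher updates; differentiable in `logits`,
    # which is what allows the blend itself to be trained by gradient descent.
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return params - lr * np.tensordot(w, teacher_updates(grad, state), axes=1)

# Toy usage: minimize ||params||^2 with uniform (untrained) blend weights.
params, logits = np.array([5.0, -3.0]), np.zeros(3)
state = {"m": np.zeros(2), "v": np.zeros(2)}
for _ in range(1000):
    params = amalgamated_step(params, 2 * params, state, logits)
```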
- Contextual Bayesian optimization with binary outputs [0.5076419064097732]
In many real-world situations, the objective function can be evaluated in controlled 'contexts' or 'environments' that directly influence the observations.
Here we combine ideas from Bayesian active learning and optimization to efficiently choose the best context and optimization parameter on each iteration.
We demonstrate the performance of our algorithm and illustrate how it can be used to tackle a concrete application in visual psychophysics.
arXiv Detail & Related papers (2021-11-05T12:09:46Z)
- LQF: Linear Quadratic Fine-Tuning [114.3840147070712]
We present the first method for linearizing a pre-trained model that achieves comparable performance to non-linear fine-tuning.
LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification.
arXiv Detail & Related papers (2020-12-21T06:40:20Z)
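A minimal numpy sketch of the linear-quadratic idea follows, under the assumption that linearizing means a first-order Taylor expansion of the pretrained model in its weights, paired with a quadratic (MSE) loss so fine-tuning becomes a convex least-squares problem; the toy model, data, and numerical Jacobian are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x, w):
    # Tiny stand-in for a pretrained network (assumption, not LQF's architecture).
    return np.tanh(x @ w)

def jacobian(x, w, eps=1e-5):
    # Numerical Jacobian of model outputs w.r.t. weights, one column per weight.
    J = np.zeros((x.shape[0], w.size))
    for j in range(w.size):
        dw = np.zeros_like(w)
        dw[j] = eps
        J[:, j] = (model(x, w + dw) - model(x, w - dw)) / (2 * eps)
    return J

w0 = rng.normal(size=4)            # "pretrained" weights
X = rng.normal(size=(200, 4))      # fine-tuning inputs (toy)
y = rng.normal(size=200)           # regression targets (toy)

# Linearized model: f(x; w) ~= f(x; w0) + J(x) @ (w - w0).
# With an MSE loss this is a least-squares problem in (w - w0).
J = jacobian(X, w0)
delta, *_ = np.linalg.lstsq(J, y - model(X, w0), rcond=None)
w = w0 + delta
```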
- Fast Rates for Contextual Linear Optimization [52.39202699484225]
We show that a naive plug-in approach achieves regret convergence rates that are significantly faster than methods that directly optimize downstream decision performance.
Our results are overall positive for practice: predictive models are easy and fast to train using existing tools, simple to interpret, and, as we show, lead to decisions that perform very well.
arXiv Detail & Related papers (2020-11-05T18:43:59Z)
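Below is a minimal numpy sketch of the naive plug-in (predict-then-optimize) approach the snippet refers to: fit a model of the cost vector from the context, then pick the decision minimizing the predicted cost. The linear model, toy feasible set, and synthetic data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Decisions form a finite feasible set Z; the goal is to pick z minimizing c(x)^T z.
Z = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]])

# Synthetic data: the cost vector depends linearly on the context, plus noise.
W_true = rng.normal(size=(3, 4))
X = rng.normal(size=(500, 4))
C = X @ W_true.T + 0.1 * rng.normal(size=(500, 3))

# Plug-in step 1: fit a predictive model of E[c | x] (here, least squares).
W_hat, *_ = np.linalg.lstsq(X, C, rcond=None)

def plug_in_decision(x):
    # Plug-in step 2: optimize the downstream objective against the prediction.
    c_hat = x @ W_hat
    return Z[np.argmin(Z @ c_hat)]

print(plug_in_decision(rng.normal(size=4)))
```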
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.