Partitioned Learned Bloom Filter
- URL: http://arxiv.org/abs/2006.03176v2
- Date: Sun, 4 Oct 2020 15:15:17 GMT
- Title: Partitioned Learned Bloom Filter
- Authors: Kapil Vaidya, Eric Knorr, Tim Kraska, Michael Mitzenmacher
- Abstract summary: We show how to frame the problem of optimal model utilization as an optimization problem.
We derive algorithms that can achieve near-optimal performance in many cases.
- Score: 31.748077944821315
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Bloom filters are space-efficient probabilistic data structures that are used
to test whether an element is a member of a set, and may return false
positives. Recently, variations referred to as learned Bloom filters were
developed that can provide improved performance in terms of the rate of false
positives, by using a learned model for the represented set. However, previous
methods for learned Bloom filters do not take full advantage of the learned
model. Here we show how to frame the problem of optimal model utilization as an
optimization problem, and using our framework derive algorithms that can
achieve near-optimal performance in many cases. Experimental results from both
simulated and real-world datasets show significant performance improvements
from our optimization approach over both the original learned Bloom filter
constructions and previously proposed heuristic improvements.
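Below is a minimal Python sketch of the partitioned construction the abstract describes: a learned score function routes every key and query to one of several score regions, each backed by its own Bloom filter with its own false-positive rate. The thresholds and per-region rates are hand-picked for illustration; the paper's contribution is choosing them by solving the stated optimization problem.

```python
import hashlib
import math

class BloomFilter:
    """Standard Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, n_items, fpr):
        # Classic sizing: m = -n*ln(p)/ln(2)^2 bits, k = (m/n)*ln(2) hashes.
        self.m = max(8, int(-n_items * math.log(fpr) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(item))

class PartitionedLBF:
    """Partition the model's score range; give each region its own backup filter."""

    def __init__(self, keys, score, thresholds, region_fprs):
        assert len(region_fprs) == len(thresholds) + 1
        self.score, self.thresholds = score, thresholds
        buckets = [[] for _ in region_fprs]
        for key in keys:
            buckets[self._region(score(key))].append(key)
        self.filters = [BloomFilter(len(b), f) if b else None
                        for b, f in zip(buckets, region_fprs)]
        for bucket, filt in zip(buckets, self.filters):
            for key in bucket:
                filt.add(key)

    def _region(self, s):
        return sum(s >= t for t in self.thresholds)

    def __contains__(self, key):
        filt = self.filters[self._region(self.score(key))]
        return filt is not None and key in filt

# Illustrative usage; `score` stands in for a trained model mapping keys to [0, 1).
keys = [f"key-{i}" for i in range(1000)]
score = lambda x: (hash(x) % 1000) / 1000.0
plbf = PartitionedLBF(keys, score, thresholds=[0.5, 0.9],
                      region_fprs=[0.001, 0.01, 0.1])
assert all(k in plbf for k in keys)
```

Because every key is stored in exactly one region's filter and a query routes to that same region, false negatives remain impossible; the optimization problem in the paper decides where to place the thresholds and how many bits to spend in each region.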
Related papers
- Permutation Invariant Learning with High-Dimensional Particle Filters [8.878254892409005]
Sequential learning in deep models often suffers from challenges such as catastrophic forgetting and loss of plasticity.
We introduce a novel permutation-invariant learning framework based on high-dimensional particle filters.
arXiv Detail & Related papers (2024-10-30T05:06:55Z)
- Discovering Preference Optimization Algorithms with and for Large Language Models [50.843710797024805]
Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs.
We perform objective discovery to automatically discover new state-of-the-art preference optimization algorithms without (expert) human intervention.
Experiments demonstrate the state-of-the-art performance of DiscoPOP, a novel algorithm that adaptively blends logistic and exponential losses.
arXiv Detail & Related papers (2024-06-12T16:58:41Z)
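The "adaptive blend" above can be pictured as a gated mixture of the two losses. A minimal sketch follows, assuming a sigmoid gate on the scaled preference margin; the gate form and the temperature `tau` are illustrative assumptions, not the exact discovered DiscoPOP loss from the paper.

```python
import math

def logistic_loss(margin):
    # DPO-style sigmoid loss: -log(sigmoid(margin)); not numerically hardened.
    return math.log(1.0 + math.exp(-margin))

def exponential_loss(margin):
    return math.exp(-margin)

def blended_loss(margin, tau=0.05):
    # margin = beta * (chosen log-ratio minus rejected log-ratio).
    # A sigmoid gate interpolates between the two losses as the margin moves.
    gate = 1.0 / (1.0 + math.exp(-margin / tau))
    return (1.0 - gate) * logistic_loss(margin) + gate * exponential_loss(margin)
```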
- Functional Graphical Models: Structure Enables Offline Data-Driven Optimization [111.28605744661638]
We show how structure can enable sample-efficient data-driven optimization.
We also present a data-driven optimization algorithm that infers the FGM structure itself.
arXiv Detail & Related papers (2024-01-08T22:33:14Z)
- Data-driven Prior Learning for Bayesian Optimisation [5.199765487172328]
We show that PLeBO and prior transfer find good inputs in fewer evaluations.
We validate the learned priors and compare to a breadth of transfer learning approaches.
arXiv Detail & Related papers (2023-11-24T18:37:52Z)
- A Critical Analysis of Classifier Selection in Learned Bloom Filters [0.3359875577705538]
The "complexity" of the data used to build the filter can heavily impact its performance.
We propose a novel methodology, supported by software, for designing, analyzing and implementing Learned Bloom Filters.
Experiments show that the proposed methodology and the supporting software are valid and useful.
arXiv Detail & Related papers (2022-11-28T17:17:18Z)
- Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the search spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
- Compressing (Multidimensional) Learned Bloom Filters [7.6058140480517356]
A Bloom filter reports either that an element is definitely not in the underlying set, or that it is in the set, subject to a certain false-positive rate.
Deep learning models are used to solve this membership testing problem.
We show that the benefits of learned Bloom filters are apparent only when considering a vast amount of data.
arXiv Detail & Related papers (2022-08-05T07:54:48Z)
- Optimizer Amalgamation [124.33523126363728]
We are motivated to study a new problem named Optimizer Amalgamation: how can we best combine a pool of "teacher" optimizers into a single "student" optimizer that can have stronger problem-specific performance?
First, we define three differentiable mechanisms to amalgamate a pool of analytical optimizers by gradient descent.
To reduce the variance of the amalgamation process, we also explore methods to stabilize it by perturbing the target.
arXiv Detail & Related papers (2022-03-12T16:07:57Z)
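One natural differentiable mechanism is a softmax-weighted average of the teachers' proposed update directions, trained through the blend weights. The following is a minimal numpy sketch under that assumption; the teacher pool, toy objective, and untrained weights are illustrative, not the paper's setup.

```python
import numpy as np

def teacher_updates(grad, state):
    """Update directions proposed by a small pool of analytical 'teacher' optimizers."""
    state["m"] = 0.9 * state["m"] + grad               # momentum buffer
    state["v"] = 0.99 * state["v"] + 0.01 * grad ** 2  # RMSProp-style buffer
    return np.stack([
        grad,                                 # plain gradient-descent direction
        state["m"],                           # momentum direction
        grad / (np.sqrt(state["v"]) + 1e-8),  # adaptive direction
    ])

def amalgamated_step(params, grad, state, logits, lr=0.01):
    # Softmax-weighted average of teacher updates; differentiable in `logits`,
    # which is what allows the blend itself to be trained by gradient descent.
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return params - lr * np.tensordot(w, teacher_updates(grad, state), axes=1)

# Toy usage: minimize ||params||^2 with uniform (untrained) blend weights.
params, logits = np.array([5.0, -3.0]), np.zeros(3)
state = {"m": np.zeros(2), "v": np.zeros(2)}
for _ in range(1000):
    params = amalgamated_step(params, 2 * params, state, logits)
```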
- Contextual Bayesian optimization with binary outputs [0.5076419064097732]
In many real-world situations, the objective function can be evaluated in controlled 'contexts' or 'environments' that directly influence the observations.
Here we combine ideas from Bayesian active learning and optimization to efficiently choose the best context and optimization parameter on each iteration.
We demonstrate the performance of our algorithm and illustrate how it can be used to tackle a concrete application in visual psychophysics.
arXiv Detail & Related papers (2021-11-05T12:09:46Z)
- LQF: Linear Quadratic Fine-Tuning [114.3840147070712]
We present the first method for linearizing a pre-trained model that achieves comparable performance to non-linear fine-tuning.
LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification.
arXiv Detail & Related papers (2020-12-21T06:40:20Z)
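A minimal numpy sketch of the linear-quadratic idea follows, under the assumption that linearizing means a first-order Taylor expansion of the pretrained model in its weights, paired with a quadratic (MSE) loss so fine-tuning becomes a convex least-squares problem; the toy model, data, and numerical Jacobian are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x, w):
    # Tiny stand-in for a pretrained network (assumption, not LQF's architecture).
    return np.tanh(x @ w)

def jacobian(x, w, eps=1e-5):
    # Numerical Jacobian of model outputs w.r.t. weights, one column per weight.
    J = np.zeros((x.shape[0], w.size))
    for j in range(w.size):
        dw = np.zeros_like(w)
        dw[j] = eps
        J[:, j] = (model(x, w + dw) - model(x, w - dw)) / (2 * eps)
    return J

w0 = rng.normal(size=4)            # "pretrained" weights
X = rng.normal(size=(200, 4))      # fine-tuning inputs (toy)
y = rng.normal(size=200)           # regression targets (toy)

# Linearized model: f(x; w) ~= f(x; w0) + J(x) @ (w - w0).
# With an MSE loss this is a least-squares problem in (w - w0).
J = jacobian(X, w0)
delta, *_ = np.linalg.lstsq(J, y - model(X, w0), rcond=None)
w = w0 + delta
```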
- Fast Rates for Contextual Linear Optimization [52.39202699484225]
We show that a naive plug-in approach achieves regret convergence rates that are significantly faster than methods that directly optimize downstream decision performance.
Our results are overall positive for practice: predictive models are easy and fast to train using existing tools, simple to interpret, and, as we show, lead to decisions that perform very well.
arXiv Detail & Related papers (2020-11-05T18:43:59Z)
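Below is a minimal numpy sketch of the naive plug-in (predict-then-optimize) approach the snippet refers to: fit a model of the cost vector from the context, then pick the decision minimizing the predicted cost. The linear model, toy feasible set, and synthetic data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Decisions form a finite feasible set Z; the goal is to pick z minimizing c(x)^T z.
Z = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]])

# Synthetic data: the cost vector depends linearly on the context, plus noise.
W_true = rng.normal(size=(3, 4))
X = rng.normal(size=(500, 4))
C = X @ W_true.T + 0.1 * rng.normal(size=(500, 3))

# Plug-in step 1: fit a predictive model of E[c | x] (here, least squares).
W_hat, *_ = np.linalg.lstsq(X, C, rcond=None)

def plug_in_decision(x):
    # Plug-in step 2: optimize the downstream objective against the prediction.
    c_hat = x @ W_hat
    return Z[np.argmin(Z @ c_hat)]

print(plug_in_decision(rng.normal(size=4)))
```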
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.