A Critical Analysis of Classifier Selection in Learned Bloom Filters
- URL: http://arxiv.org/abs/2211.15565v1
- Date: Mon, 28 Nov 2022 17:17:18 GMT
- Title: A Critical Analysis of Classifier Selection in Learned Bloom Filters
- Authors: Dario Malchiodi, Davide Raimondi, Giacomo Fumagalli, Raffaele
Giancarlo, Marco Frasca
- Abstract summary: "Complexity" of the data used to build the filter might heavily impact on its performance.
We propose a novel methodology, supported by software, for designing, analyzing and implementing Learned Bloom Filters.
Experiments show that the proposed methodology and the supporting software are valid and useful.
- Score: 0.3359875577705538
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learned Bloom Filters, i.e., models induced from data via machine learning
techniques and solving the approximate set membership problem, have recently
been introduced with the aim of enhancing the performance of standard Bloom
Filters, with special focus on space occupancy. Unlike in the classical case,
the "complexity" of the data used to build the filter might heavily impact on
its performance. Therefore, here we propose the first in-depth analysis, to the
best of our knowledge, for the performance assessment of a given Learned Bloom
Filter, in conjunction with a given classifier, on a dataset of a given
classification complexity. Indeed, we propose a novel methodology, supported by
software, for designing, analyzing and implementing Learned Bloom Filters in
function of specific constraints on their multi-criteria nature (that is,
constraints involving space efficiency, false positive rate, and reject time).
Our experiments show that the proposed methodology and the supporting software
are valid and useful: we find out that only two classifiers have desirable
properties in relation to problems with different data complexity, and,
interestingly, none of them has been considered so far in the literature. We
also experimentally show that the Sandwiched variant of Learned Bloom filters
is the most robust to data complexity and classifier performance variability,
as well as those usually having smaller reject times. The software can be
readily used to test new Learned Bloom Filter proposals, which can be compared
with the best ones identified here.
Related papers
- A Contrast Based Feature Selection Algorithm for High-dimensional Data
set in Machine Learning [9.596923373834093]
We propose a novel filter feature selection method, ContrastFS, which selects discriminative features based on the discrepancies features shown between different classes.
We validate effectiveness and efficiency of our approach on several widely studied benchmark datasets, results show that the new method performs favorably with negligible computation.
arXiv Detail & Related papers (2024-01-15T05:32:35Z) - Finding Support Examples for In-Context Learning [73.90376920653507]
We propose LENS, a fiLter-thEN-Search method to tackle this challenge in two stages.
First we filter the dataset to obtain informative in-context examples individually.
Then we propose diversity-guided example search which iteratively refines and evaluates the selected example permutations.
arXiv Detail & Related papers (2023-02-27T06:32:45Z) - Compressing (Multidimensional) Learned Bloom Filters [7.6058140480517356]
A Bloom filter reveals if an element is not included in the underlying set or is included with a certain error rate.
Deep learning models are used to solve this membership testing problem.
We show that the benefits of learned Bloom filters are apparent only when considering a vast amount of data.
arXiv Detail & Related papers (2022-08-05T07:54:48Z) - Broad Recommender System: An Efficient Nonlinear Collaborative Filtering
Approach [56.12815715932561]
We propose a new broad recommender system called Broad Collaborative Filtering (BroadCF)
Instead of Deep Neural Networks (DNNs), Broad Learning System (BLS) is used as a mapping function to learn the complex nonlinear relationships between users and items.
Extensive experiments conducted on seven benchmark datasets have confirmed the effectiveness of the proposed BroadCF algorithm.
arXiv Detail & Related papers (2022-04-20T01:25:08Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature
Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named as, Compactness Score (CSUFS) to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - Nonnegative OPLS for Supervised Design of Filter Banks: Application to
Image and Audio Feature Extraction [0.0]
We propose a methodology to design filter banks in a supervised way for applications dealing with nonnegative data.
We analyze the discriminative power of the features obtained with the proposed methods for two different and widely studied applications.
arXiv Detail & Related papers (2021-12-22T23:58:25Z) - On the Choice of General Purpose Classifiers in Learned Bloom Filters:
An Initial Analysis Within Basic Filters [0.41998444721319217]
Several versions of Bloom Filters have been considered, yielding advantages over classic Filters.
Each of them uses a classifier, which is the Learned part of the data structure.
No systematic study of which specific classifier to use in which circumstances is available.
arXiv Detail & Related papers (2021-12-13T11:15:41Z) - Learning Versatile Convolution Filters for Efficient Visual Recognition [125.34595948003745]
This paper introduces versatile filters to construct efficient convolutional neural networks.
We conduct theoretical analysis on network complexity and an efficient convolution scheme is introduced.
Experimental results on benchmark datasets and neural networks demonstrate that our versatile filters are able to achieve comparable accuracy as that of original filters.
arXiv Detail & Related papers (2021-09-20T06:07:14Z) - Active Learning from Crowd in Document Screening [76.9545252341746]
We focus on building a set of machine learning classifiers that evaluate documents, and then screen them efficiently.
We propose a multi-label active learning screening specific sampling technique -- objective-aware sampling.
We demonstrate that objective-aware sampling significantly outperforms the state of the art active learning sampling strategies.
arXiv Detail & Related papers (2020-11-11T16:17:28Z) - Bloom Origami Assays: Practical Group Testing [90.2899558237778]
Group testing is a well-studied problem with several appealing solutions.
Recent biological studies impose practical constraints for COVID-19 that are incompatible with traditional methods.
We develop a new method combining Bloom filters with belief propagation to scale to larger values of n (more than 100) with good empirical results.
arXiv Detail & Related papers (2020-07-21T19:31:41Z) - Partitioned Learned Bloom Filter [31.748077944821315]
We show how to frame the problem of optimal model utilization as an optimization problem.
We derive algorithms that can achieve near-optimal performance in many cases.
arXiv Detail & Related papers (2020-06-05T00:05:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.