Related papers: Bloom Filter Encoding for Machine Learning

Bloom Filter Encoding for Machine Learning

URL: http://arxiv.org/abs/2512.19991v1
Date: Tue, 23 Dec 2025 02:33:57 GMT
Title: Bloom Filter Encoding for Machine Learning
Authors: John Cartmell, Mihaela Cardei, Ionut Cardei,
Abstract summary: We present a method that uses the Bloom filter transform to preprocess data for machine learning.<n>Each sample is encoded into a compact, privacy-preserving bit array.<n>We test the method on six datasets: SMS Spam Collection, ECG200, Adult 50K, CDC Diabetes, MNIST, and Fashion MNIST.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present a method that uses the Bloom filter transform to preprocess data for machine learning. Each sample is encoded into a compact, privacy-preserving bit array. This reduces memory use and protects the original data while keeping enough structure for accurate classification. We test the method on six datasets: SMS Spam Collection, ECG200, Adult 50K, CDC Diabetes, MNIST, and Fashion MNIST. Four classifiers are used: Extreme Gradient Boosting, Deep Neural Networks, Convolutional Neural Networks, and Logistic Regression. Results show that models trained on Bloom filter encodings achieve accuracy similar to models trained on raw data or other transforms. At the same time, the method provides memory savings while enhancing privacy. These results suggest that the Bloom filter transform is an efficient preprocessing approach for diverse machine learning tasks.

Related papers

Self-Supervised Learning via Flow-Guided Neural Operator on Time-Series Data [57.85958428020496]
Flow-Guided Neural Operator (FGNO) is a novel framework combining operator learning with flow matching for SSL training.<n>FGNO learns mappings in functional spaces by using Short-Time Fourier Transform to unify different time resolutions.<n>Unlike prior generative SSL methods that use noisy inputs during inference, we propose using clean inputs for representation extraction while learning representations with noise.
arXiv Detail & Related papers (2026-02-12T18:54:57Z)
Learned LSM-trees: Two Approaches Using Learned Bloom Filters [0.0]
Key-value stores rely heavily on Log-Structured Merge (LSM) trees for write optimization.<n> Auxiliary structures like Bloom filters help, but impose memory costs that scale with tree depth and dataset size.<n>Recent advances in learned data structures suggest that machine learning models can augment or replace these components.
arXiv Detail & Related papers (2025-07-24T04:23:52Z)
Private Training & Data Generation by Clustering Embeddings [74.00687214400021]
Differential privacy (DP) provides a robust framework for protecting individual data.<n>We introduce a novel principled method for DP synthetic image embedding generation.<n> Empirically, a simple two-layer neural network trained on synthetically generated embeddings achieves state-of-the-art (SOTA) classification accuracy.
arXiv Detail & Related papers (2025-06-20T00:17:14Z)
Learning from Convolution-based Unlearnable Datasets [5.332412565926725]
The Conlearn-based Unlearnable DAtaset (CUDA) method aims to make data unlearnable by applying class-wise blurs to every image in the dataset.<n>In this work, we evaluate whether data remains unlearnable after image sharpening and frequency filtering.<n>We observe a substantial increase in test accuracy over adversarial training for models trained with unlearnable data.
arXiv Detail & Related papers (2024-11-04T01:51:50Z)
SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised Learning for Robust Infrared Small Target Detection [53.19618419772467]
Single-frame infrared small target (SIRST) detection aims to recognize small targets from clutter backgrounds. With the development of Transformer, the scale of SIRST models is constantly increasing. With a rich diversity of infrared small target data, our algorithm significantly improves the model performance and convergence speed.
arXiv Detail & Related papers (2024-03-08T16:14:54Z)
Filter Pruning for Efficient CNNs via Knowledge-driven Differential Filter Sampler [103.97487121678276]
Filter pruning simultaneously accelerates the computation and reduces the memory overhead of CNNs. We propose a novel Knowledge-driven Differential Filter Sampler(KDFS) with Masked Filter Modeling(MFM) framework for filter pruning.
arXiv Detail & Related papers (2023-07-01T02:28:41Z)
Compressing (Multidimensional) Learned Bloom Filters [7.6058140480517356]
A Bloom filter reveals if an element is not included in the underlying set or is included with a certain error rate. Deep learning models are used to solve this membership testing problem. We show that the benefits of learned Bloom filters are apparent only when considering a vast amount of data.
arXiv Detail & Related papers (2022-08-05T07:54:48Z)
Weakly Supervised Change Detection Using Guided Anisotropic Difusion [97.43170678509478]
We propose original ideas that help us to leverage such datasets in the context of change detection. First, we propose the guided anisotropic diffusion (GAD) algorithm, which improves semantic segmentation results. We then show its potential in two weakly-supervised learning strategies tailored for change detection.
arXiv Detail & Related papers (2021-12-31T10:03:47Z)
Cherry-Picking Gradients: Learning Low-Rank Embeddings of Visual Data via Differentiable Cross-Approximation [53.95297550117153]
We propose an end-to-end trainable framework that processes large-scale visual data tensors by looking emphat a fraction of their entries only. The proposed approach is particularly useful for large-scale multidimensional grid data, and for tasks that require context over a large receptive field.
arXiv Detail & Related papers (2021-05-29T08:39:57Z)
Ensemble Wrapper Subsampling for Deep Modulation Classification [70.91089216571035]
Subsampling of received wireless signals is important for relaxing hardware requirements as well as the computational cost of signal processing algorithms. We propose a subsampling technique to facilitate the use of deep learning for automatic modulation classification in wireless communication systems.
arXiv Detail & Related papers (2020-05-10T06:11:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.