Sketching Datasets for Large-Scale Learning (long version)
- URL: http://arxiv.org/abs/2008.01839v3
- Date: Thu, 24 Jun 2021 21:36:36 GMT
- Title: Sketching Datasets for Large-Scale Learning (long version)
- Authors: Rémi Gribonval, Antoine Chatalic, Nicolas Keriven, Vincent Schellekens, Laurent Jacques, and Philip Schniter
- Abstract summary: "Compressive learning" is an approach to large-scale machine learning where datasets are massively compressed before learning.
A sketch is first constructed by computing carefully chosen nonlinear random features and averaging them over the whole dataset.
Parameters are then learned from the sketch, without access to the original dataset.
- Score: 24.478376776509045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This article considers "compressive learning," an approach to large-scale
machine learning where datasets are massively compressed before learning (e.g.,
clustering, classification, or regression) is performed. In particular, a
"sketch" is first constructed by computing carefully chosen nonlinear random
features (e.g., random Fourier features) and averaging them over the whole
dataset. Parameters are then learned from the sketch, without access to the
original dataset. This article surveys the current state-of-the-art in
compressive learning, including the main concepts and algorithms, their
connections with established signal-processing methods, existing theoretical
guarantees -- on both information preservation and privacy preservation, and
important open problems.
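The sketching step described in the abstract can be illustrated with a minimal sketch in Python. It assumes random Fourier features of the form exp(i<w, x>) with frequencies drawn from an isotropic Gaussian; the function names (`draw_frequencies`, `sketch`, `merge_sketches`), the frequency distribution, and the `scale` parameter are illustrative assumptions, not the authors' implementation. The learning step (recovering parameters from the sketch) is not shown.

```python
import cmath
import random


def draw_frequencies(m, d, scale=1.0, seed=0):
    """Draw m random frequency vectors in R^d.

    Assumption: isotropic Gaussian frequencies with standard deviation
    `scale`; how to choose this distribution is task dependent.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(d)] for _ in range(m)]


def sketch(dataset, frequencies):
    """Average the random Fourier features exp(i * <w, x>) over the dataset."""
    n = len(dataset)
    if n == 0:
        raise ValueError("cannot sketch an empty dataset")
    z = [0j] * len(frequencies)
    for x in dataset:
        for j, w in enumerate(frequencies):
            phase = sum(wk * xk for wk, xk in zip(w, x))
            z[j] += cmath.exp(1j * phase)
    return [zj / n for zj in z]


def merge_sketches(z_a, n_a, z_b, n_b):
    """Combine the sketches of two disjoint subsets into the sketch of their union.

    This works because the sketch is a plain average of per-sample features.
    """
    total = n_a + n_b
    return [(n_a * a + n_b * b) / total for a, b in zip(z_a, z_b)]


# Toy usage: 200 two-dimensional points compressed into 8 complex numbers.
rng = random.Random(1)
data = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]
freqs = draw_frequencies(m=8, d=2)
z_full = sketch(data, freqs)
z_merged = merge_sketches(sketch(data[:80], freqs), 80,
                          sketch(data[80:], freqs), 120)
```

The sketch size depends only on the number of frequencies, not on the number of samples. Because the sketch is an average, sketches of separate chunks of the data can be computed independently and merged afterwards, as `merge_sketches` does here.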
Related papers
- Compressive Meta-Learning [49.300635370079874]
Compressive learning is a framework that enables efficient processing by using random, non-linear features.
We propose a framework that meta-learns both the encoding and decoding stages of compressive learning methods.
We explore multiple applications, including neural network-based compressive PCA, compressive ridge regression, compressive k-means, and autoencoders.
arXiv Detail & Related papers (2025-08-14T22:08:06Z)
- ZEUS: Zero-shot Embeddings for Unsupervised Separation of Tabular Data [7.121259735505479]
ZEUS is a self-contained model capable of clustering new datasets without any additional training or fine-tuning.
It operates by decomposing complex datasets into meaningful components that can then be clustered effectively.
arXiv Detail & Related papers (2025-05-15T20:52:26Z)
- Continual Learning for Multimodal Data Fusion of a Soft Gripper [1.0589208420411014]
A model trained on one data modality often fails when tested with a different modality.
We introduce a continual learning algorithm capable of incrementally learning different data modalities.
We evaluate the algorithm's effectiveness on a challenging custom multimodal dataset.
arXiv Detail & Related papers (2024-09-20T09:53:27Z)
- Koopcon: A new approach towards smarter and less complex learning [13.053285552524052]
In the era of big data, the sheer volume and complexity of datasets pose significant challenges in machine learning.
This paper introduces an innovative Autoencoder-based dataset condensation model backed by Koopman operator theory.
Inspired by the predictive coding mechanisms of the human brain, our model leverages a novel approach to encode and reconstruct data.
arXiv Detail & Related papers (2024-05-22T17:47:14Z)
- A Weighted K-Center Algorithm for Data Subset Selection [70.49696246526199]
Subset selection is a fundamental problem that can play a key role in identifying smaller portions of the training data.
We develop a novel factor 3-approximation algorithm to compute subsets based on the weighted sum of both k-center and uncertainty sampling objective functions.
arXiv Detail & Related papers (2023-12-17T04:41:07Z)
- Surprisal Driven $k$-NN for Robust and Interpretable Nonparametric Learning [1.4293924404819704]
We shed new light on the traditional nearest neighbors algorithm from the perspective of information theory.
We propose a robust and interpretable framework for tasks such as classification, regression, density estimation, and anomaly detection using a single model.
Our work showcases the architecture's versatility by achieving state-of-the-art results in classification and anomaly detection.
arXiv Detail & Related papers (2023-11-17T00:35:38Z)
- Homological Convolutional Neural Networks [4.615338063719135]
We propose a novel deep learning architecture that exploits the data structural organization through topologically constrained network representations.
We test our model on 18 benchmark datasets against 5 classic machine learning and 3 deep learning models.
arXiv Detail & Related papers (2023-08-26T08:48:51Z)
- Generalizing Dataset Distillation via Deep Generative Prior [75.9031209877651]
We propose to distill an entire dataset's knowledge into a few synthetic images.
The idea is to synthesize a small number of synthetic data points that, when given to a learning algorithm as training data, result in a model approximating one trained on the original data.
We present a new optimization algorithm that distills a large number of images into a few intermediate feature vectors in the generative model's latent space.
arXiv Detail & Related papers (2023-05-02T17:59:31Z)
- Deep networks for system identification: a Survey [56.34005280792013]
System identification learns mathematical descriptions of dynamic systems from input-output data.
The main aim of the identified model is to predict new data from previous observations.
We discuss architectures commonly adopted in the literature, like feedforward, convolutional, and recurrent networks.
arXiv Detail & Related papers (2023-01-30T12:38:31Z)
- DC-BENCH: Dataset Condensation Benchmark [79.18718490863908]
This work provides the first large-scale standardized benchmark on dataset condensation.
It consists of a suite of evaluations to comprehensively reflect the generalizability and effectiveness of condensation methods.
The benchmark library is open-sourced to facilitate future research and application.
arXiv Detail & Related papers (2022-07-20T03:54:05Z)
- Weakly Supervised Change Detection Using Guided Anisotropic Difusion [97.43170678509478]
We propose original ideas that help us to leverage such datasets in the context of change detection.
First, we propose the guided anisotropic diffusion (GAD) algorithm, which improves semantic segmentation results.
We then show its potential in two weakly-supervised learning strategies tailored for change detection.
arXiv Detail & Related papers (2021-12-31T10:03:47Z)
- Representation Learning for Sequence Data with Deep Autoencoding Predictive Components [96.42805872177067]
We propose a self-supervised representation learning method for sequence data, based on the intuition that useful representations of sequence data should exhibit a simple structure in the latent space.
We encourage this latent structure by maximizing an estimate of predictive information of latent feature sequences, which is the mutual information between past and future windows at each time step.
We demonstrate that our method recovers the latent space of noisy dynamical systems, extracts predictive features for forecasting tasks, and improves automatic speech recognition when used to pretrain the encoder on large amounts of unlabeled data.
arXiv Detail & Related papers (2020-10-07T03:34:01Z)
- GRAFFL: Gradient-free Federated Learning of a Bayesian Generative Model [8.87104231451079]
This paper presents the first gradient-free federated learning framework called GRAFFL.
It uses implicit information derived from each participating institution to learn posterior distributions of parameters.
We propose the GRAFFL-based Bayesian mixture model to serve as a proof-of-concept of the framework.
arXiv Detail & Related papers (2020-08-29T07:19:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.