On Coresets for Support Vector Machines
- URL: http://arxiv.org/abs/2002.06469v1
- Date: Sat, 15 Feb 2020 23:25:12 GMT
- Title: On Coresets for Support Vector Machines
- Authors: Murad Tukan, Cenk Baykal, Dan Feldman, Daniela Rus
- Abstract summary: A coreset is a small, representative subset of the original data points.
We show that our algorithm can be used to extend the applicability of any off-the-shelf SVM solver to streaming, distributed, and dynamic data settings.
- Score: 61.928187390362176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an efficient coreset construction algorithm for large-scale
Support Vector Machine (SVM) training in Big Data and streaming applications. A
coreset is a small, representative subset of the original data points such that
models trained on the coreset are provably competitive with those trained on
the original data set. Since the size of the coreset is generally much smaller
than the original set, our preprocess-then-train scheme has the potential to lead
to significant speedups when training SVM models. We prove lower and upper
bounds on the size of the coreset required to obtain small data summaries for
the SVM problem. As a corollary, we show that our algorithm can be used to
extend the applicability of any off-the-shelf SVM solver to streaming,
distributed, and dynamic data settings. We evaluate the performance of our
algorithm on real-world and synthetic data sets. Our experimental results
reaffirm the favorable theoretical properties of our algorithm and demonstrate
its practical effectiveness in accelerating SVM training.
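
As a rough illustration of the preprocess-then-train scheme described in the abstract, the sketch below builds a small weighted coreset by importance sampling and hands it to an off-the-shelf weighted SVM solver (scikit-learn's SVC, for concreteness). The importance scores (distance from the class mean) and the merge_coresets helper are simplified assumptions of this sketch; they stand in for the sensitivity bounds and streaming construction developed in the paper and are not the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC

def build_coreset(X, y, m, rng=None):
    """Sample a weighted coreset of size m from (X, y).

    The importance scores below (distance from the class mean) are a
    simplified, illustrative stand-in for the sensitivity bounds the
    paper derives; only the generic sample-and-reweight pattern is shown.
    """
    rng = np.random.default_rng(rng)
    n = len(X)
    scores = np.empty(n)
    for c in np.unique(y):
        mask = y == c
        center = X[mask].mean(axis=0)
        scores[mask] = np.linalg.norm(X[mask] - center, axis=1) + 1e-12
    probs = scores / scores.sum()
    idx = rng.choice(n, size=m, replace=True, p=probs)
    weights = 1.0 / (m * probs[idx])  # keeps weighted sums unbiased
    return X[idx], y[idx], weights

def merge_coresets(parts, m, rng=None):
    """Union weighted coresets from separate chunks, then re-reduce to size m.

    Mirrors the generic merge-and-reduce route to the streaming and
    distributed settings mentioned in the abstract.
    """
    Xs, ys, ws = zip(*parts)
    X, y, w = np.concatenate(Xs), np.concatenate(ys), np.concatenate(ws)
    rng = np.random.default_rng(rng)
    probs = w / w.sum()
    idx = rng.choice(len(X), size=m, replace=True, p=probs)
    return X[idx], y[idx], np.full(m, w.sum() / m)

# Example: preprocess-then-train on synthetic data with an off-the-shelf solver.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (5000, 20)), rng.normal(1.0, 1.0, (5000, 20))])
y = np.hstack([np.zeros(5000), np.ones(5000)])

Xc, yc, wc = build_coreset(X, y, m=500, rng=0)
clf = SVC(kernel="linear", C=1.0).fit(Xc, yc, sample_weight=wc)
```

Any solver that accepts per-sample weights (e.g. SVC or LinearSVC) can consume such a coreset unchanged, and coresets computed on separate data chunks can be merged and re-reduced in the same way, which is the usual route to the streaming and distributed settings claimed in the abstract.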
Related papers
- Refined Coreset Selection: Towards Minimal Coreset Size under Model Performance Constraints [69.27190330994635]
Coreset selection is powerful in reducing computational costs and accelerating data processing for deep learning algorithms.
We propose a method that maintains an optimization priority order over model performance and coreset size.
Empirically, extensive experiments confirm its superiority, often yielding better model performance with smaller coreset sizes.
arXiv Detail & Related papers (2023-11-15T03:43:04Z) - Composable Core-sets for Diversity Approximation on Multi-Dataset
Streams [4.765131728094872]
Composable core-sets are core-sets with the property that subsets of the core set can be unioned together to obtain an approximation for the original data.
We introduce a core-set construction algorithm for constructing composable core-sets to summarize streamed data for use in active learning environments.
arXiv Detail & Related papers (2023-08-10T23:24:51Z) - Distributive Pre-Training of Generative Modeling Using Matrix-Product
States [0.0]
We consider an alternative training scheme utilizing basic tensor network operations, e.g., summation and compression.
The training algorithm is based on compressing the superposition state constructed from all the training data in product state representation.
We benchmark the algorithm on the MNIST dataset and show reasonable results for generating new images and classification tasks.
arXiv Detail & Related papers (2023-06-26T15:46:08Z) - Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z) - Coreset of Hyperspectral Images on Small Quantum Computer [3.8637821835441732]
We use a coreset ("core of a dataset") of the given Earth observation (EO) data for training an SVM on a small D-Wave quantum annealer (QA).
We measured the closeness between the original dataset and its coreset by employing a Kullback-Leibler (KL) divergence measure; a minimal sketch of such a comparison appears after this list.
arXiv Detail & Related papers (2022-04-10T14:14:20Z) - DANCE: DAta-Network Co-optimization for Efficient Segmentation Model
Training and Inference [85.02494022662505]
DANCE is an automated simultaneous data-network co-optimization for efficient segmentation model training and inference.
It integrates automated data slimming, which adaptively downsamples or drops input images and scales their contribution to the training loss according to each image's spatial complexity.
Experiments and ablation studies demonstrate that DANCE can achieve "all-win" towards efficient segmentation.
arXiv Detail & Related papers (2021-07-16T04:58:58Z) - Dataset Meta-Learning from Kernel Ridge-Regression [18.253682891579402]
Kernel Inducing Points (KIP) can compress datasets by one or two orders of magnitude.
KIP-learned datasets are transferable to the training of finite-width neural networks even beyond the lazy-training regime.
arXiv Detail & Related papers (2020-10-30T18:54:04Z) - Coresets via Bilevel Optimization for Continual Learning and Streaming [86.67190358712064]
We propose a novel coreset construction via cardinality-constrained bilevel optimization.
We show how our framework can efficiently generate coresets for deep neural networks, and demonstrate its empirical benefits in continual learning and in streaming settings.
arXiv Detail & Related papers (2020-06-06T14:20:25Z) - Convolutional Support Vector Machine [1.5990720051907859]
This paper proposes a novel convolutional SVM (CSVM) that combines the advantages of CNNs and SVMs to improve the accuracy and effectiveness of mining smaller datasets.
To evaluate the performance of the proposed CSVM, experiments were conducted on five well-known benchmark databases for classification.
arXiv Detail & Related papers (2020-02-11T11:23:21Z) - Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)