AutoCoreset: An Automatic Practical Coreset Construction Framework
- URL: http://arxiv.org/abs/2305.11980v1
- Date: Fri, 19 May 2023 19:59:52 GMT
- Title: AutoCoreset: An Automatic Practical Coreset Construction Framework
- Authors: Alaa Maalouf and Murad Tukan and Vladimir Braverman and Daniela Rus
- Abstract summary: A coreset is a tiny weighted subset of an input set whose loss closely approximates that of the full set with respect to a given set of queries.
We propose an automatic framework for constructing coresets that requires only the input data and the desired cost function from the user.
We show that although the query subset used in the construction is limited, the resulting coreset is quite general.
- Score: 65.37876706107764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A coreset is a tiny weighted subset of an input set that closely approximates
the loss function with respect to a certain set of queries. Coresets have become
prevalent in machine learning as they have been shown to be advantageous for many
applications. Although coreset research is an active area, coresets are unfortunately
constructed in a problem-dependent manner: for each problem, a new coreset
construction algorithm is usually suggested, a process that may take time or may be
hard for new researchers in the field. Even the generic frameworks require
additional (problem-dependent) computations or proofs by the user. Moreover, many
problems do not have (provably) small coresets, limiting their applicability. To
this end, we suggest an automatic, practical framework for constructing coresets,
which requires only the input data and the desired cost function from the user,
without any other task-related computation on the user's part. To do so, we reduce
the problem of approximating a loss function to an instance of vector summation
approximation, where the vectors to be summed are the loss vectors of a specific
subset of the queries, so that we approximate the image of the loss function on this
subset. We show that while this query set is limited, the resulting coreset is
quite general. An extensive experimental study on various machine learning
applications is also conducted. Finally, we provide a "plug and play"-style
implementation, proposing a user-friendly system that can be easily used to
apply coresets to many problems. Full open source code can be found at
https://github.com/alaamaalouf/AutoCoreset.
We believe that these contributions enable future research and easier use and
application of coresets.
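To make the reduction above concrete: evaluate the loss of every input point on a small sample of candidate queries, stack the resulting per-point loss vectors into a matrix, and then pick a weighted subset of rows whose weighted sum approximates the sum of all rows. The minimal Python sketch below illustrates only this vector-summation view; the norm-proportional row sampler, the random candidate queries, and all function names are illustrative assumptions, not the AutoCoreset algorithm itself (see the repository linked above for the actual implementation).

```python
# Illustrative sketch of the "coreset via vector-summation approximation" idea.
# This is NOT the paper's algorithm; names and the sampling scheme are assumptions.
import numpy as np

def loss_matrix(X, queries, loss_fn):
    """Entry (i, j) is the loss of input point X[i] under candidate query j."""
    return np.array([[loss_fn(x, q) for q in queries] for x in X])

def vector_sum_coreset(M, m, rng=None):
    """Pick m rows of M, with weights, so that the weighted row sum
    approximates the sum of all rows (norm-proportional importance sampling)."""
    rng = np.random.default_rng(0) if rng is None else rng
    probs = np.linalg.norm(M, axis=1)
    probs = probs / probs.sum()
    idx = rng.choice(len(M), size=m, replace=True, p=probs)
    weights = 1.0 / (m * probs[idx])  # inverse-probability weights -> unbiased sum
    return idx, weights

# Example: a squared-error cost evaluated on a handful of random candidate queries.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
queries = rng.standard_normal((20, 5))
M = loss_matrix(X, queries, lambda x, q: float(np.sum((x - q) ** 2)))
idx, w = vector_sum_coreset(M, m=50)
full_sum = M.sum(axis=0)
coreset_sum = (w[:, None] * M[idx]).sum(axis=0)
print("relative error:", np.linalg.norm(full_sum - coreset_sum) / np.linalg.norm(full_sum))
```

The weighted rows play the role of the coreset: any query whose loss vector is (approximately) in the sampled column space is approximated by the same small weighted subset.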
Related papers
- Benchmarking Predictive Coding Networks -- Made Simple [48.652114040426625]
We first propose a library called PCX, which focuses on performance and simplicity.
We use PCX to implement a large set of benchmarks for the community to use in their experiments.
arXiv Detail & Related papers (2024-07-01T10:33:44Z) - Provable Data Subset Selection For Efficient Neural Network Training [73.34254513162898]
We introduce the first algorithm to construct coresets for RBFNNs, i.e., small weighted subsets that approximate the loss of the input data on any radial basis function network.
We then perform empirical evaluations on function approximation and dataset subset selection on popular network architectures and data sets.
arXiv Detail & Related papers (2023-03-09T10:08:34Z) - Coresets for Relational Data and The Applications [8.573878018370547]
A coreset is a small set that can preserve the structure of the original input data set.
We show that our coreset approach can be applied to machine learning tasks such as clustering, logistic regression, and SVM.
arXiv Detail & Related papers (2022-10-09T12:46:27Z) - An Empirical Evaluation of $k$-Means Coresets [4.45709593827781]
There is no work on comparing the quality of available $k$-means coresets.
We propose a benchmark for which we argue that computing coresets is challenging.
We conduct an exhaustive evaluation of the most commonly used coreset algorithms from theory and practice.
arXiv Detail & Related papers (2022-07-03T06:47:53Z) - A Unified Approach to Coreset Learning [24.79658173754555]
A coreset of a given dataset and loss function is usually a small weighted set that approximates this loss for every query from a given set of queries.
We propose a generic, learning-based algorithm for construction of coresets.
arXiv Detail & Related papers (2021-11-04T17:48:05Z) - An Information Theory-inspired Strategy for Automatic Network Pruning [88.51235160841377]
Deep convolutional neural networks often need to be compressed for deployment on devices with resource constraints.
Most existing network pruning methods require laborious human effort and prohibitive computational resources.
We propose an information theory-inspired strategy for automatic model compression.
arXiv Detail & Related papers (2021-08-19T07:03:22Z) - Introduction to Core-sets: an Updated Survey [18.059360820527687]
In machine learning problems, the goal is to minimize or maximize an objective function over some space of candidate solutions.
Traditional algorithms cannot handle modern systems that require parallel real-time computations of infinite distributed streams.
This survey summarizes such constructions in a retrospective way that aims to unify and simplify the state of the art.
arXiv Detail & Related papers (2020-11-18T16:31:34Z) - An Integer Linear Programming Framework for Mining Constraints from Data [81.60135973848125]
We present a general framework for mining constraints from data.
In particular, we consider the inference in structured output prediction as an integer linear programming (ILP) problem.
We show that our approach can learn to solve 9x9 Sudoku puzzles and minimal spanning tree problems from examples without providing the underlying rules.
arXiv Detail & Related papers (2020-06-18T20:09:53Z) - Coresets for Near-Convex Functions [25.922075279588757]
A coreset is usually a small weighted subset of $n$ input points in $\mathbb{R}^d$ that provably approximates their loss function for a given set of queries.
We suggest a generic framework for computing sensitivities (and thus coresets) for a wide family of loss functions; a generic sensitivity-sampling sketch appears after this list.
Examples include SVM, logistic regression, M-estimators, and $\ell_z$-regression.
arXiv Detail & Related papers (2020-06-09T19:49:19Z) - On Coresets for Support Vector Machines [61.928187390362176]
A coreset is a small, representative subset of the original data points.
We show that our algorithm can be used to extend the applicability of any off-the-shelf SVM solver to streaming, distributed, and dynamic data settings.
arXiv Detail & Related papers (2020-02-15T23:25:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.