A Unified Approach to Coreset Learning
- URL: http://arxiv.org/abs/2111.03044v1
- Date: Thu, 4 Nov 2021 17:48:05 GMT
- Title: A Unified Approach to Coreset Learning
- Authors: Alaa Maalouf and Gilad Eini and Ben Mussay and Dan Feldman and
Margarita Osadchy
- Abstract summary: A coreset of a given dataset and loss function is usually a small weighted set that approximates this loss for every query from a given set of queries.
We propose a generic, learning-based algorithm for construction of coresets.
- Score: 24.79658173754555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A coreset of a given dataset and loss function is usually a small
weighted set that approximates this loss for every query from a given set of
queries. Coresets have been shown to be very useful in many applications.
However, coreset construction is done in a problem-dependent manner, and it can
take years to design and prove the correctness of a coreset for a specific
family of queries. This could limit the use of coresets in practical
applications. Moreover, small coresets provably do not exist for many problems.
To address these limitations, we propose a generic, learning-based algorithm
for the construction of coresets. Our approach offers a new definition of
coreset, which is a natural relaxation of the standard definition: rather than
approximating the loss for every query, it aims at approximating the average
loss of the original data over the queries. This allows us to use a learning
paradigm to compute a small coreset of a given set of inputs with respect to a
given loss function, using a training set of queries. We derive formal
guarantees for the proposed approach. Experimental evaluation on deep networks
and classic machine learning problems shows that our learned coresets yield
comparable or even better results than existing algorithms with worst-case
theoretical guarantees (which may be too pessimistic in practice). Furthermore,
applying our approach to deep network pruning provides the first coreset for a
full deep network, i.e., it compresses the entire network at once, rather than
layer by layer or via similar divide-and-conquer methods.
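To make the relaxed definition concrete, the sketch below (a minimal illustration, not the authors' algorithm) learns weights for a fixed random subset so that the weighted coreset loss tracks the average full-data loss over a training set of queries; the squared-distance loss, random subset choice, and plain gradient descent are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def point_loss(P, q):
    # Illustrative per-point loss: squared distance to the query q.
    return np.sum((P - q) ** 2, axis=1)

def learn_coreset(X, train_queries, m=20, steps=500, lr=1e-2):
    """Fit weights w on a random m-subset C so that (1/n) * (w @ loss(C, q))
    approximates (1/n) * sum_x loss(x, q) on average over training queries."""
    n = len(X)
    C = X[rng.choice(n, size=m, replace=False)]
    w = np.full(m, n / m)                       # start from uniform importance weights
    for _ in range(steps):
        grad = np.zeros(m)
        for q in train_queries:
            fc = point_loss(C, q)
            gap = w @ fc / n - point_loss(X, q).mean()
            grad += 2.0 * gap * fc / n          # d(gap^2)/dw
        w -= lr * grad / len(train_queries)
    return C, np.clip(w, 0.0, None)             # keep weights nonnegative
```

Since only the average loss over queries is matched, the learned pair (C, w) is useful whenever the downstream task evaluates queries similar to the training queries, which is exactly the relaxation the paper exploits.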
Related papers
- Refined Coreset Selection: Towards Minimal Coreset Size under Model
Performance Constraints [69.27190330994635]
Coreset selection is powerful in reducing computational costs and accelerating data processing for deep learning algorithms.
We propose an innovative method that, during optimization, prioritizes model performance over coreset size.
Empirically, extensive experiments confirm its superiority, often yielding better model performance with smaller coreset sizes.
arXiv Detail & Related papers (2023-11-15T03:43:04Z)
- AutoCoreset: An Automatic Practical Coreset Construction Framework [65.37876706107764]
A coreset is a tiny weighted subset of an input set that closely approximates the loss function.
We propose an automatic framework for constructing coresets, which requires only the input data and the desired cost function from the user.
We show that while this set is limited, the coreset is quite general.
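As a rough illustration of what such a user-facing interface might look like (a hypothetical sketch, not the AutoCoreset algorithm): the framework needs only the data and a per-point cost callback. The probe-query heuristic and sampling scheme below are assumptions.

```python
from typing import Callable
import numpy as np

def build_coreset(X: np.ndarray,
                  cost: Callable[[np.ndarray, np.ndarray], np.ndarray],
                  size: int, seed: int = 0):
    """Sample points proportionally to their average cost over a few probe
    queries and reweight by inverse sampling probability (a generic
    importance-sampling baseline, not the paper's construction)."""
    rng = np.random.default_rng(seed)
    probes = X[rng.choice(len(X), size=min(16, len(X)), replace=False)]
    scores = np.mean([cost(X, q) for q in probes], axis=0) + 1e-12
    p = scores / scores.sum()
    idx = rng.choice(len(X), size=size, replace=True, p=p)
    return X[idx], 1.0 / (size * p[idx])        # inverse-probability weights
```

The point of such an interface is the division of labor: the user states what the cost is, and the framework decides which points carry it.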
arXiv Detail & Related papers (2023-05-19T19:59:52Z)
- Coresets for Relational Data and The Applications [8.573878018370547]
A coreset is a small set that can preserve the structure of the original input data set.
We show that our coreset approach can be applied to machine learning tasks such as clustering, logistic regression, and SVM.
arXiv Detail & Related papers (2022-10-09T12:46:27Z)
- An Empirical Evaluation of $k$-Means Coresets [4.45709593827781]
There is no prior work comparing the quality of available $k$-means coresets.
We propose a benchmark for which we argue that computing coresets is challenging.
We conduct an exhaustive evaluation of the most commonly used coreset algorithms from theory and practice.
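The core measurement in such an evaluation is straightforward; a minimal sketch (the uniform-sampling coreset here is only a baseline stand-in, not one of the surveyed algorithms):

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_cost(P, centers, w=None):
    # (Weighted) k-means cost: squared distance of each point to its nearest center.
    d2 = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(-1)
    return d2.sum() if w is None else (w * d2).sum()

def coreset_relative_error(X, C, w, k=10):
    # Quality on one query: centers fitted on the full data.
    centers = KMeans(n_clusters=k, n_init=10).fit(X).cluster_centers_
    full, approx = kmeans_cost(X, centers), kmeans_cost(C, centers, w)
    return abs(full - approx) / full

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 2))
idx = rng.choice(len(X), size=200, replace=False)
print(coreset_relative_error(X, X[idx], np.full(200, len(X) / 200)))
```

A benchmark then repeats this measurement over many center sets (queries), datasets, and coreset sizes, reporting worst-case and average relative error.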
arXiv Detail & Related papers (2022-07-03T06:47:53Z)
- A Novel Sequential Coreset Method for Gradient Descent Algorithms [21.40879052693993]
The coreset is a popular data compression technique that has been studied extensively.
We propose a new framework, termed "sequential coreset", which effectively avoids the pseudo-dimension and total sensitivity bound.
Our method is particularly suitable for sparse optimization, where the coreset size can be further reduced to be only poly-logarithmically dependent on the dimension.
arXiv Detail & Related papers (2021-12-05T08:12:16Z)
- Learning to Detect Critical Nodes in Sparse Graphs via Feature Importance Awareness [53.351863569314794]
The critical node problem (CNP) aims to find a set of critical nodes from a network whose deletion maximally degrades the pairwise connectivity of the residual network.
This work proposes a feature importance-aware graph attention network for node representation.
This representation is combined with a dueling double deep Q-network to create an end-to-end algorithm that solves CNP for the first time.
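For reference, the pairwise connectivity that the selected nodes should degrade is the number of node pairs that remain connected after deletion; a minimal sketch (the use of networkx is an assumption):

```python
import networkx as nx

def pairwise_connectivity(G, removed):
    # Sum over connected components C of the residual graph of |C| * (|C| - 1) / 2.
    H = G.copy()
    H.remove_nodes_from(removed)
    return sum(len(c) * (len(c) - 1) // 2
               for c in nx.connected_components(H))
```

CNP then asks for the node set, within a budget, whose removal minimizes this quantity; the cited work learns that selection end-to-end rather than relying on hand-crafted heuristics.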
arXiv Detail & Related papers (2021-12-03T14:23:05Z)
- KNAS: Green Neural Architecture Search [49.36732007176059]
We propose a new kernel-based architecture search approach, KNAS.
Experiments show that KNAS achieves competitive results while being orders of magnitude faster than "train-then-test" paradigms on image classification tasks.
The searched network also outperforms the strong baseline RoBERTa-large on two text classification tasks.
arXiv Detail & Related papers (2021-11-26T02:11:28Z)
- An Information Theory-inspired Strategy for Automatic Network Pruning [88.51235160841377]
Deep convolutional neural networks typically must be compressed to run on devices with resource constraints.
Most existing network pruning methods require laborious human effort and prohibitive computational resources.
We propose an information theory-inspired strategy for automatic model compression.
arXiv Detail & Related papers (2021-08-19T07:03:22Z)
- Introduction to Core-sets: an Updated Survey [18.059360820527687]
In machine learning problems, the goal is to minimize or maximize an objective function over some space of candidate solutions.
Traditional algorithms cannot handle modern systems that require parallel real-time computations of infinite distributed streams.
This survey summarizes such constructions in a retrospective way that aims to unify and simplify the state of the art.
arXiv Detail & Related papers (2020-11-18T16:31:34Z)
- Discretization-Aware Architecture Search [81.35557425784026]
This paper presents discretization-aware architecture search (DA²S).
The core idea is to push the super-network towards the configuration of the desired topology, so that the accuracy loss brought by discretization is largely alleviated.
Experiments on standard image classification benchmarks demonstrate the superiority of our approach.
arXiv Detail & Related papers (2020-07-07T01:18:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.