Refined Coreset Selection: Towards Minimal Coreset Size under Model
Performance Constraints
- URL: http://arxiv.org/abs/2311.08675v2
- Date: Thu, 29 Feb 2024 14:31:40 GMT
- Title: Refined Coreset Selection: Towards Minimal Coreset Size under Model
Performance Constraints
- Authors: Xiaobo Xia, Jiale Liu, Shaokun Zhang, Qingyun Wu, Hongxin Wei,
Tongliang Liu
- Abstract summary: Coreset selection is powerful in reducing computational costs and accelerating data processing for deep learning algorithms.
We propose an innovative method that maintains an optimization priority order over model performance and coreset size.
Empirically, extensive experiments confirm its superiority, often yielding better model performance with smaller coreset sizes.
- Score: 69.27190330994635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Coreset selection is powerful in reducing computational costs and
accelerating data processing for deep learning algorithms. It strives to
identify a small subset from large-scale data, so that training only on the
subset practically performs on par with full data. Practitioners regularly
desire to identify the smallest possible coreset in realistic scenarios while
maintaining comparable model performance, to minimize costs and maximize
acceleration. Motivated by this desideratum, for the first time, we pose the
problem of refined coreset selection, in which the minimal coreset size under
model performance constraints is explored. Moreover, to address this problem,
we propose an innovative method, which maintains an optimization priority order
over model performance and coreset size, and efficiently optimizes both in
the coreset selection procedure. Theoretically, we provide the convergence
guarantee of the proposed method. Empirically, extensive experiments confirm
its superiority compared with previous strategies, often yielding better model
performance with smaller coreset sizes.
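As a rough illustration of the priority-ordered objective, here is a minimal sketch under stated assumptions (not the authors' algorithm): the loop below first grows a candidate subset until a validation-accuracy constraint is met, and only then tries to shrink it, so model performance is optimized with strict priority over coreset size. The random ranking, the `accuracy_on` helper, and the `target_acc` threshold are hypothetical stand-ins.

```python
# Minimal sketch of priority-ordered coreset selection (illustrative only):
# satisfy the performance constraint first, then minimize coreset size.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def accuracy_on(subset_idx, X_tr, y_tr, X_val, y_val):
    """Train only on the selected subset and report validation accuracy."""
    model = LogisticRegression(max_iter=200)
    model.fit(X_tr[subset_idx], y_tr[subset_idx])
    return model.score(X_val, y_val)

def refined_coreset(X_tr, y_tr, X_val, y_val, target_acc, step=50, seed=0):
    order = np.random.default_rng(seed).permutation(len(X_tr))  # stand-in for a learned ranking
    subset = list(order[:step])
    # Priority 1: grow the subset until the performance constraint holds.
    while accuracy_on(subset, X_tr, y_tr, X_val, y_val) < target_acc and len(subset) < len(X_tr):
        subset.extend(order[len(subset):len(subset) + step])
    # Priority 2: shrink the subset as long as the constraint keeps holding.
    for i in list(subset):
        candidate = [j for j in subset if j != i]
        if candidate and accuracy_on(candidate, X_tr, y_tr, X_val, y_val) >= target_acc:
            subset = candidate
    return subset

X, y = make_classification(n_samples=1200, n_features=20, random_state=0)
X_tr, y_tr, X_val, y_val = X[:1000], y[:1000], X[1000:], y[1000:]
coreset = refined_coreset(X_tr, y_tr, X_val, y_val, target_acc=0.85)
print(f"coreset size: {len(coreset)} / {len(X_tr)}")
```

In the paper's setting the inner training is a deep network and the selection is handled far more carefully; the sketch only conveys the lexicographic ordering of the two objectives.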
Related papers
- Majority Kernels: An Approach to Leverage Big Model Dynamics for Efficient Small Model Training [32.154166415680066]
Methods like distillation, compression, or quantization leverage highly performant large models to induce smaller, performant ones.
This paper explores the hypothesis that a single training run can simultaneously train a larger model for performance and derive a smaller model for deployment.
arXiv Detail & Related papers (2024-02-07T17:07:41Z) - Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z) - Probabilistic Bilevel Coreset Selection [24.874967723659022]
We propose a continuous probabilistic bilevel formulation of coreset selection by learning a probabilistic weight for each training sample.
We develop an efficient solver for the bilevel optimization problem via an unbiased policy gradient, avoiding the difficulties of implicit differentiation; a toy sketch of this idea appears after this list.
arXiv Detail & Related papers (2023-01-24T09:37:00Z) - Coverage-centric Coreset Selection for High Pruning Rates [11.18635356469467]
One-shot coreset selection aims to select a subset of the training data, given a pruning rate, that can achieve high accuracy for models that are subsequently trained only with that subset.
State-of-the-art coreset selection methods typically assign an importance score to each example and select the most important examples to form a coreset.
But at high pruning rates, they have been found to suffer a catastrophic accuracy drop, performing worse than even random coreset selection.
arXiv Detail & Related papers (2022-10-28T00:14:00Z) - A Novel Sequential Coreset Method for Gradient Descent Algorithms [21.40879052693993]
Coreset is a popular data compression technique that has been extensively studied before.
We propose a new framework, termed "sequential coreset", which effectively avoids the pseudo-dimension and total sensitivity bound.
Our method is particularly suitable for sparse optimization, where the coreset size can be further reduced to be only poly-logarithmically dependent on the dimension.
arXiv Detail & Related papers (2021-12-05T08:12:16Z) - Data Summarization via Bilevel Optimization [48.89977988203108]
A simple yet powerful approach is to operate on small subsets of data.
In this work, we propose a generic coreset framework that formulates the coreset selection as a cardinality-constrained bilevel optimization problem.
arXiv Detail & Related papers (2021-09-26T09:08:38Z) - Conservative Objective Models for Effective Offline Model-Based
Optimization [78.19085445065845]
Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
We propose a method that learns a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs.
COMs are simple to implement and outperform a number of existing methods on a wide range of MBO problems.
arXiv Detail & Related papers (2021-07-14T17:55:28Z) - Coresets via Bilevel Optimization for Continual Learning and Streaming [86.67190358712064]
We propose a novel coreset construction via cardinality-constrained bilevel optimization.
We show how our framework can efficiently generate coresets for deep neural networks, and demonstrate its empirical benefits in continual learning and in streaming settings.
arXiv Detail & Related papers (2020-06-06T14:20:25Z) - On Coresets for Support Vector Machines [61.928187390362176]
A coreset is a small, representative subset of the original data points.
We show that our algorithm can be used to extend the applicability of any off-the-shelf SVM solver to streaming, distributed, and dynamic data settings.
arXiv Detail & Related papers (2020-02-15T23:25:12Z)
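Several of the entries above, notably the probabilistic bilevel formulation, share a common recipe: give each training example an inclusion probability and update those probabilities against a validation objective computed on a model trained on the sampled subset. Below is a minimal, hypothetical sketch of that recipe using a REINFORCE-style score-function gradient; the logistic-regression inner model, size penalty `lam`, learning rate, and baseline are assumptions for illustration and do not reproduce any of the cited methods.

```python
# Illustrative sketch (assumed setup, not the published algorithm): probabilistic
# coreset selection with per-example Bernoulli inclusion probabilities, updated
# by an unbiased score-function (policy) gradient of the outer objective.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1200, n_features=20, random_state=0)
X_tr, y_tr, X_val, y_val = X[:1000], y[:1000], X[1000:], y[1000:]

n = len(X_tr)
logits = np.zeros(n)   # per-example inclusion logits (probabilities start at 0.5)
lam = 0.5              # weight of the coreset-size penalty (assumed)
lr = 0.5               # outer learning rate (assumed)

def outer_loss(mask):
    """Inner step: fit on the sampled subset; outer objective: val loss + size penalty."""
    idx = np.flatnonzero(mask)
    if len(idx) < 2 or len(np.unique(y_tr[idx])) < 2:
        return 10.0    # heavily penalize degenerate subsets
    model = LogisticRegression(max_iter=200).fit(X_tr[idx], y_tr[idx])
    return log_loss(y_val, model.predict_proba(X_val)) + lam * len(idx) / n

baseline = 0.0
for step in range(40):
    p = 1.0 / (1.0 + np.exp(-logits))            # Bernoulli inclusion probabilities
    mask = (rng.random(n) < p).astype(float)     # sample a candidate coreset
    loss = outer_loss(mask)
    advantage = loss - baseline if step > 0 else 0.0
    # REINFORCE-style update: d log P(mask | p) / d logits = mask - p
    logits -= lr * advantage * (mask - p)
    baseline = 0.9 * baseline + 0.1 * loss       # moving-average variance-reduction baseline
    if step % 10 == 0:
        print(f"step {step:2d}  outer loss {loss:.3f}  expected size {p.sum():.0f}")
```

The score-function estimator sidesteps differentiating through the inner training run, which is the same motivation the probabilistic bilevel paper gives for avoiding implicit differentiation.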