Refined Coreset Selection: Towards Minimal Coreset Size under Model
Performance Constraints
- URL: http://arxiv.org/abs/2311.08675v2
- Date: Thu, 29 Feb 2024 14:31:40 GMT
- Title: Refined Coreset Selection: Towards Minimal Coreset Size under Model
Performance Constraints
- Authors: Xiaobo Xia, Jiale Liu, Shaokun Zhang, Qingyun Wu, Hongxin Wei,
Tongliang Liu
- Abstract summary: Coreset selection is powerful in reducing computational costs and accelerating data processing for deep learning algorithms.
We propose an innovative method that maintains an optimization priority order over model performance and coreset size.
Empirically, extensive experiments confirm its superiority, often yielding better model performance with smaller coreset sizes.
- Score: 69.27190330994635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Coreset selection is powerful in reducing computational costs and
accelerating data processing for deep learning algorithms. It strives to
identify a small subset from large-scale data, so that training only on the
subset practically performs on par with full data. Practitioners regularly
desire to identify the smallest possible coreset in realistic scenarios while
maintaining comparable model performance, to minimize costs and maximize
acceleration. Motivated by this desideratum, for the first time, we pose the
problem of refined coreset selection, in which the minimal coreset size under
model performance constraints is explored. Moreover, to address this problem,
we propose an innovative method, which maintains an optimization priority order
over model performance and coreset size, and efficiently optimizes both in
the coreset selection procedure. Theoretically, we provide the convergence
guarantee of the proposed method. Empirically, extensive experiments confirm
its superiority compared with previous strategies, often yielding better model
performance with smaller coreset sizes.
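As a rough illustration of the priority-ordered objective, here is a minimal sketch under stated assumptions (not the authors' algorithm): the loop below first grows a candidate subset until a validation-accuracy constraint is met, and only then tries to shrink it, so model performance is optimized with strict priority over coreset size. The random ranking, the `accuracy_on` helper, and the `target_acc` threshold are hypothetical stand-ins.

```python
# Minimal sketch of priority-ordered coreset selection (illustrative only):
# satisfy the performance constraint first, then minimize coreset size.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def accuracy_on(subset_idx, X_tr, y_tr, X_val, y_val):
    """Train only on the selected subset and report validation accuracy."""
    model = LogisticRegression(max_iter=200)
    model.fit(X_tr[subset_idx], y_tr[subset_idx])
    return model.score(X_val, y_val)

def refined_coreset(X_tr, y_tr, X_val, y_val, target_acc, step=50, seed=0):
    order = np.random.default_rng(seed).permutation(len(X_tr))  # stand-in for a learned ranking
    subset = list(order[:step])
    # Priority 1: grow the subset until the performance constraint holds.
    while accuracy_on(subset, X_tr, y_tr, X_val, y_val) < target_acc and len(subset) < len(X_tr):
        subset.extend(order[len(subset):len(subset) + step])
    # Priority 2: shrink the subset as long as the constraint keeps holding.
    for i in list(subset):
        candidate = [j for j in subset if j != i]
        if candidate and accuracy_on(candidate, X_tr, y_tr, X_val, y_val) >= target_acc:
            subset = candidate
    return subset

X, y = make_classification(n_samples=1200, n_features=20, random_state=0)
X_tr, y_tr, X_val, y_val = X[:1000], y[:1000], X[1000:], y[1000:]
coreset = refined_coreset(X_tr, y_tr, X_val, y_val, target_acc=0.85)
print(f"coreset size: {len(coreset)} / {len(X_tr)}")
```

In the paper's setting the inner training is a deep network and the selection is handled far more carefully; the sketch only conveys the lexicographic ordering of the two objectives.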
Related papers
- Majority Kernels: An Approach to Leverage Big Model Dynamics for Efficient Small Model Training [32.154166415680066]
Methods like distillation, compression, or quantization leverage highly performant large models to induce smaller, performant ones.
This paper explores the hypothesis that a single training run can simultaneously train a larger model for performance and derive a smaller model for deployment.
arXiv Detail & Related papers (2024-02-07T17:07:41Z) - Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with much fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z) - Probabilistic Bilevel Coreset Selection [24.874967723659022]
We propose a continuous probabilistic bilevel formulation of coreset selection by learning a probabilistic weight for each training sample.
We develop an efficient solver for the bilevel optimization problem via an unbiased policy gradient, avoiding the difficulties of implicit differentiation; a toy sketch of this idea appears after this list.
arXiv Detail & Related papers (2023-01-24T09:37:00Z) - Coverage-centric Coreset Selection for High Pruning Rates [11.18635356469467]
One-shot coreset selection aims to select a subset of the training data, given a pruning rate, that can achieve high accuracy for models that are subsequently trained only with that subset.
State-of-the-art coreset selection methods typically assign an importance score to each example and select the most important examples to form a coreset.
But at high pruning rates, they have been found to suffer a catastrophic accuracy drop, performing worse than even random coreset selection.
arXiv Detail & Related papers (2022-10-28T00:14:00Z) - A Novel Sequential Coreset Method for Gradient Descent Algorithms [21.40879052693993]
Coreset is a popular data compression technique that has been extensively studied before.
We propose a new framework, termed "sequential coreset", which effectively avoids the pseudo-dimension and total sensitivity bound.
Our method is particularly suitable for sparse optimization, where the coreset size can be further reduced to be only poly-logarithmically dependent on the dimension.
arXiv Detail & Related papers (2021-12-05T08:12:16Z) - Data Summarization via Bilevel Optimization [48.89977988203108]
A simple yet powerful approach is to operate on small subsets of data.
In this work, we propose a generic coreset framework that formulates the coreset selection as a cardinality-constrained bilevel optimization problem.
arXiv Detail & Related papers (2021-09-26T09:08:38Z) - Conservative Objective Models for Effective Offline Model-Based
Optimization [78.19085445065845]
Computational design problems arise in a number of settings, from synthetic biology to computer architectures.
We propose a method that learns a model of the objective function that lower bounds the actual value of the ground-truth objective on out-of-distribution inputs.
COMs are simple to implement and outperform a number of existing methods on a wide range of MBO problems.
arXiv Detail & Related papers (2021-07-14T17:55:28Z) - Coresets via Bilevel Optimization for Continual Learning and Streaming [86.67190358712064]
We propose a novel coreset construction via cardinality-constrained bilevel optimization.
We show how our framework can efficiently generate coresets for deep neural networks, and demonstrate its empirical benefits in continual learning and in streaming settings.
arXiv Detail & Related papers (2020-06-06T14:20:25Z) - On Coresets for Support Vector Machines [61.928187390362176]
A coreset is a small, representative subset of the original data points.
We show that our algorithm can be used to extend the applicability of any off-the-shelf SVM solver to streaming, distributed, and dynamic data settings.
arXiv Detail & Related papers (2020-02-15T23:25:12Z)
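Several of the entries above, notably the probabilistic bilevel formulation, share a common recipe: give each training example an inclusion probability and update those probabilities against a validation objective computed on a model trained on the sampled subset. Below is a minimal, hypothetical sketch of that recipe using a REINFORCE-style score-function gradient; the logistic-regression inner model, size penalty `lam`, learning rate, and baseline are assumptions for illustration and do not reproduce any of the cited methods.

```python
# Illustrative sketch (assumed setup, not the published algorithm): probabilistic
# coreset selection with per-example Bernoulli inclusion probabilities, updated
# by an unbiased score-function (policy) gradient of the outer objective.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1200, n_features=20, random_state=0)
X_tr, y_tr, X_val, y_val = X[:1000], y[:1000], X[1000:], y[1000:]

n = len(X_tr)
logits = np.zeros(n)   # per-example inclusion logits (probabilities start at 0.5)
lam = 0.5              # weight of the coreset-size penalty (assumed)
lr = 0.5               # outer learning rate (assumed)

def outer_loss(mask):
    """Inner step: fit on the sampled subset; outer objective: val loss + size penalty."""
    idx = np.flatnonzero(mask)
    if len(idx) < 2 or len(np.unique(y_tr[idx])) < 2:
        return 10.0    # heavily penalize degenerate subsets
    model = LogisticRegression(max_iter=200).fit(X_tr[idx], y_tr[idx])
    return log_loss(y_val, model.predict_proba(X_val)) + lam * len(idx) / n

baseline = 0.0
for step in range(40):
    p = 1.0 / (1.0 + np.exp(-logits))            # Bernoulli inclusion probabilities
    mask = (rng.random(n) < p).astype(float)     # sample a candidate coreset
    loss = outer_loss(mask)
    advantage = loss - baseline if step > 0 else 0.0
    # REINFORCE-style update: d log P(mask | p) / d logits = mask - p
    logits -= lr * advantage * (mask - p)
    baseline = 0.9 * baseline + 0.1 * loss       # moving-average variance-reduction baseline
    if step % 10 == 0:
        print(f"step {step:2d}  outer loss {loss:.3f}  expected size {p.sum():.0f}")
```

The score-function estimator sidesteps differentiating through the inner training run, which is the same motivation the probabilistic bilevel paper gives for avoiding implicit differentiation.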