Exploring Learning Complexity for Downstream Data Pruning
- URL: http://arxiv.org/abs/2402.05356v1
- Date: Thu, 8 Feb 2024 02:29:33 GMT
- Title: Exploring Learning Complexity for Downstream Data Pruning
- Authors: Wenyu Jiang, Zhenlong Liu, Zejian Xie, Songxin Zhang, Bingyi Jing,
Hongxin Wei
- Abstract summary: We propose to treat the learning complexity (LC) as the scoring function for classification and regression tasks.
For the instruction fine-tuning of large language models, our method achieves state-of-the-art performance with stable convergence.
- Score: 9.526877053855998
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over-parameterized pre-trained models pose a great challenge to
fine-tuning with limited computational resources. An intuitive solution is to
prune the less informative samples from the fine-tuning dataset. A series of
training-based scoring functions have been proposed to quantify the informativeness
of the data subset, but the pruning cost becomes non-negligible due to heavy
parameter updating. For efficient pruning, it is viable to adapt the similarity
scoring function of geometric-based methods from training-based to
training-free. However, we empirically show that such adaptation distorts the
original pruning and results in inferior performance on downstream tasks.
In this paper, we propose to treat the learning complexity (LC) as the scoring
function for classification and regression tasks. Specifically, the learning
complexity is defined as the average predicted confidence of subnets with
different capacities, which encapsulates data processing within a converged
model. Then we preserve the diverse and easy samples for fine-tuning. Extensive
experiments with vision datasets demonstrate the effectiveness and efficiency
of the proposed scoring function for classification tasks. For the instruction
fine-tuning of large language models, our method achieves state-of-the-art
performance with stable convergence, outperforming full training while using only
10% of the instruction dataset.
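As a rough illustration of the scoring idea, the sketch below computes a per-sample LC score as the average confidence assigned to the true label across subnets of different capacities. It is a minimal sketch only: the subnets are simulated by randomly masking features of a converged model, and the model.backbone / model.head interface is an assumption, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def learning_complexity(model, x, y, capacities=(0.25, 0.5, 0.75, 1.0), draws=4):
    """Average confidence on the true label across simulated subnets.
    A subnet of capacity c is approximated by masking a (1 - c) fraction of
    features from a converged backbone -- a stand-in for the paper's subnets."""
    model.eval()
    confs = []
    for cap in capacities:
        for _ in range(draws if cap < 1.0 else 1):
            feats = model.backbone(x)                              # (B, D) features
            feats = F.dropout(feats, p=1.0 - cap, training=True)   # random subnet
            probs = F.softmax(model.head(feats), dim=-1)           # (B, C) probs
            confs.append(probs.gather(1, y.unsqueeze(1)).squeeze(1))
    return torch.stack(confs).mean(dim=0)                          # (B,) LC scores
```

Samples with high LC (handled confidently even by low-capacity subnets) would be treated as easy; the pruning step then keeps easy and diverse samples and discards the rest.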
Related papers
- SPaRFT: Self-Paced Reinforcement Fine-Tuning for Large Language Models [51.74498855100541]
Large language models (LLMs) have shown strong reasoning capabilities when fine-tuned with reinforcement learning (RL). We propose SPaRFT, a self-paced learning framework that enables efficient learning based on the capability of the model being trained.
arXiv Detail & Related papers (2025-08-07T03:50:48Z)
- Quality over Quantity: An Effective Large-Scale Data Reduction Strategy Based on Pointwise V-Information [2.133855532092057]
We propose an effective data reduction strategy based on Pointwise V-Information (PVI). Experiments show that classifier performance is maintained, with only a 0.0001% to 0.76% drop in accuracy, when 10%-30% of the data is removed. We have adapted the PVI framework, previously limited to English datasets, to a variety of Chinese NLP tasks and base models.
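For context, pointwise V-information is commonly defined as PVI(x -> y) = -log2 g'[null](y) + log2 g[x](y), where g is fine-tuned with the input and g' on a null input. The sketch below follows that definition; the two models and the null input are assumptions about the setup, not details from this paper.

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def pvi_scores(model_with_input, model_null_input, x, x_null, y):
    """Per-instance PVI in bits: log2 p(y | x) - log2 p(y | null input).
    Both models are assumed to return class logits of shape (B, C)."""
    logp_x = F.log_softmax(model_with_input(x), dim=-1)
    logp_0 = F.log_softmax(model_null_input(x_null), dim=-1)
    gain = logp_x.gather(1, y.unsqueeze(1)) - logp_0.gather(1, y.unsqueeze(1))
    return gain.squeeze(1) / math.log(2.0)   # (B,) PVI scores
```

A PVI-based reduction strategy would then rank instances by this score and prune according to its selection rule.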
arXiv Detail & Related papers (2025-06-19T06:59:19Z)
- Scale Efficient Training for Large Datasets [27.28640920242675]
To remove low-value samples, SeTa first performs random pruning to eliminate redundant samples, then clusters the remaining samples according to their learning difficulty measured by loss.
SeTa reduces training costs by up to 50% while maintaining or improving performance, with minimal degradation even at 70% cost reduction.
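A rough sketch of the two stages described above, under the assumption that per-sample losses from the current model are available; how SeTa consumes the resulting clusters is not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans

def random_prune_then_cluster(losses, keep_ratio=0.7, n_clusters=10, seed=0):
    """Stage 1: random pruning of the dataset.
    Stage 2: cluster survivors by per-sample loss as a difficulty proxy."""
    rng = np.random.default_rng(seed)
    losses = np.asarray(losses, dtype=np.float64)
    kept = rng.choice(len(losses), size=int(keep_ratio * len(losses)), replace=False)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(losses[kept].reshape(-1, 1))
    return kept, labels   # retained indices and their difficulty-cluster ids
```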
arXiv Detail & Related papers (2025-03-17T17:13:43Z)
- Prior-Fitted Networks Scale to Larger Datasets When Treated as Weak Learners [82.72552644267724]
On large datasets, BoostPFN can outperform standard PFNs given the same number of training samples.
High performance is maintained for up to 50x the pre-training size of PFNs.
arXiv Detail & Related papers (2025-03-03T07:31:40Z)
- A CLIP-Powered Framework for Robust and Generalizable Data Selection [51.46695086779598]
Real-world datasets often contain redundant and noisy data, which degrades training efficiency and model performance.
Data selection has shown promise in identifying the most representative samples from the entire dataset.
We propose a novel CLIP-powered data selection framework that leverages multimodal information for more robust and generalizable sample selection.
arXiv Detail & Related papers (2024-10-15T03:00:58Z)
- Effective pruning of web-scale datasets based on complexity of concept clusters [48.125618324485195]
We present a method for pruning large-scale multimodal datasets for training CLIP-style models on ImageNet.
We find that training on a smaller set of high-quality data can lead to higher performance with significantly lower training costs.
We achieve a new state-of-the-art ImageNet zero-shot accuracy and a competitive average zero-shot accuracy on 38 evaluation tasks.
arXiv Detail & Related papers (2024-01-09T14:32:24Z)
- Online Importance Sampling for Stochastic Gradient Optimization [33.42221341526944]
We propose a practical algorithm that efficiently computes data importance on-the-fly during training.
We also introduce a novel metric based on the derivative of the loss w.r.t. the network output, designed for mini-batch importance sampling.
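One plausible reading of such a metric, sketched for a softmax classifier with cross-entropy loss (where the derivative of the loss w.r.t. the logits is softmax(z) - one_hot(y)); the paper's exact metric and sampler may differ.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def output_gradient_importance(logits, y):
    """Per-sample importance as the norm of dLoss/dLogits for cross-entropy."""
    grad = F.softmax(logits, dim=-1)
    grad[torch.arange(len(y), device=logits.device), y] -= 1.0  # softmax - one_hot
    return grad.norm(dim=-1)                                     # (B,) scores

def importance_sample(scores, batch_size):
    """Draw a mini-batch with probability proportional to the importance scores."""
    return torch.multinomial(scores / scores.sum(), batch_size, replacement=False)
```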
arXiv Detail & Related papers (2023-11-24T13:21:35Z)
- KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training [2.8804804517897935]
We propose a method for hiding the least-important samples during the training of deep neural networks.
We adaptively find samples to exclude in a given epoch based on their contribution to the overall learning process.
Our method can reduce total training time by up to 22% while impacting accuracy by only 0.4% compared to the baseline.
arXiv Detail & Related papers (2023-10-16T06:19:29Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods while requiring far fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- RPLKG: Robust Prompt Learning with Knowledge Graph [11.893917358053004]
We propose a new method, robust prompt learning with knowledge graph (RPLKG).
Based on the knowledge graph, we automatically design diverse interpretable and meaningful prompt sets.
RPLKG shows a significant performance improvement compared to zero-shot learning.
arXiv Detail & Related papers (2023-04-21T08:22:58Z)
- On Measuring the Intrinsic Few-Shot Hardness of Datasets [49.37562545777455]
We show that few-shot hardness may be intrinsic to datasets, for a given pre-trained model.
We propose a simple and lightweight metric called "Spread" that captures the intuition of what makes few-shot learning possible.
Our metric better accounts for few-shot hardness compared to existing notions of hardness, and is 8-100x faster to compute.
arXiv Detail & Related papers (2022-11-16T18:53:52Z)
- Dataset Condensation with Distribution Matching [30.571335208276246]
Dataset condensation aims to replace the original large training set with a significantly smaller learned synthetic set.
We propose a simple yet effective dataset condensation technique that requires significantly lower training cost.
Thanks to its efficiency, we apply our method to more realistic and larger datasets with sophisticated neural architectures.
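In spirit, distribution matching optimizes a small synthetic set so that its feature statistics match those of real data under some embedding. The single step below is a minimal sketch under that reading; the embedding, per-class handling, and optimizer are assumptions rather than the paper's recipe.

```python
import torch

def distribution_matching_step(embed, real_batch, syn_images, lr=0.1):
    """One gradient step pulling the mean embedding of the learnable synthetic
    images toward the mean embedding of a real batch (MSE between the means)."""
    syn_images.requires_grad_(True)
    with torch.no_grad():
        real_mean = embed(real_batch).mean(dim=0)
    loss = ((embed(syn_images).mean(dim=0) - real_mean) ** 2).sum()
    loss.backward()
    with torch.no_grad():
        syn_images -= lr * syn_images.grad   # plain gradient descent on the pixels
        syn_images.grad = None
    return loss.item()
```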
arXiv Detail & Related papers (2021-10-08T15:02:30Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To make the enlarged dataset easier to use, we propose applying a dataset distillation strategy to compress it into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z) - Dynamic Scale Training for Object Detection [111.33112051962514]
We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection.
Experimental results demonstrate the efficacy of our proposed DST towards scale variation handling.
It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
arXiv Detail & Related papers (2020-04-26T16:48:17Z)