Data-Efficient Training by Evolved Sampling
- URL: http://arxiv.org/abs/2509.23461v1
- Date: Sat, 27 Sep 2025 19:19:16 GMT
- Title: Data-Efficient Training by Evolved Sampling
- Authors: Ziheng Cheng, Zhong Li, Jiang Bian
- Abstract summary: We propose Evolved Sampling (ES), a framework for dynamic sampling along the training process. ES(WP) consistently achieves lossless training acceleration across various pre-training and post-training tasks, saving up to nearly 45% wall-clock time.
- Score: 23.886561235819773
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data selection is designed to accelerate learning with preserved performance. To achieve this, a fundamental approach is to identify informative data samples that contribute significantly to training. In this work, we propose Evolved Sampling (ES), a simple yet effective framework for dynamic sampling along the training process. The method performs batch-level data selection based on the dynamics of losses and augmented loss differences, which enables flexible frequency tuning and hence significantly reduces back-propagation time while maintaining model performance. Owing to its conciseness, ES is also readily extensible to incorporate set-level data selection (forming ES with pruning, ESWP) for further acceleration. As a plug-and-play framework, ES(WP) consistently achieves lossless training acceleration across various pre-training and post-training tasks, saving up to nearly 45% of wall-clock time. Our results motivate further investigation into the data-efficiency aspect of modern large-scale machine learning.
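To make the batch-level selection idea concrete, here is a minimal sketch in Python. The scoring rule (current loss plus absolute loss difference), the EMA loss history, and the names select_batch, ema_decay, and keep_ratio are illustrative assumptions, not the authors' exact method or API:

```python
import numpy as np

def select_batch(sample_ids, losses, loss_history, ema_decay=0.9, keep_ratio=0.5):
    """Score each sample by its current loss plus the change in loss since
    it was last seen, then keep only the top fraction for back-propagation."""
    scores = np.empty(len(sample_ids))
    for j, (i, loss) in enumerate(zip(sample_ids, losses)):
        prev = loss_history.get(i, loss)      # first visit: no difference signal yet
        scores[j] = loss + abs(loss - prev)   # high loss or fast change => informative
        loss_history[i] = ema_decay * prev + (1 - ema_decay) * loss
    k = max(1, int(keep_ratio * len(sample_ids)))
    keep = np.argsort(scores)[-k:]            # indices of the k highest-scoring samples
    return [sample_ids[j] for j in keep]
```

The appeal of such a scheme is that the forward pass already yields per-sample losses at little extra cost, while the expensive backward pass runs only on the kept subset; "frequency tuning" then corresponds to how often the selection is refreshed during training.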
Related papers
- Learning from Complexity: Exploring Dynamic Sample Pruning of Spatio-Temporal Training [36.98769959300113]
Training deep learning models on massive, often redundant datasets presents a significant computational bottleneck. In this paper, we explore a novel training technique: learning from complexity with dynamic sample pruning. We show that ST-Prune significantly accelerates training while maintaining or even improving model performance.
arXiv Detail & Related papers (2026-02-22T10:11:04Z)
- FoQuS: A Forgetting-Quality Coreset Selection Framework for Automatic Modulation Recognition [17.237106100331225]
We propose FoQuS, which approximates the effect of full training by selecting a coreset from the original dataset. Experiments show that FoQuS maintains high recognition accuracy and good cross-architecture generalization on multiple AMR datasets.
arXiv Detail & Related papers (2025-09-10T05:39:49Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for lightweight continual training of LLMs using their original pre-training datasets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
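For orientation on the IR-DRO framework named in the entry above, a standard instance-reweighted DRO objective can be written as follows, where θ are the model parameters, Δ_n is the probability simplex over the n training instances, ℓ is the per-instance loss, and τ ≥ 0 keeps the weights w near uniform. This is a common template, not necessarily the paper's exact formulation:

```latex
\min_{\theta} \; \max_{w \in \Delta_n} \;
  \sum_{i=1}^{n} w_i \, \ell(x_i; \theta)
  \;-\; \tau \, \mathrm{KL}\!\left( w \,\middle\|\, \tfrac{1}{n}\mathbf{1} \right)
```

Large τ recovers ordinary uniform-weight training, while small τ lets the inner maximization concentrate weight on hard (high-loss) instances.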
- Exploring Learning Complexity for Efficient Downstream Dataset Pruning [8.990878450631596]
Existing dataset pruning methods require training on the entire dataset. We propose a straightforward, novel, and training-free hardness score named Distorting-based Learning Complexity (DLC). Our method is motivated by the observation that easy samples that are learned faster can also be learned with fewer parameters.
arXiv Detail & Related papers (2024-02-08T02:29:33Z)
- Noisy Self-Training with Synthetic Queries for Dense Retrieval [49.49928764695172]
We introduce a novel noisy self-training framework combined with synthetic queries.
Experimental results show that our method improves consistently over existing methods.
Our method is data efficient and outperforms competitive baselines.
arXiv Detail & Related papers (2023-11-27T06:19:50Z)
- SwiftLearn: A Data-Efficient Training Method of Deep Learning Models using Importance Sampling [3.8330834108666667]
We present SwiftLearn, a data-efficient approach that accelerates the training of deep learning models by learning from a subset of the data. This subset is selected based on an importance criterion measured over the entire dataset during the warm-up stages.
We show that almost 90% of the data can be dropped, achieving an end-to-end average speedup of 3.36x while keeping the average accuracy drop below 0.92%.
arXiv Detail & Related papers (2023-11-25T22:51:01Z)
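A minimal sketch of warm-up importance sampling in the spirit of the SwiftLearn entry above; the mean-loss criterion, the keep_fraction value, and the function name importance_subset are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def importance_subset(warmup_losses, keep_fraction=0.1):
    """Rank all samples by an importance score measured during warm-up
    (here: mean per-sample loss) and keep only the top fraction."""
    scores = warmup_losses.mean(axis=1)   # (n_samples,) mean loss over warm-up epochs
    k = max(1, int(keep_fraction * len(scores)))
    return np.argsort(scores)[-k:]        # indices of the k most important samples

# Example: losses recorded for 1000 samples over 3 warm-up epochs;
# keep_fraction=0.1 mirrors the reported ~90% of data being dropped.
warmup_losses = np.random.rand(1000, 3)
subset = importance_subset(warmup_losses)  # ~100 sample indices retained
```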
- Reconstructing Training Data from Model Gradient, Provably [68.21082086264555]
We reconstruct the training samples from a single gradient query at a randomly chosen parameter value.
As a provable attack that reveals sensitive training data, our findings suggest potential severe threats to privacy.
arXiv Detail & Related papers (2022-12-07T15:32:22Z)
- Effective Vision Transformer Training: A Data-Centric Perspective [24.02488085447691]
Vision Transformers (ViTs) have shown promising performance compared with Convolutional Neural Networks (CNNs).
In this paper, we define several metrics, including Dynamic Data Proportion (DDP) and Knowledge Assimilation Rate (KAR).
We propose a novel data-centric ViT training framework to dynamically measure the "difficulty" of training samples and generate "effective" samples for models at different training stages.
arXiv Detail & Related papers (2022-09-29T17:59:46Z)
- MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge [72.16021611888165]
This paper proposes a novel Memory-Economic Sparse Training (MEST) framework targeting accurate and fast execution on edge devices.
The proposed MEST framework consists of enhancements by Elastic Mutation (EM) and Soft Memory Bound (&S).
Our results suggest that unforgettable examples can be identified in-situ even during the dynamic exploration of sparsity masks.
arXiv Detail & Related papers (2021-10-26T21:15:17Z)
- Predicting Training Time Without Training [120.92623395389255]
We tackle the problem of predicting the number of optimization steps that a pre-trained deep network needs to converge to a given value of the loss function.
We leverage the fact that the training dynamics of a deep network during fine-tuning are well approximated by those of a linearized model.
We are able to predict the time it takes to fine-tune a model to a given loss without having to perform any training.
arXiv Detail & Related papers (2020-08-28T04:29:54Z)
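The linearized-dynamics approximation mentioned in the entry above is usually the first-order Taylor expansion of the network around its pre-trained weights θ_0 (a standard identity, stated here for orientation rather than as the paper's exact derivation):

```latex
f(x; \theta_t) \;\approx\; f(x; \theta_0)
  \;+\; \nabla_{\theta} f(x; \theta_0)^{\top} \left( \theta_t - \theta_0 \right)
```

Under this approximation the fine-tuning loss evolves like that of a linear model in the features ∇_θ f(x; θ_0), whose convergence behavior can be characterized without running the actual training.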
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training. To keep the resulting dataset manageable, we apply a dataset distillation strategy to compress it into several informative class-wise images. We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
- Improving Semantic Segmentation via Self-Training [75.07114899941095]
We show that we can obtain state-of-the-art results using a semi-supervised approach, specifically a self-training paradigm.
We first train a teacher model on labeled data, and then generate pseudo labels on a large set of unlabeled data.
Our robust training framework can digest human-annotated and pseudo labels jointly and achieve top performance on the Cityscapes, CamVid, and KITTI datasets (a minimal pseudo-labeling sketch follows this list).
arXiv Detail & Related papers (2020-04-30T17:09:17Z)
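To make the self-training recipe in the last entry concrete, here is a minimal pseudo-labeling sketch in PyTorch. The confidence threshold, the ignore_index convention, and the function name pseudo_label are illustrative assumptions, not the paper's exact procedure:

```python
import torch

@torch.no_grad()
def pseudo_label(teacher, images, threshold=0.9, ignore_index=255):
    """Per-pixel hard pseudo labels from a trained segmentation teacher;
    pixels below the confidence threshold are marked to be ignored."""
    teacher.eval()
    probs = teacher(images).softmax(dim=1)   # (N, C, H, W) class probabilities
    conf, labels = probs.max(dim=1)          # per-pixel confidence and argmax label
    labels[conf < threshold] = ignore_index  # excluded from the loss later
    return labels
```

The student can then be trained on human-annotated and pseudo-labeled batches jointly, e.g. with torch.nn.CrossEntropyLoss(ignore_index=255).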
This list is automatically generated from the titles and abstracts of the papers on this site.