AdaSelection: Accelerating Deep Learning Training through Data
Subsampling
- URL: http://arxiv.org/abs/2306.10728v1
- Date: Mon, 19 Jun 2023 07:01:28 GMT
- Title: AdaSelection: Accelerating Deep Learning Training through Data
Subsampling
- Authors: Minghe Zhang, Chaosheng Dong, Jinmiao Fu, Tianchen Zhou, Jia Liang,
Jia Liu, Bo Liu, Michinari Momma, Bryan Wang, Yan Gao, Yi Sun
- Abstract summary: We introduce AdaSelection, an adaptive sub-sampling method to identify the most informative sub-samples within each minibatch.
Compared with industry-standard baselines, AdaSelection consistently displays superior performance.
- Score: 27.46630703428186
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we introduce AdaSelection, an adaptive sub-sampling method to
identify the most informative sub-samples within each minibatch to speed up the
training of large-scale deep learning models without sacrificing model
performance. Our method can flexibly combine an arbitrary number of
baseline sub-sampling methods, incorporating method-level importance and
intra-method sample-level importance at each iteration. The standard practice
of ad-hoc sampling often leads to continuous training with vast amounts of data
from production environments. To improve the selection of data instances during
forward and backward passes, we propose recording a constant amount of
information per instance from these passes. We demonstrate the effectiveness of
our method by testing it across various types of inputs and tasks, including
the classification tasks on both image and language datasets, as well as
regression tasks. Compared with industry-standard baselines, AdaSelection
consistently displays superior performance.
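The combination step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the method-level or sample-level importances are computed or updated, so everything below is an assumption made for illustration: the function names (`softmax`, `normalize`, `select_subsample`), the softmax over method weights, the sum-to-one score normalization, and the two example baselines ("big_loss", "grad_norm") are hypothetical and are not taken from the paper.

```python
import math


def softmax(values):
    """Turn unnormalized method weights into positive weights that sum to 1."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def normalize(scores):
    """Rescale one method's non-negative per-sample scores to sum to 1."""
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def select_subsample(method_scores, method_weights, keep):
    """Return sorted indices of the `keep` highest-scoring samples in a minibatch.

    method_scores:  dict mapping a method name to per-sample scores
                    (assumed sample-level importance within that method).
    method_weights: dict mapping a method name to an unnormalized weight
                    (assumed method-level importance).
    """
    names = sorted(method_scores)
    weights = softmax([method_weights[n] for n in names])
    batch_size = len(method_scores[names[0]])

    # Weighted sum of each method's normalized per-sample scores.
    combined = [0.0] * batch_size
    for w, name in zip(weights, names):
        for i, s in enumerate(normalize(method_scores[name])):
            combined[i] += w * s

    ranked = sorted(range(batch_size), key=lambda i: combined[i], reverse=True)
    return sorted(ranked[:keep])


# Usage: a minibatch of four samples scored by two hypothetical baselines.
losses = [0.1, 2.0, 0.5, 1.5]
grad_norms = [0.2, 0.1, 3.0, 0.4]
selected = select_subsample(
    {"big_loss": losses, "grad_norm": grad_norms},
    {"big_loss": 0.0, "grad_norm": 0.0},  # equal method-level weights
    keep=2,
)
print(selected)
```

In a training loop, only the selected indices would take part in the backward pass. How the method-level weights change from one iteration to the next is left out here because the abstract does not describe it.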
Related papers
- Dataset Quantization with Active Learning based Adaptive Sampling [11.157462442942775]
We show that maintaining performance is feasible even with uneven sample distributions.
We propose a novel active learning based adaptive sampling strategy to optimize the sample selection.
Our approach outperforms the state-of-the-art dataset compression methods.
arXiv Detail & Related papers (2024-07-09T23:09:18Z)
- Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
Vanilla reference-model-free methods involve independently scoring and selecting data in a sample-wise manner.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
arXiv Detail & Related papers (2024-06-07T12:12:20Z)
- Towards Free Data Selection with General-Purpose Models [71.92151210413374]
A desirable data selection algorithm can efficiently choose the most informative samples to maximize the utility of limited annotation budgets.
Current approaches, represented by active learning methods, typically follow a cumbersome pipeline that iterates the time-consuming model training and batch data selection repeatedly.
FreeSel bypasses the heavy batch selection process, achieving a significant improvement in efficiency and being 530x faster than existing active learning methods.
arXiv Detail & Related papers (2023-09-29T15:50:14Z)
- Convolutional autoencoder-based multimodal one-class classification [80.52334952912808]
One-class classification refers to approaches of learning using data from a single class only.
We propose a deep learning one-class classification method suitable for multimodal data.
arXiv Detail & Related papers (2023-09-25T12:31:18Z)
- Towards General and Efficient Active Learning [20.888364610175987]
Active learning aims to select the most informative samples to exploit limited annotation budgets.
We propose a novel general and efficient active learning (GEAL) method in this paper.
Our method can conduct data selection processes on different datasets with a single-pass inference of the same model.
arXiv Detail & Related papers (2021-12-15T08:35:28Z)
- AutoSampling: Search for Effective Data Sampling Schedules [118.20014773014671]
We propose an AutoSampling method to automatically learn sampling schedules for model training.
We apply our method to a variety of image classification tasks illustrating the effectiveness of the proposed method.
arXiv Detail & Related papers (2021-05-28T09:39:41Z)
- One Backward from Ten Forward, Subsampling for Large-Scale Deep Learning [35.0157090322113]
Large-scale machine learning systems are often continuously trained with enormous data from production environments.
The sheer volume of streaming data poses a significant challenge to real-time training subsystems and ad-hoc sampling is the standard practice.
We propose to record a constant amount of information per instance from these forward passes. The extra information measurably improves the selection of which data instances should participate in forward and backward passes.
arXiv Detail & Related papers (2021-04-27T11:29:02Z)
- Adaptive Task Sampling for Meta-Learning [79.61146834134459]
The key idea of meta-learning for few-shot classification is to mimic the few-shot situations faced at test time.
We propose an adaptive task sampling method to improve the generalization performance.
arXiv Detail & Related papers (2020-07-17T03:15:53Z)
- Ensemble Wrapper Subsampling for Deep Modulation Classification [70.91089216571035]
Subsampling of received wireless signals is important for relaxing hardware requirements as well as the computational cost of signal processing algorithms.
We propose a subsampling technique to facilitate the use of deep learning for automatic modulation classification in wireless communication systems.
arXiv Detail & Related papers (2020-05-10T06:11:13Z)
- Efficient Deep Representation Learning by Adaptive Latent Space Sampling [16.320898678521843]
Supervised deep learning requires a large amount of training samples with annotations, which are expensive and time-consuming to obtain.
We propose a novel training framework which adaptively selects informative samples that are fed to the training process.
arXiv Detail & Related papers (2020-03-19T22:17:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.