How Important is Importance Sampling for Deep Budgeted Training?
- URL: http://arxiv.org/abs/2110.14283v1
- Date: Wed, 27 Oct 2021 09:03:57 GMT
- Title: How Important is Importance Sampling for Deep Budgeted Training?
- Authors: Eric Arazo, Diego Ortego, Paul Albert, Noel E. O'Connor, Kevin
McGuinness
- Abstract summary: This work explores how a budget constraint interacts with importance sampling approaches and data augmentation techniques.
We show that under budget restrictions, importance sampling approaches do not provide a consistent improvement over uniform sampling.
- Score: 17.264550056296915
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long iterative training processes for Deep Neural Networks (DNNs) are
commonly required to achieve state-of-the-art performance in many computer
vision tasks. Importance sampling approaches might play a key role in budgeted
training regimes, i.e. when limiting the number of training iterations. These
approaches aim at dynamically estimating the importance of each sample to focus
on the most relevant and speed up convergence. This work explores this paradigm
and how a budget constraint interacts with importance sampling approaches and
data augmentation techniques. We show that under budget restrictions,
importance sampling approaches do not provide a consistent improvement over
uniform sampling. We suggest that, given a specific budget, the best course of
action is to disregard importance sampling and introduce adequate data
augmentation; e.g., when reducing the budget to 30% on CIFAR-10/100, RICAP data
augmentation maintains accuracy, while importance sampling does not. We
conclude from our
work that DNNs under budget restrictions benefit greatly from variety in the
training set and that finding the right samples to train on is not the most
effective strategy when balancing high performance with low computational
requirements. Source code is available at https://git.io/JKHa3.
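As a concrete illustration of the augmentation the abstract recommends, here is a minimal NumPy sketch of RICAP (random image cropping and patching): four randomly cropped images are patched into one training image, and the label becomes a soft mixture weighted by patch area. The array layout, beta parameter, and function name are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def ricap_batch(images, labels, num_classes, beta=0.3):
    """images: (N, H, W, C) float array; labels: (N,) int class indices."""
    n, h, w, c = images.shape
    # One boundary point splits the canvas into four regions.
    bw = int(round(w * rng.beta(beta, beta)))
    bh = int(round(h * rng.beta(beta, beta)))
    sizes = [(bh, bw), (bh, w - bw), (h - bh, bw), (h - bh, w - bw)]

    out = np.zeros_like(images)
    soft = np.zeros((n, num_classes))
    for k, (ph, pw) in enumerate(sizes):
        idx = rng.permutation(n)                 # a shuffled image set per region
        if ph > 0 and pw > 0:
            ys = rng.integers(0, h - ph + 1)     # random crop offset
            xs = rng.integers(0, w - pw + 1)
            y0 = 0 if k < 2 else bh              # region origin on the canvas
            x0 = 0 if k % 2 == 0 else bw
            out[:, y0:y0 + ph, x0:x0 + pw, :] = images[idx, ys:ys + ph, xs:xs + pw, :]
        # The soft label mixes the four source labels by area fraction.
        soft[np.arange(n), labels[idx]] += (ph * pw) / (h * w)
    return out, soft
```

Training then proceeds with a cross-entropy loss against the soft targets, so each synthesized image contributes gradient signal from up to four classes.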
Related papers
- Importance Sampling via Score-based Generative Models [12.32722207200796]
Importance sampling involves sampling from a probability density function proportional to the product of an importance weight function and a base PDF.
We propose an entirely training-free importance sampling framework that relies solely on a score-based generative model (SGM) for the base PDF.
We conduct a thorough analysis demonstrating the method's scalability and effectiveness across diverse datasets and tasks.
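A self-normalized importance sampling sketch makes the quoted setup concrete; here a Gaussian base density stands in for the score-based generative model, and the weight function is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def snis_expectation(f, weight_fn, base_sampler, n=100_000):
    """Estimate E_target[f] where target(x) ∝ weight_fn(x) * base_pdf(x),
    using only samples from the base PDF (no density evaluation needed)."""
    x = base_sampler(n)
    w = weight_fn(x)
    return np.sum(w * f(x)) / np.sum(w)

# Example: target ∝ exp(-x^2) * N(0, 1), i.e. the base tilted toward zero.
est = snis_expectation(
    f=lambda x: x**2,
    weight_fn=lambda x: np.exp(-x**2),
    base_sampler=lambda n: rng.normal(0.0, 1.0, n),
)
print(est)  # second moment of the tilted target, ~1/3
```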
arXiv Detail & Related papers (2025-02-07T04:09:03Z)
- Enhancing Adaptive Mixed-Criticality Scheduling with Deep Reinforcement Learning [0.0]
We enhance Adaptive Mixed-Criticality (AMC) with a deep reinforcement learning (DRL) approach based on a Deep-Q Network.
The DRL agent is trained off-line, and at run-time adjusts the low-criticality budgets of tasks to avoid budget overruns, while ensuring that no job misses its deadline if it does not overrun its budget.
The results show that the agent is able to reduce budget overruns by up to 50%, even when the budget of each task is chosen based on sampling the distribution of its execution time.
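A toy, heavily simplified stand-in for the idea: a tabular Q-learning agent (the paper uses a Deep-Q Network) that observes a coarse overrun-risk state and adjusts a task's low-criticality budget. The states, dynamics, and reward below are invented purely to illustrate the update rule:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3            # overrun-risk levels x {shrink, keep, grow}
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1

def env_step(s, a):
    """Hypothetical dynamics: shrinking the budget raises overrun risk, growing
    it lowers it, plus noise; the highest risk level counts as an overrun."""
    s2 = int(np.clip(s + (1 - a) + rng.integers(-1, 2), 0, n_states - 1))
    return s2, (-1.0 if s2 == n_states - 1 else 0.0)

s = 2
for _ in range(20_000):
    a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
    s2, r = env_step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])   # Q-learning update
    s = s2
print(Q.round(2))                      # learned action preferences per state
```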
arXiv Detail & Related papers (2024-11-01T13:29:33Z)
- How Much Data are Enough? Investigating Dataset Requirements for Patch-Based Brain MRI Segmentation Tasks [74.21484375019334]
Training deep neural networks reliably requires access to large-scale datasets.
To mitigate both the time and financial costs associated with model development, a clear understanding of the amount of data required to train a satisfactory model is crucial.
This paper proposes a strategic framework for estimating the amount of annotated data required to train patch-based segmentation networks.
arXiv Detail & Related papers (2024-04-04T13:55:06Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
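A minimal sketch of the instance-reweighting intuition behind KL-regularized distributionally robust optimization: the adversarial inner problem yields sample weights proportional to exp(loss / tau), so harder samples receive exponentially more emphasis. The temperature and the plain softmax form are generic, not the paper's exact formulation:

```python
import numpy as np

def dro_weights(losses, tau=1.0):
    """Worst-case weights under a KL ball: w_i ∝ exp(loss_i / tau)."""
    z = (losses - losses.max()) / tau          # subtract max for stability
    w = np.exp(z)
    return w / w.sum()

losses = np.array([0.2, 0.5, 2.3, 0.1])        # per-sample training losses
w = dro_weights(losses, tau=0.5)
reweighted_loss = float(np.sum(w * losses))    # emphasizes the hard sample
print(w.round(3), reweighted_loss)
```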
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
- Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods [29.141145775835106]
Given a fixed FLOP budget, what are the best datasets, models, and (self-supervised) training methods for obtaining high accuracy on representative visual tasks?
We examine five large-scale datasets (JFT-300M, ALIGN, ImageNet-1K, ImageNet-21K, and COCO) and six pre-training methods (CLIP, DINO, SimCLR, BYOL, Masked Autoencoding, and supervised).
Our results call into question the commonly-held assumption that self-supervised methods inherently scale to large, uncurated data.
arXiv Detail & Related papers (2022-09-30T17:04:55Z)
- How Much More Data Do I Need? Estimating Requirements for Downstream Tasks [99.44608160188905]
Given a small training data set and a learning algorithm, how much more data is necessary to reach a target validation or test performance?
Overestimating or underestimating data requirements incurs substantial costs that could be avoided with an adequate budget.
Using our guidelines, practitioners can accurately estimate data requirements of machine learning systems to gain savings in both development time and data acquisition costs.
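One standard recipe in this vein, sketched below with assumed pilot numbers: fit a power-law learning curve to validation errors measured on small data subsets, then invert it to estimate the data needed for a target error:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    return a * np.power(n, -b) + c

# Hypothetical pilot results: (training-set size, validation error)
sizes = np.array([500, 1_000, 2_000, 4_000, 8_000], dtype=float)
errors = np.array([0.38, 0.31, 0.26, 0.22, 0.19])

(a, b, c), _ = curve_fit(power_law, sizes, errors, p0=(1.0, 0.5, 0.05),
                         maxfev=10_000)

target = 0.15                                   # desired validation error
if c < target:
    n_needed = ((target - c) / a) ** (-1.0 / b) # invert the fitted curve
    print(f"estimated samples for {target:.2f} error: {n_needed:,.0f}")
else:
    print("target below the fitted asymptote; more data alone won't reach it")
```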
arXiv Detail & Related papers (2022-07-04T21:16:05Z)
- Few-shot Quality-Diversity Optimization [50.337225556491774]
Quality-Diversity (QD) optimization has been shown to be an effective tool for dealing with deceptive minima and sparse rewards in Reinforcement Learning.
We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population which, when used to initialize QD methods in unseen environments, allows for few-shot adaptation.
Experiments carried out in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations required for QD optimization in these environments.
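For context, here is a toy MAP-Elites loop, the canonical QD algorithm: keep the best solution per behavior-descriptor cell and mutate stored elites. The objective and descriptor are stand-ins; the paper's prior population would replace the random bootstrap individuals:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, CELLS = 8, 20

fitness = lambda x: -np.sum(x**2)                   # toy objective (maximize)
descriptor = lambda x: float(np.clip(x[0], -1, 1))  # 1-D behavior descriptor

archive = {}                              # cell index -> (fitness, solution)
for step in range(5_000):
    if archive and rng.random() > 0.1:
        parent = archive[rng.choice(list(archive))][1]
        x = parent + rng.normal(0, 0.1, DIM)        # mutate a stored elite
    else:
        x = rng.uniform(-1, 1, DIM)                 # random bootstrap individual
    cell = int((descriptor(x) + 1) / 2 * (CELLS - 1))
    f = fitness(x)
    if cell not in archive or f > archive[cell][0]:
        archive[cell] = (f, x)                      # keep the best per cell
best = max(v[0] for v in archive.values())
print(f"{len(archive)} cells filled, best fitness {best:.3f}")
```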
arXiv Detail & Related papers (2021-09-14T17:12:20Z)
- A Simple Fine-tuning Is All You Need: Towards Robust Deep Learning Via Adversarial Fine-tuning [90.44219200633286]
We propose a simple yet very effective adversarial fine-tuning approach based on a "slow start, fast decay" learning rate scheduling strategy.
Experimental results show that the proposed adversarial fine-tuning approach outperforms the state-of-the-art methods on CIFAR-10, CIFAR-100 and ImageNet datasets.
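One plausible reading of such a schedule, sketched with assumed constants (the paper's exact functional form may differ): a brief low-rate warmup followed by aggressive exponential decay:

```python
def slow_start_fast_decay(step, total_steps, base_lr=0.01,
                          warmup_frac=0.1, decay_rate=0.01):
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # Slow start: ramp linearly from a small fraction of the base LR.
        return base_lr * (0.1 + 0.9 * step / warmup_steps)
    # Fast decay: exponential drop over the remaining steps.
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * decay_rate ** t

lrs = [slow_start_fast_decay(s, 1_000) for s in range(1_000)]
print(f"start {lrs[0]:.4f}, peak {max(lrs):.4f}, end {lrs[-1]:.6f}")
```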
arXiv Detail & Related papers (2020-12-25T20:50:15Z)
- Efficient Conditional Pre-training for Transfer Learning [71.01129334495553]
We propose efficient filtering methods to select relevant subsets from the pre-training dataset.
We validate our techniques by pre-training on ImageNet in both the unsupervised and supervised settings.
We improve standard ImageNet pre-training by 1-3% by tuning available models on our subsets and by pre-training on a dataset filtered from a larger-scale dataset.
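A generic version of this kind of filtering, with an assumed embedding space and keep fraction (not the paper's specific criteria): rank pre-training candidates by cosine similarity to the target-domain centroid and keep the top fraction:

```python
import numpy as np

rng = np.random.default_rng(0)

def filter_pretraining_set(source_emb, target_emb, keep_frac=0.2):
    """source_emb: (N, D) embeddings of pre-training candidates;
    target_emb: (M, D) embeddings from the downstream task."""
    # Cosine similarity of each candidate to the target-domain centroid.
    centroid = target_emb.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    src = source_emb / np.linalg.norm(source_emb, axis=1, keepdims=True)
    scores = src @ centroid
    k = int(len(scores) * keep_frac)
    return np.argsort(scores)[::-1][:k]    # indices of most relevant samples

source = rng.normal(size=(10_000, 128))
target = rng.normal(loc=0.5, size=(200, 128))
keep = filter_pretraining_set(source, target)
print(f"kept {len(keep)} of {len(source)} candidates")
```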
arXiv Detail & Related papers (2020-11-20T06:16:15Z)