A Novel DNN Training Framework via Data Sampling and Multi-Task
Optimization
- URL: http://arxiv.org/abs/2007.01016v1
- Date: Thu, 2 Jul 2020 10:58:57 GMT
- Title: A Novel DNN Training Framework via Data Sampling and Multi-Task
Optimization
- Authors: Boyu Zhang, A. K. Qin, Hong Pan, Timos Sellis
- Abstract summary: We propose a novel framework to train DNN models.
It generates multiple pairs of training and validation sets from the gross training set via random splitting.
It outputs, among all trained models, the one with the best overall performance across the validation sets from all pairs.
- Score: 7.001799696806368
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional DNN training paradigms typically rely on one training set and
one validation set, obtained by partitioning an annotated dataset used for
training, namely the gross training set, in a certain way. The training set is used
for training the model while the validation set is used to estimate the
generalization performance of the trained model as the training proceeds to
avoid over-fitting. There are two major issues in this paradigm. Firstly, the
validation set can hardly guarantee an unbiased estimate of generalization
performance due to a potential mismatch with the test data. Secondly, training a
DNN corresponds to solving a complex optimization problem, which is prone to
getting trapped in inferior local optima and thus leads to undesired training
results. To address these issues, we propose a novel DNN training framework. It
generates multiple pairs of training and validation sets from the gross
training set via random splitting, trains a DNN model of a pre-specified
structure on each pair while allowing the useful knowledge (e.g., promising
network parameters) obtained from one model training process to be transferred
to the other model training processes via multi-task optimization, and outputs,
among all trained models, the one with the best overall performance across
the validation sets from all pairs. The knowledge transfer mechanism featured
in this new framework can not only enhance training effectiveness by helping
the model training process to escape from local optima but also improve
generalization performance via implicit regularization imposed on one model
training process from other model training processes. We implement the proposed
framework, parallelize the implementation on a GPU cluster, and apply it to
train several widely used DNN models. Experimental results demonstrate the
superiority of the proposed framework over the conventional training paradigm.
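To make the workflow concrete, below is a minimal PyTorch-style sketch of the framework described in the abstract, assuming an image-classification setting. The number of splits, the SGD hyperparameters, and the copy-the-best-peer transfer rule are illustrative assumptions; the paper realizes knowledge transfer through multi-task optimization rather than direct parameter copying.

```python
import random

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset

# Illustrative sketch: k random train/validation splits of the gross training
# set, one model trained per split, periodic knowledge transfer between the
# training processes, and selection of the model with the best overall
# accuracy across all validation sets.

def random_splits(gross_set, k=4, val_ratio=0.2):
    """Generate k (train, val) pairs by randomly splitting the gross training set."""
    pairs, n_val = [], int(len(gross_set) * val_ratio)
    for _ in range(k):
        idx = list(range(len(gross_set)))
        random.shuffle(idx)
        pairs.append((Subset(gross_set, idx[n_val:]), Subset(gross_set, idx[:n_val])))
    return pairs

def evaluate(model, val_set):
    """Classification accuracy on a validation set."""
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in DataLoader(val_set, batch_size=256):
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

def train_framework(gross_set, build_model, epochs=50, k=4, transfer_every=5):
    pairs = random_splits(gross_set, k)
    models = [build_model() for _ in range(k)]
    opts = [torch.optim.SGD(m.parameters(), lr=0.01, momentum=0.9) for m in models]
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(epochs):
        for model, opt, (train_set, _) in zip(models, opts, pairs):
            model.train()
            for x, y in DataLoader(train_set, batch_size=128, shuffle=True):
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()

        # Knowledge transfer (illustrative stand-in for the paper's multi-task
        # optimization): periodically copy the parameters of the peer with the
        # highest validation accuracy into the weakest peer, helping it escape
        # a poor local optimum.
        if (epoch + 1) % transfer_every == 0:
            scores = [evaluate(m, val) for m, (_, val) in zip(models, pairs)]
            best, worst = scores.index(max(scores)), scores.index(min(scores))
            models[worst].load_state_dict(models[best].state_dict())

    # Output the model with the best overall performance across the validation
    # sets from all pairs.
    overall = [sum(evaluate(m, val) for _, val in pairs) / k for m in models]
    return models[overall.index(max(overall))]
```

In the paper the per-split training processes are parallelized on a GPU cluster; the sequential loop over models above is kept only for readability.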
Related papers
- Transferable Post-training via Inverse Value Learning [83.75002867411263]
We propose modeling changes at the logits level during post-training using a separate neural network (i.e., the value network).
After training this network on a small base model using demonstrations, this network can be seamlessly integrated with other pre-trained models during inference.
We demonstrate that the resulting value network has broad transferability across pre-trained models of different parameter sizes.
arXiv Detail & Related papers (2024-10-28T13:48:43Z) - Task-Oriented Pre-Training for Drivable Area Detection [5.57325257338134]
We propose a task-oriented pre-training method that begins with generating redundant segmentation proposals.
We then introduce a Specific Category Enhancement Fine-tuning (SCEF) strategy for fine-tuning the Contrastive Language-Image Pre-training (CLIP) model.
This approach can generate a lot of coarse training data for pre-training models, which are further fine-tuned using manually annotated data.
arXiv Detail & Related papers (2024-09-30T10:25:47Z) - Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation [105.23631749213729]
We propose a novel method for unsupervised pre-training in low-data regimes.
Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts.
We show that our method can converge faster and perform better than CNN-based models in low-data regimes.
arXiv Detail & Related papers (2024-05-22T06:48:43Z) - Architecture, Dataset and Model-Scale Agnostic Data-free Meta-Learning [119.70303730341938]
We propose ePisode cUrriculum inveRsion (ECI) during data-free meta training and invErsion calibRation following inner loop (ICFIL) during meta testing.
ECI adaptively increases the difficulty level of pseudo episodes according to the real-time feedback of the meta model.
We formulate the optimization process of meta training with ECI as an adversarial form in an end-to-end manner.
arXiv Detail & Related papers (2023-03-20T15:10:41Z) - TWINS: A Fine-Tuning Framework for Improved Transferability of
Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z) - Effective and Efficient Training for Sequential Recommendation using
Recency Sampling [91.02268704681124]
We propose a novel Recency-based Sampling of Sequences training objective.
We show that models enhanced with our method can achieve performance exceeding or very close to that of the state-of-the-art BERT4Rec.
(A minimal illustrative sketch of recency-weighted target sampling appears after this list.)
arXiv Detail & Related papers (2022-07-06T13:06:31Z) - Once-for-All Adversarial Training: In-Situ Tradeoff between Robustness
and Accuracy for Free [115.81899803240758]
Adversarial training and its many variants substantially improve deep network robustness, yet at the cost of compromising standard accuracy.
This paper asks how to quickly calibrate a trained model in-situ, to examine the achievable trade-offs between its standard and robust accuracies.
Our proposed framework, Once-for-all Adversarial Training (OAT), is built on an innovative model-conditional training framework.
arXiv Detail & Related papers (2020-10-22T16:06:34Z) - Deep Ensembles for Low-Data Transfer Learning [21.578470914935938]
We study different ways of creating ensembles from pre-trained models.
We show that the nature of pre-training itself is a performant source of diversity.
We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
arXiv Detail & Related papers (2020-10-14T07:59:00Z) - A Practical Incremental Method to Train Deep CTR Models [37.54660958085938]
We introduce a practical incremental method to train deep CTR models, which consists of three decoupled modules.
Our method can achieve comparable performance to the conventional batch mode training with much better training efficiency.
arXiv Detail & Related papers (2020-09-04T12:35:42Z) - A Novel Training Protocol for Performance Predictors of Evolutionary
Neural Architecture Search Algorithms [10.658358586764171]
Evolutionary Neural Architecture Search (ENAS) can automatically design the architectures of Deep Neural Networks (DNNs) using evolutionary computation algorithms.
Performance predictors are a type of regression model that can assist in accomplishing the search without consuming much computational resource.
We propose a new training protocol to address these issues, consisting of designing a pairwise ranking indicator to construct the training target, proposing to use logistic regression to fit the training samples, and developing a differential method to build the training instances.
arXiv Detail & Related papers (2020-08-30T14:39:28Z) - PrIU: A Provenance-Based Approach for Incrementally Updating Regression
Models [9.496524884855559]
This paper presents an efficient provenance-based approach, PrIU, for incrementally updating model parameters without sacrificing prediction accuracy.
We prove the correctness and convergence of the incrementally updated model parameters, and validate it experimentally.
Experimental results show that up to two orders of magnitude speed-ups can be achieved by PrIU-opt compared to simply retraining the model from scratch, yet obtaining highly similar models.
arXiv Detail & Related papers (2020-02-26T21:04:06Z)
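As referenced in the Recency Sampling entry above, the following is a minimal, hypothetical Python sketch of recency-weighted target sampling for sequential recommendation. The geometric weighting and the `alpha` parameter are assumptions made for illustration, not details taken from that paper.

```python
import random

def sample_training_pair(sequence, alpha=0.8):
    """Pick a target item with probability biased toward the end of the
    sequence; everything before it becomes the model input."""
    n = len(sequence)
    # Weight position i by alpha**(n - 1 - i): the most recent item gets
    # weight 1, older items get geometrically smaller weights.
    weights = [alpha ** (n - 1 - i) for i in range(n)]
    target_idx = random.choices(range(1, n), weights=weights[1:])[0]
    return sequence[:target_idx], sequence[target_idx]

# Example: a user's interaction history (item ids).
history = [3, 17, 42, 8, 99, 5]
inputs, target = sample_training_pair(history)
print(inputs, "->", target)
```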
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and accepts no responsibility for any consequences of its use.