XMixup: Efficient Transfer Learning with Auxiliary Samples by
Cross-domain Mixup
- URL: http://arxiv.org/abs/2007.10252v1
- Date: Mon, 20 Jul 2020 16:42:29 GMT
- Title: XMixup: Efficient Transfer Learning with Auxiliary Samples by
Cross-domain Mixup
- Authors: Xingjian Li, Haoyi Xiong, Haozhe An, Chengzhong Xu, Dejing Dou
- Abstract summary: Cross-domain Mixup (XMixup) improves the multitask paradigm for deep transfer learning.
XMixup selects the auxiliary samples from the source dataset and augments training samples via the simple mixup strategy.
Experiment results show that XMixup improves the accuracy by 1.9% on average.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transferring knowledge from large source datasets is an effective way to
fine-tune the deep neural networks of the target task with a small sample size.
A great number of algorithms have been proposed to facilitate deep transfer
learning, and these techniques could be generally categorized into two groups -
Regularized Learning of the target task using models that have been pre-trained
from source datasets, and Multitask Learning with both source and target
datasets to train a shared backbone neural network. In this work, we aim to
improve the multitask paradigm for deep transfer learning via Cross-domain
Mixup (XMixup). While the existing multitask learning algorithms need to run
backpropagation over both the source and target datasets and usually incur
higher gradient complexity, XMixup transfers the knowledge from source to
target tasks more efficiently: for every class of the target task, XMixup
selects the auxiliary samples from the source dataset and augments training
samples via the simple mixup strategy. We evaluate XMixup over six real-world
transfer learning datasets. Experiment results show that XMixup improves the
accuracy by 1.9% on average. Compared with other state-of-the-art transfer
learning approaches, XMixup costs much less training time while still obtaining
higher accuracy.
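The abstract only outlines the mechanism, so the following is a minimal sketch of one plausible reading in PyTorch: a shared backbone with separate target and source classification heads, a precomputed pool `aux_pool` that maps each target class to auxiliary source samples (the abstract does not say how these auxiliaries are chosen), and a loss that interpolates the two heads' cross-entropies by the mixup coefficient. All names and the `aux_pool[c].sample()` API are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Beta

def xmixup_step(backbone, head_tgt, head_src, x_tgt, y_tgt, aux_pool, alpha=1.0):
    """One cross-domain mixup training step (sketch, not the official code).

    For every target example, draw one auxiliary source example from the pool
    assigned to its class, mix the two inputs, and weight the target-head and
    source-head losses by the mixup coefficient lambda.
    """
    lam = Beta(alpha, alpha).sample().item()

    # Hypothetical per-class pools: aux_pool[c].sample() -> (image, source_label).
    aux = [aux_pool[int(c)].sample() for c in y_tgt]
    x_src = torch.stack([img for img, _ in aux]).to(x_tgt.device)
    y_src = torch.tensor([lbl for _, lbl in aux], device=x_tgt.device)

    # Cross-domain mixup of the inputs; labels stay separate, one per head.
    x_mix = lam * x_tgt + (1.0 - lam) * x_src

    feats = backbone(x_mix)
    loss = lam * F.cross_entropy(head_tgt(feats), y_tgt) \
         + (1.0 - lam) * F.cross_entropy(head_src(feats), y_src)
    return loss
```

Because only the forward and backward pass over the mixed batch is needed, this avoids running separate backpropagation over the full source dataset, which is where the efficiency claim in the abstract comes from.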
Related papers
- Adapt-$\infty$: Scalable Lifelong Multimodal Instruction Tuning via Dynamic Data Selection [89.42023974249122]
Adapt-$\infty$ is a new multi-way and adaptive data selection approach for Lifelong Instruction Tuning.
We construct pseudo-skill clusters by grouping gradient-based sample vectors.
We select the best-performing data selector for each skill cluster from a pool of selector experts.
arXiv Detail & Related papers (2024-10-14T15:48:09Z) - TransformMix: Learning Transformation and Mixing Strategies from Data [20.79680733590554]
We propose an automated approach, TransformMix, to learn better transformation and mixing augmentation strategies from data.
We demonstrate the effectiveness of TransformMix on multiple datasets in transfer learning, classification, object detection, and knowledge distillation settings.
arXiv Detail & Related papers (2024-03-19T04:36:41Z) - H-ensemble: An Information Theoretic Approach to Reliable Few-Shot
Multi-Source-Free Transfer [4.328706834250445]
We propose a framework named H-ensemble, which learns the optimal linear combination of source models for the target task.
Compared to previous works, H-ensemble is characterized by: 1) its adaptability to a novel MSF setting for few-shot target tasks, 2) theoretical reliability, 3) a lightweight structure easy to interpret and adapt.
We show that the H-ensemble can successfully learn the optimal task ensemble, as well as outperform prior arts.
arXiv Detail & Related papers (2023-12-19T17:39:34Z) - Optimal transfer protocol by incremental layer defrosting [66.76153955485584]
Transfer learning is a powerful tool enabling model training with limited amounts of data.
The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task.
We show that this protocol is often sub-optimal and the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen.
arXiv Detail & Related papers (2023-03-02T17:32:11Z) - X-Learner: Learning Cross Sources and Tasks for Universal Visual
Representation [71.51719469058666]
We propose a representation learning framework called X-Learner.
X-Learner learns the universal feature of multiple vision tasks supervised by various sources.
X-Learner achieves strong performance on different tasks without extra annotations, modalities and computational costs.
arXiv Detail & Related papers (2022-03-16T17:23:26Z) - Active Multi-Task Representation Learning [50.13453053304159]
We give the first formal study on resource task sampling by leveraging the techniques from active learning.
We propose an algorithm that iteratively estimates the relevance of each source task to the target task and samples from each source task based on the estimated relevance.
arXiv Detail & Related papers (2022-02-02T08:23:24Z) - Single-dataset Experts for Multi-dataset Question Answering [6.092171111087768]
We train a network on multiple datasets to generalize and transfer better to new datasets.
Our approach is to model multi-dataset question answering with a collection of single-dataset experts.
Simple methods based on parameter-averaging lead to better zero-shot generalization and few-shot transfer performance.
arXiv Detail & Related papers (2021-09-28T17:08:22Z) - Mixup Without Hesitation [38.801366276601414]
We propose mixup Without hesitation (mWh), a concise, effective, and easy-to-use training algorithm.
mWh strikes a good balance between exploration and exploitation by gradually replacing mixup with basic data augmentation; a toy sketch of such a schedule appears after this list.
Our code is open-source and available at https://github.com/yuhao318318/mWh.
arXiv Detail & Related papers (2021-01-12T08:11:08Z) - Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver, generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z) - GradMix: Multi-source Transfer across Domains and Tasks [33.98368732653684]
GradMix is a model-agnostic method applicable to any model trained with a gradient-based learning rule.
We conduct MS-DTT experiments on two tasks: digit recognition and action recognition.
arXiv Detail & Related papers (2020-02-09T02:10:22Z)
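As a quick illustration of the mWh idea summarized above (gradually replacing mixup with basic data augmentation), here is a toy schedule in Python; it is an assumption made for illustration only, not the authors' released code, which is linked in the entry above.

```python
import random

def use_mixup(epoch: int, total_epochs: int) -> bool:
    """Toy exploration-to-exploitation schedule (illustrative, not mWh itself).

    Early epochs apply mixup with high probability (exploration); the
    probability decays linearly so that late epochs mostly fall back to
    basic data augmentation (exploitation).
    """
    p_mixup = 1.0 - epoch / total_epochs
    return random.random() < p_mixup
```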