GradMix: Multi-source Transfer across Domains and Tasks
- URL: http://arxiv.org/abs/2002.03264v1
- Date: Sun, 9 Feb 2020 02:10:22 GMT
- Title: GradMix: Multi-source Transfer across Domains and Tasks
- Authors: Junnan Li, Ziwei Xu, Yongkang Wong, Qi Zhao, Mohan Kankanhalli
- Abstract summary: GradMix is a model-agnostic method applicable to any model trained with a gradient-based learning rule.
We conduct MS-DTT experiments on two tasks: digit recognition and action recognition.
- Score: 33.98368732653684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The computer vision community is witnessing an unprecedented rate of new
tasks being proposed and addressed, thanks to the deep convolutional networks'
capability to find complex mappings from X to Y. The advent of each task is
often accompanied by the release of a large-scale annotated dataset for
supervised training of deep networks. However, it is expensive and
time-consuming to manually label a sufficient amount of training data.
Therefore, it is important to develop algorithms that can leverage
off-the-shelf labeled datasets to learn
useful knowledge for the target task. While previous works mostly focus on
transfer learning from a single source, we study multi-source transfer across
domains and tasks (MS-DTT), in a semi-supervised setting. We propose GradMix, a
model-agnostic method applicable to any model trained with a gradient-based
learning rule, to transfer knowledge via gradient descent by weighting and
mixing the gradients from all sources during training. GradMix follows a
meta-learning objective, which assigns layer-wise weights to the source
gradients, such that the combined gradient follows the direction that minimizes
the loss for a small set of samples from the target dataset. In addition, we
propose to adaptively adjust the learning rate for each mini-batch based on its
importance to the target task, and a pseudo-labeling method to leverage the
unlabeled samples in the target domain. We conduct MS-DTT experiments on two
tasks: digit recognition and action recognition, and demonstrate the
advantageous performance of the proposed method against multiple baselines.
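To make the gradient-mixing step concrete, the following is a minimal PyTorch-style sketch of a single GradMix-like update, written from the description above. All names (gradmix_step, source_batches, layer_weights, loss_fn) are illustrative assumptions rather than the authors' reference implementation, and the meta-update of the layer-wise weights, the adaptive learning rate, and the pseudo-labeling step are only indicated in comments.

```python
# Hedged sketch of layer-wise gradient mixing across multiple sources,
# assuming a PyTorch model; names and signatures are illustrative only.
import torch


def gradmix_step(model, optimizer, source_batches, loss_fn, layer_weights):
    """One update that mixes per-source gradients with layer-wise weights.

    source_batches: list of (inputs, labels) mini-batches, one per source.
    layer_weights:  layer_weights[k][name] is a scalar weight for source k and
                    parameter `name`; in the paper these weights follow a
                    meta-learning objective so that the mixed gradient lowers
                    the loss on a small labeled target set.
    """
    param_names = [name for name, _ in model.named_parameters()]

    # 1. Compute the gradient of each source's loss separately.
    per_source_grads = []
    for inputs, labels in source_batches:
        loss = loss_fn(model(inputs), labels)
        grads = torch.autograd.grad(loss, list(model.parameters()))
        per_source_grads.append(dict(zip(param_names, grads)))

    # 2. Mix the gradients layer-wise with the (meta-learned) weights.
    optimizer.zero_grad()
    for name, param in model.named_parameters():
        param.grad = sum(layer_weights[k][name] * per_source_grads[k][name]
                         for k in range(len(source_batches)))

    # 3. Apply the mixed gradient (in the paper, the learning rate of this
    #    step would also be scaled by the mini-batch's importance to the
    #    target task, and unlabeled target samples would enter the source
    #    pool via pseudo-labels).
    optimizer.step()
```

In the full method, the layer-wise weights are themselves updated by a meta-objective so that the post-update model reduces the loss on the small labeled target set; that outer loop is omitted from the sketch.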
Related papers
- Bayesian Uncertainty for Gradient Aggregation in Multi-Task Learning [39.4348419684885]
Multi-task learning (MTL) aims at learning a single model that solves several tasks efficiently.
We introduce a novel gradient aggregation approach using Bayesian inference.
We empirically demonstrate the benefits of our approach on a variety of datasets.
arXiv Detail & Related papers (2024-02-06T14:00:43Z)
- Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
- Optimal transfer protocol by incremental layer defrosting [66.76153955485584]
Transfer learning is a powerful tool enabling model training with limited amounts of data.
The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task.
We show that this protocol is often sub-optimal and the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen.
arXiv Detail & Related papers (2023-03-02T17:32:11Z)
- Improving Meta-Learning Generalization with Activation-Based Early-Stopping [12.299371455015239]
Meta-Learning algorithms for few-shot learning aim to train neural networks capable of generalizing to novel tasks using only a few examples.
Early-stopping is critical for performance, halting model training when it reaches optimal generalization to the new task distribution.
This is problematic in few-shot transfer learning settings, where the meta-test set comes from a different target dataset.
arXiv Detail & Related papers (2022-08-03T22:55:45Z)
- Continual Coarse-to-Fine Domain Adaptation in Semantic Segmentation [22.366638308792734]
Deep neural networks are typically trained in a single shot for a specific task and data distribution.
In real world settings both the task and the domain of application can change.
We introduce the novel task of coarse-to-fine learning of semantic segmentation architectures in the presence of domain shift.
arXiv Detail & Related papers (2022-01-18T13:31:19Z)
- Meta-Learning with Fewer Tasks through Task Interpolation [67.03769747726666]
Current meta-learning algorithms require a large number of meta-training tasks, which may not be accessible in real-world scenarios.
By meta-learning with task interpolation (MLTI), our approach effectively generates additional tasks by randomly sampling a pair of tasks and interpolating the corresponding features and labels (a minimal sketch of this interpolation appears after this list).
Empirically, in our experiments on eight datasets from diverse domains, we find that the proposed general MLTI framework is compatible with representative meta-learning algorithms and consistently outperforms other state-of-the-art strategies.
arXiv Detail & Related papers (2021-06-04T20:15:34Z)
- Energy-Efficient and Federated Meta-Learning via Projected Stochastic Gradient Ascent [79.58680275615752]
We propose an energy-efficient federated meta-learning framework.
We assume each task is owned by a separate agent, so a limited number of tasks is used to train a meta-model.
arXiv Detail & Related papers (2021-05-31T08:15:44Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED).
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer during fine-tuning of the target model.
Experiments on various real-world datasets show that our method stably improves standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
- XMixup: Efficient Transfer Learning with Auxiliary Samples by Cross-domain Mixup [60.07531696857743]
Cross-domain Mixup (XMixup) improves the multitask paradigm for deep transfer learning.
XMixup selects the auxiliary samples from the source dataset and augments training samples via the simple mixup strategy.
Experimental results show that XMixup improves accuracy by 1.9% on average.
arXiv Detail & Related papers (2020-07-20T16:42:29Z)
- Minimax Lower Bounds for Transfer Learning with Linear and One-hidden Layer Neural Networks [27.44348371795822]
We develop a statistical minimax framework to characterize the limits of transfer learning.
We derive a lower-bound for the target generalization error achievable by any algorithm as a function of the number of labeled source and target data.
arXiv Detail & Related papers (2020-06-16T22:49:26Z)
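Both the MLTI and XMixup entries above rely on the same underlying mixup mechanism: convexly combining the features and one-hot labels of two examples. Below is a minimal NumPy sketch under that reading; the function and argument names are illustrative and not taken from either paper's code.

```python
# Hedged sketch of mixup-style interpolation; illustrative names only.
import numpy as np


def mixup(x_a, y_a, x_b, y_b, num_classes, alpha=0.5, rng=None):
    """x_a, x_b: feature arrays of identical shape; y_a, y_b: integer labels."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)                      # mixing coefficient in (0, 1)
    x_mix = lam * x_a + (1.0 - lam) * x_b             # interpolate features
    eye = np.eye(num_classes)
    y_mix = lam * eye[y_a] + (1.0 - lam) * eye[y_b]   # interpolate one-hot labels
    return x_mix, y_mix
```

Under the summaries above, MLTI would draw the two examples from a randomly sampled pair of meta-training tasks, whereas XMixup would pair a target training sample with an auxiliary sample selected from the source dataset.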
This list is automatically generated from the titles and abstracts of the papers on this site.