Transfer Learning with Pre-trained Conditional Generative Models
- URL: http://arxiv.org/abs/2204.12833v1
- Date: Wed, 27 Apr 2022 10:36:32 GMT
- Title: Transfer Learning with Pre-trained Conditional Generative Models
- Authors: Shin'ya Yamaguchi, Sekitoshi Kanai, Atsutoshi Kumagai, Daiki Chijiwa,
Hisashi Kashima
- Abstract summary: We propose a transfer learning method that uses deep generative models and is composed of the following two stages: pseudo pre-training and pseudo semi-supervised learning.
Our experimental results indicate that our method can outperform baselines of scratch training and knowledge distillation.
- Score: 29.43740987925133
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning is crucial in training deep neural networks on new target
tasks. Current transfer learning methods generally assume at least one of (i)
source and target task label spaces must overlap, (ii) source datasets are
available, and (iii) target network architectures are consistent with source
ones. However, all of these assumptions are difficult to satisfy in practical
settings because the target task rarely has the same labels as the source task,
access to the source dataset is restricted by licensing and storage costs, and
the target architecture is often specialized to each task. To transfer source
knowledge without these assumptions, we propose a transfer learning method that
uses deep generative models and is composed of the following two stages: pseudo
pre-training (PP) and pseudo semi-supervised learning (P-SSL). PP trains a
target architecture with a synthesized dataset by using conditional source
generative models. P-SSL applies SSL algorithms to labeled target data and
unlabeled pseudo samples, which are generated by cascading the source
classifier and generative models to condition them with target samples. Our
experimental results indicate that our method can outperform baselines of
scratch training and knowledge distillation.
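As a concrete illustration of the two stages, the following PyTorch-style sketch shows how PP and P-SSL could be wired together. It is a minimal sketch under stated assumptions, not the authors' released code: `source_generator`, `source_classifier`, and `ssl_update` are hypothetical placeholders for whichever conditional generative model, source classifier, and off-the-shelf SSL algorithm one plugs in.

```python
import torch
import torch.nn.functional as F

def pseudo_pretraining(target_net, source_generator, n_src_classes,
                       steps=1000, batch_size=128, lr=0.1):
    """Stage 1 (PP): train the target architecture on a dataset synthesized by a
    conditional source generative model, using the sampled class labels as targets."""
    opt = torch.optim.SGD(target_net.parameters(), lr=lr, momentum=0.9)
    for _ in range(steps):
        y = torch.randint(0, n_src_classes, (batch_size,))   # sample source labels
        with torch.no_grad():
            x = source_generator(y)                          # class-conditional synthesis
        loss = F.cross_entropy(target_net(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Before P-SSL, the classification head would be replaced to match the target label space.
    return target_net

def pseudo_ssl_epoch(target_net, labeled_loader, source_classifier,
                     source_generator, ssl_update):
    """Stage 2 (P-SSL): cascade the source classifier and generator, conditioned on
    target samples, to obtain unlabeled pseudo samples for an off-the-shelf SSL step."""
    for x_t, y_t in labeled_loader:
        with torch.no_grad():
            y_src = source_classifier(x_t).argmax(dim=1)      # relate target data to source classes
            x_pseudo = source_generator(y_src)                # unlabeled pseudo samples
        ssl_update(target_net, labeled=(x_t, y_t), unlabeled=x_pseudo)
    return target_net
```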
Related papers
- Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training [23.56208527227504]
Source-free domain adaptation (SFDA) aims to adapt a source model trained on a fully-labeled source domain to a related but unlabeled target domain.
In the conventional SFDA pipeline, a feature extractor pre-trained on large-scale data (e.g. ImageNet) is used to initialize the source model.
We introduce an integrated framework to incorporate pre-trained networks into the target adaptation process.
arXiv Detail & Related papers (2024-05-05T14:48:13Z) - Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - netFound: Foundation Model for Network Security [11.38388749887112]
This paper introduces a novel transformer-based network foundation model, netFound.
We employ self-supervised learning techniques on abundant, unlabeled network telemetry data for pre-training.
Our results demonstrate that netFound effectively captures the hidden networking context in production settings.
arXiv Detail & Related papers (2023-10-25T22:04:57Z) - Optimal transfer protocol by incremental layer defrosting [66.76153955485584]
Transfer learning is a powerful tool enabling model training with limited amounts of data.
The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task.
We show that this protocol is often sub-optimal and that the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen (see the partial-freezing sketch after this list).
arXiv Detail & Related papers (2023-03-02T17:32:11Z) - Unified Instance and Knowledge Alignment Pretraining for Aspect-based
Sentiment Analysis [96.53859361560505]
Aspect-based Sentiment Analysis (ABSA) aims to determine the sentiment polarity towards an aspect.
There always exists severe domain shift between the pretraining and downstream ABSA datasets.
We introduce a unified alignment pretraining framework into the vanilla pretrain-finetune pipeline.
arXiv Detail & Related papers (2021-10-26T04:03:45Z) - Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for
Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z) - Source Data-absent Unsupervised Domain Adaptation through Hypothesis
Transfer and Labeling Transfer [137.36099660616975]
Unsupervised domain adaptation (UDA) aims to transfer knowledge from a related but different well-labeled source domain to a new unlabeled target domain.
Most existing UDA methods require access to the source data, and thus are not applicable when the data are confidential and not shareable due to privacy concerns.
This paper aims to tackle a realistic setting where only a classification model trained on the source domain is available, instead of access to the source data itself.
arXiv Detail & Related papers (2020-12-14T07:28:50Z) - Effectiveness of Arbitrary Transfer Sets for Data-free Knowledge
Distillation [28.874162427052905]
We investigate the effectiveness of "arbitrary transfer sets" such as random noise, publicly available synthetic datasets, and natural datasets.
We find that distilling knowledge with such arbitrary data is surprisingly effective when the transfer set is "target-class balanced".
arXiv Detail & Related papers (2020-11-18T06:33:20Z) - Towards Accurate Knowledge Transfer via Target-awareness Representation
Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm that introduces the idea of Target-awareness REpresentation Disentanglement (TRED).
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer during fine-tuning of the target model.
Experiments on various real-world datasets show that our method stably improves standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z) - Minimax Lower Bounds for Transfer Learning with Linear and One-hidden
Layer Neural Networks [27.44348371795822]
We develop a statistical minimax framework to characterize the limits of transfer learning.
We derive a lower-bound for the target generalization error achievable by any algorithm as a function of the number of labeled source and target data.
arXiv Detail & Related papers (2020-06-16T22:49:26Z) - GradMix: Multi-source Transfer across Domains and Tasks [33.98368732653684]
GradMix is a model-agnostic method applicable to any model trained with a gradient-based learning rule.
We conduct MS-DTT experiments on two tasks: digit recognition and action recognition.
arXiv Detail & Related papers (2020-02-09T02:10:22Z)
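To make the partial-freezing knob from the "incremental layer defrosting" entry above concrete, here is a minimal, hypothetical PyTorch sketch: only the first k top-level blocks of a pre-trained backbone stay frozen while the remaining layers are fine-tuned. The resnet18 backbone, the ImageNet weights, and the cut-off index k are illustrative assumptions, not that paper's experimental setup.

```python
import torch
import torchvision

def partially_frozen_backbone(num_target_classes: int, k: int) -> torch.nn.Module:
    """Freeze only the first k top-level blocks of a pre-trained network and
    fine-tune the rest (freezing every block recovers the classic frozen-extractor protocol)."""
    model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
    model.fc = torch.nn.Linear(model.fc.in_features, num_target_classes)  # new target head
    for i, child in enumerate(model.children()):
        if i < k:
            for p in child.parameters():
                p.requires_grad_(False)   # keep the early blocks frozen
    return model

# Example: freeze only the stem and the first residual stage instead of the whole extractor.
net = partially_frozen_backbone(num_target_classes=10, k=5)
trainable = [p for p in net.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01, momentum=0.9)
```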
This list is automatically generated from the titles and abstracts of the papers in this site.