Transfer Learning with Pre-trained Conditional Generative Models
- URL: http://arxiv.org/abs/2204.12833v1
- Date: Wed, 27 Apr 2022 10:36:32 GMT
- Title: Transfer Learning with Pre-trained Conditional Generative Models
- Authors: Shin'ya Yamaguchi, Sekitoshi Kanai, Atsutoshi Kumagai, Daiki Chijiwa,
Hisashi Kashima
- Abstract summary: We propose a transfer learning method that uses deep generative models and is composed of the following two stages: pseudo pre-training and pseudo semi-supervised learning.
Our experimental results indicate that our method can outperform baselines of scratch training and knowledge distillation.
- Score: 29.43740987925133
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning is crucial in training deep neural networks on new target
tasks. Current transfer learning methods generally assume at least one of (i)
source and target task label spaces must overlap, (ii) source datasets are
available, and (iii) target network architectures are consistent with source
ones. However, all of these assumptions are difficult to satisfy in practical
settings because the target task rarely has the same labels as the source task,
access to the source dataset is restricted by licensing and storage costs, and
the target architecture is often specialized to each task. To transfer source
knowledge without these assumptions, we propose a transfer learning method that
uses deep generative models and is composed of the following two stages: pseudo
pre-training (PP) and pseudo semi-supervised learning (P-SSL). PP trains a
target architecture with a synthesized dataset by using conditional source
generative models. P-SSL applies SSL algorithms to labeled target data and
unlabeled pseudo samples, which are generated by cascading the source
classifier and generative models to condition them with target samples. Our
experimental results indicate that our method can outperform baselines of
scratch training and knowledge distillation.
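As a concrete illustration of the two stages, the following PyTorch-style sketch shows how PP and P-SSL could be wired together. It is a minimal sketch under stated assumptions, not the authors' released code: `source_generator`, `source_classifier`, and `ssl_update` are hypothetical placeholders for whichever conditional generative model, source classifier, and off-the-shelf SSL algorithm one plugs in.

```python
import torch
import torch.nn.functional as F

def pseudo_pretraining(target_net, source_generator, n_src_classes,
                       steps=1000, batch_size=128, lr=0.1):
    """Stage 1 (PP): train the target architecture on a dataset synthesized by a
    conditional source generative model, using the sampled class labels as targets."""
    opt = torch.optim.SGD(target_net.parameters(), lr=lr, momentum=0.9)
    for _ in range(steps):
        y = torch.randint(0, n_src_classes, (batch_size,))   # sample source labels
        with torch.no_grad():
            x = source_generator(y)                          # class-conditional synthesis
        loss = F.cross_entropy(target_net(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Before P-SSL, the classification head would be replaced to match the target label space.
    return target_net

def pseudo_ssl_epoch(target_net, labeled_loader, source_classifier,
                     source_generator, ssl_update):
    """Stage 2 (P-SSL): cascade the source classifier and generator, conditioned on
    target samples, to obtain unlabeled pseudo samples for an off-the-shelf SSL step."""
    for x_t, y_t in labeled_loader:
        with torch.no_grad():
            y_src = source_classifier(x_t).argmax(dim=1)      # relate target data to source classes
            x_pseudo = source_generator(y_src)                # unlabeled pseudo samples
        ssl_update(target_net, labeled=(x_t, y_t), unlabeled=x_pseudo)
    return target_net
```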
Related papers
- Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training [23.56208527227504]
Source-free domain adaptation (SFDA) aims to adapt a source model trained on a fully-labeled source domain to a related but unlabeled target domain.
In the conventional SFDA pipeline, a feature extractor pre-trained on large-scale data (e.g. ImageNet) is used to initialize the source model.
We introduce an integrated framework to incorporate pre-trained networks into the target adaptation process.
arXiv Detail & Related papers (2024-05-05T14:48:13Z) - Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - netFound: Foundation Model for Network Security [11.38388749887112]
This paper introduces a novel transformer-based network foundation model, netFound.
We employ self-supervised learning techniques on abundant, unlabeled network telemetry data for pre-training.
Our results demonstrate that netFound effectively captures the hidden networking context in production settings.
arXiv Detail & Related papers (2023-10-25T22:04:57Z) - Optimal transfer protocol by incremental layer defrosting [66.76153955485584]
Transfer learning is a powerful tool enabling model training with limited amounts of data.
The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task.
We show that this protocol is often sub-optimal and that the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen (see the partial-freezing sketch after this list).
arXiv Detail & Related papers (2023-03-02T17:32:11Z) - Unified Instance and Knowledge Alignment Pretraining for Aspect-based
Sentiment Analysis [96.53859361560505]
Aspect-based Sentiment Analysis (ABSA) aims to determine the sentiment polarity towards an aspect.
There always exists severe domain shift between the pretraining and downstream ABSA datasets.
We introduce a unified alignment pretraining framework into the vanilla pretrain-finetune pipeline.
arXiv Detail & Related papers (2021-10-26T04:03:45Z) - Trash to Treasure: Harvesting OOD Data with Cross-Modal Matching for
Open-Set Semi-Supervised Learning [101.28281124670647]
Open-set semi-supervised learning (open-set SSL) investigates a challenging but practical scenario where out-of-distribution (OOD) samples are contained in the unlabeled data.
We propose a novel training mechanism that could effectively exploit the presence of OOD data for enhanced feature learning.
Our approach substantially lifts the performance on open-set SSL and outperforms the state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-08-12T09:14:44Z) - Source Data-absent Unsupervised Domain Adaptation through Hypothesis
Transfer and Labeling Transfer [137.36099660616975]
Unsupervised domain adaptation (UDA) aims to transfer knowledge from a related but different well-labeled source domain to a new unlabeled target domain.
Most existing UDA methods require access to the source data, and thus are not applicable when the data are confidential and not shareable due to privacy concerns.
This paper aims to tackle a realistic setting where only a classification model trained on the source domain is available, instead of access to the source data itself.
arXiv Detail & Related papers (2020-12-14T07:28:50Z) - Effectiveness of Arbitrary Transfer Sets for Data-free Knowledge
Distillation [28.874162427052905]
We investigate the effectiveness of "arbitrary transfer sets" such as random noise, publicly available synthetic datasets, and natural datasets.
We find that distilling knowledge with such arbitrary data is surprisingly effective when the transfer set is "target-class balanced".
arXiv Detail & Related papers (2020-11-18T06:33:20Z) - Towards Accurate Knowledge Transfer via Target-awareness Representation
Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm that introduces the idea of Target-awareness REpresentation Disentanglement (TRED).
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer during fine-tuning of the target model.
Experiments on various real-world datasets show that our method stably improves standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z) - Minimax Lower Bounds for Transfer Learning with Linear and One-hidden
Layer Neural Networks [27.44348371795822]
We develop a statistical minimax framework to characterize the limits of transfer learning.
We derive a lower-bound for the target generalization error achievable by any algorithm as a function of the number of labeled source and target data.
arXiv Detail & Related papers (2020-06-16T22:49:26Z) - GradMix: Multi-source Transfer across Domains and Tasks [33.98368732653684]
GradMix is a model-agnostic method applicable to any model trained with a gradient-based learning rule.
We conduct MS-DTT experiments on two tasks: digit recognition and action recognition.
arXiv Detail & Related papers (2020-02-09T02:10:22Z)
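To make the partial-freezing knob from the "incremental layer defrosting" entry above concrete, here is a minimal, hypothetical PyTorch sketch: only the first k top-level blocks of a pre-trained backbone stay frozen while the remaining layers are fine-tuned. The resnet18 backbone, the ImageNet weights, and the cut-off index k are illustrative assumptions, not that paper's experimental setup.

```python
import torch
import torchvision

def partially_frozen_backbone(num_target_classes: int, k: int) -> torch.nn.Module:
    """Freeze only the first k top-level blocks of a pre-trained network and
    fine-tune the rest (freezing every block recovers the classic frozen-extractor protocol)."""
    model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
    model.fc = torch.nn.Linear(model.fc.in_features, num_target_classes)  # new target head
    for i, child in enumerate(model.children()):
        if i < k:
            for p in child.parameters():
                p.requires_grad_(False)   # keep the early blocks frozen
    return model

# Example: freeze only the stem and the first residual stage instead of the whole extractor.
net = partially_frozen_backbone(num_target_classes=10, k=5)
trainable = [p for p in net.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01, momentum=0.9)
```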
This list is automatically generated from the titles and abstracts of the papers in this site.