Sample-based Regularization: A Transfer Learning Strategy Toward Better
Generalization
- URL: http://arxiv.org/abs/2007.05181v1
- Date: Fri, 10 Jul 2020 06:02:05 GMT
- Title: Sample-based Regularization: A Transfer Learning Strategy Toward Better
Generalization
- Authors: Yunho Jeon, Yongseok Choi, Jaesun Park, Subin Yi, Dongyeon Cho, Jiwon
Kim
- Abstract summary: Training a deep neural network with a small amount of data is a challenging problem.
In practice, it is often difficult to collect a large number of samples.
By using a source model trained on a large-scale dataset, the target model can alleviate the overfitting that originates from the lack of training data.
- Score: 8.432864879027724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training a deep neural network with a small amount of data is a challenging
problem, as it is vulnerable to overfitting. In practice, however, collecting
many samples is often difficult. Transfer learning is a cost-effective solution
to this problem. By using a source model trained on a large-scale dataset, the
target model can alleviate the overfitting that originates from the lack of
training data. Relying on the generalization ability of the source model,
several methods have been proposed that use the source knowledge throughout the
entire training procedure. However, this is likely to restrict the potential of
the target model, and some of the knowledge transferred from the source can
interfere with training. To improve the generalization performance of a target
model trained with only a few samples, we propose a regularization method
called sample-based regularization (SBR), which does not rely on the source's
knowledge during training. With SBR, we suggest a new training framework for
transfer learning. Experimental results show that our framework outperforms
existing methods in various configurations.
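The abstract does not spell out the exact form of SBR, so the following is only a rough sketch of the general idea it describes: fine-tune a source-pretrained backbone on the small target dataset with a regularizer computed purely from the training samples in each mini-batch (here, a hypothetical term that pulls together features of same-class samples), instead of keeping a source-model term (e.g., a penalty toward the source weights, as in L2-SP-style approaches) in the loss throughout training.

```python
# Rough sketch only (PyTorch). The regularizer below is a hypothetical stand-in
# computed purely from the mini-batch samples; it is NOT the paper's exact SBR
# objective, which should be taken from the paper itself.
import torch
import torch.nn.functional as F
import torchvision

# Source-pretrained backbone plus a fresh head for a small 10-class target task.
resnet = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1)
backbone = torch.nn.Sequential(*list(resnet.children())[:-1])  # conv stem .. global pool
head = torch.nn.Linear(resnet.fc.in_features, 10)
optimizer = torch.optim.SGD(
    list(backbone.parameters()) + list(head.parameters()), lr=1e-3, momentum=0.9)
lam = 0.1  # weight of the sample-based penalty (illustrative value)

def sample_based_penalty(features, labels):
    """Hypothetical sample-derived regularizer: mean squared distance between
    L2-normalized features of mini-batch samples that share a label."""
    feats = F.normalize(features, dim=1)
    dist = torch.cdist(feats, feats).pow(2)
    same = (labels[:, None] == labels[None, :]).float()
    same.fill_diagonal_(0)  # exclude self-pairs
    return (dist * same).sum() / same.sum().clamp(min=1)

def training_step(images, labels):
    features = backbone(images).flatten(1)
    logits = head(features)
    # No term here references the source model's weights or outputs.
    loss = F.cross_entropy(logits, labels) + lam * sample_based_penalty(features, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```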
Related papers
- Model-Based Transfer Learning for Contextual Reinforcement Learning [5.5597941107270215]
We show how to systematically select good tasks to train on, maximizing overall performance across a range of tasks.
The key idea behind our approach is to explicitly model the performance loss incurred by transferring a trained model.
We experimentally validate our methods using urban traffic and standard control benchmarks.
arXiv Detail & Related papers (2024-08-08T14:46:01Z)
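As a loose illustration of the task-selection idea in the entry above, and under the assumption that an estimate of the performance loss incurred by transferring between tasks is already available, a greedy selection could look like the following sketch (all names and numbers are made up):

```python
# Toy greedy source-task selection (illustrative; not the paper's algorithm).
# `predicted_loss[s][t]` is an assumed estimate of the performance loss when a
# model trained on task s is transferred to task t; how that estimate is
# obtained is not shown here.
def select_source_tasks(predicted_loss, candidate_tasks, target_tasks, budget):
    selected = []
    for _ in range(min(budget, len(candidate_tasks))):
        best_task, best_total = None, float("inf")
        for s in candidate_tasks:
            if s in selected:
                continue
            trial = selected + [s]
            # Each target task is served by its best (lowest-loss) selected source.
            total = sum(min(predicted_loss[src][t] for src in trial) for t in target_tasks)
            if total < best_total:
                best_task, best_total = s, total
        selected.append(best_task)
    return selected

# Example with made-up numbers:
losses = {"A": {"x": 0.2, "y": 0.9}, "B": {"x": 0.8, "y": 0.1}, "C": {"x": 0.5, "y": 0.5}}
print(select_source_tasks(losses, ["A", "B", "C"], ["x", "y"], budget=2))  # -> ['B', 'A']
```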
- Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification [34.37262622415682]
We propose a new adaptation framework called Data Adaptive Traceback.
Specifically, we utilize a zero-shot-based method to extract the subset of the pre-training data that is most relevant to the downstream task.
We adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images and a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning.
arXiv Detail & Related papers (2024-07-11T18:01:58Z)
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
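The entry above describes reweighting hard samples during light continual training; its exact scheme is not given here, but a generic instance-level, DRO-style reweighting of per-sample losses can be sketched as follows:

```python
import torch

def reweighted_batch_loss(per_sample_losses: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Generic instance-level, DRO-style reweighting (illustrative, not
    necessarily the paper's exact scheme): harder samples, i.e. those with a
    higher current loss, receive larger weights via a temperature-controlled
    softmax. Gradients do not flow through the weights themselves."""
    weights = torch.softmax(per_sample_losses.detach() / tau, dim=0)
    return (weights * per_sample_losses).sum()

# Usage: `per_sample_losses` would be, e.g., the mean next-token cross-entropy
# of each sequence in a continual-training batch.
losses = torch.tensor([0.8, 2.5, 1.1], requires_grad=True)
reweighted_batch_loss(losses, tau=0.5).backward()
print(losses.grad)  # the hardest sample dominates the gradient
```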
- Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders [63.28408887247742]
We study whether training procedures can be improved to yield better generalization capabilities in the resulting models.
We recommend a simple recipe for training dense encoders: Train on MSMARCO with parameter-efficient methods, such as LoRA, and opt for using in-batch negatives unless given well-constructed hard negatives.
arXiv Detail & Related papers (2023-11-16T10:42:58Z)
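The in-batch-negatives part of the recipe above is a standard dual-encoder contrastive setup; a minimal sketch (LoRA and the MSMARCO data pipeline are omitted):

```python
import torch
import torch.nn.functional as F

def in_batch_negatives_loss(query_emb: torch.Tensor, passage_emb: torch.Tensor,
                            temperature: float = 0.05) -> torch.Tensor:
    """Dual-encoder contrastive loss where every other passage in the batch is
    a negative for each query. Both inputs have shape [batch, dim]; row i of
    `passage_emb` is the positive passage for query i."""
    q = F.normalize(query_emb, dim=1)
    p = F.normalize(passage_emb, dim=1)
    scores = q @ p.t() / temperature                    # [batch, batch] similarities
    targets = torch.arange(q.size(0), device=q.device)  # positives on the diagonal
    return F.cross_entropy(scores, targets)

# Usage with dummy embeddings (a real setup would encode MSMARCO query/passage
# pairs with a LoRA-adapted encoder):
loss = in_batch_negatives_loss(torch.randn(16, 768), torch.randn(16, 768))
```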
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z)
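As a generic illustration of consistency regularization in this setting (not the paper's exact formulation), predictions on a strongly augmented view of an unlabelled target image can be pushed toward the predictions on a weakly augmented view:

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, weak_view: torch.Tensor, strong_view: torch.Tensor) -> torch.Tensor:
    """Generic consistency regularizer for unlabelled target data: predictions
    on a strongly augmented view are pushed toward the detached predictions on
    a weakly augmented view of the same images."""
    with torch.no_grad():
        teacher_probs = F.softmax(model(weak_view), dim=1)
    student_log_probs = F.log_softmax(model(strong_view), dim=1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```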
- Friendly Training: Neural Networks Can Adapt Data To Make Learning Easier [23.886422706697882]
We propose a novel training procedure named Friendly Training.
We show that Friendly Training yields improvements with respect to informed data sub-selection and random selection.
Results suggest that adapting the input data is a feasible way to stabilize learning and improve the generalization skills of the network.
arXiv Detail & Related papers (2021-06-21T10:50:34Z)
- Minimax Lower Bounds for Transfer Learning with Linear and One-hidden Layer Neural Networks [27.44348371795822]
We develop a statistical minimax framework to characterize the limits of transfer learning.
We derive a lower bound on the target generalization error achievable by any algorithm, as a function of the number of labeled source and target samples.
arXiv Detail & Related papers (2020-06-16T22:49:26Z)
- Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task.
We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)
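As a rough sketch of the last entry's idea of learning multiple distinct models, two networks can be trained on the same task with an extra term that penalizes agreement between their representations; the paper's actual diversity objective is not reproduced here.

```python
import torch
import torch.nn.functional as F

class SmallNet(torch.nn.Module):
    """Tiny example network returning (features, logits)."""
    def __init__(self, in_dim=32, hidden=64, classes=10):
        super().__init__()
        self.body = torch.nn.Sequential(torch.nn.Linear(in_dim, hidden), torch.nn.ReLU())
        self.head = torch.nn.Linear(hidden, classes)
    def forward(self, x):
        feats = self.body(x)
        return feats, self.head(feats)

def diverse_pair_loss(model_a, model_b, x, y, lam=0.1):
    """Both models solve the same task, but agreement between their features is
    penalized so that each is pushed toward a distinct solution."""
    feats_a, logits_a = model_a(x)
    feats_b, logits_b = model_b(x)
    task_loss = F.cross_entropy(logits_a, y) + F.cross_entropy(logits_b, y)
    agreement = F.cosine_similarity(feats_a, feats_b, dim=1).mean()
    return task_loss + lam * agreement

x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
loss = diverse_pair_loss(SmallNet(), SmallNet(), x, y)
loss.backward()
```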
This list is automatically generated from the titles and abstracts of the papers on this site.