An Information-Theoretic Approach to Semi-supervised Transfer Learning
- URL: http://arxiv.org/abs/2306.06731v1
- Date: Sun, 11 Jun 2023 17:45:46 GMT
- Title: An Information-Theoretic Approach to Semi-supervised Transfer Learning
- Authors: Daniel Jakubovitz, David Uliel, Miguel Rodrigues, Raja Giryes
- Abstract summary: Transfer learning allows propagating information from one "source dataset" to another "target dataset".
Yet, discrepancies between the underlying distributions of the source and target data are commonplace.
We suggest novel information-theoretic approaches for the analysis of the performance of deep neural networks in the context of transfer learning.
- Score: 33.89602092349131
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transfer learning is a valuable tool in deep learning as it allows
propagating information from one "source dataset" to another "target dataset",
especially in the case of a small number of training examples in the latter.
Yet, discrepancies between the underlying distributions of the source and
target data are commonplace and are known to have a substantial impact on
algorithm performance. In this work we suggest novel information-theoretic
approaches for the analysis of the performance of deep neural networks in the
context of transfer learning. We focus on the task of semi-supervised transfer
learning, in which unlabeled samples from the target dataset are available
during network training on the source dataset. Our theory suggests that one may
improve the transferability of a deep neural network by incorporating
regularization terms on the target data based on information-theoretic
quantities, namely the Mutual Information and the Lautum Information. We
demonstrate the effectiveness of the proposed approaches in various
semi-supervised transfer learning experiments.
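To make the regularization idea concrete, the two quantities named above are the Mutual Information I(X;Y) = D_KL(P_XY || P_X P_Y) and the Lautum Information L(X;Y) = D_KL(P_X P_Y || P_XY), i.e. the KL divergence with its arguments swapped. Below is a minimal sketch, not the authors' exact formulation, of where such a term enters a semi-supervised transfer objective: a supervised loss on labeled source data plus an information-based penalty on unlabeled target features. The Gaussian-entropy proxy, the shared feature_extractor, the classification head, and the weight lam are all illustrative assumptions.

    import torch
    import torch.nn as nn

    def gaussian_entropy_proxy(features, eps=1e-4):
        # Differential entropy of a Gaussian fitted to the batch of target
        # features; a crude stand-in for an information-theoretic regularizer.
        z = features - features.mean(dim=0, keepdim=True)
        cov = (z.T @ z) / max(z.shape[0] - 1, 1) + eps * torch.eye(z.shape[1], device=z.device)
        return 0.5 * torch.logdet(cov)  # 0.5 * log det(Sigma), constants dropped

    def transfer_step(feature_extractor, head, x_src, y_src, x_tgt_unlabeled, lam=0.1):
        # Supervised cross-entropy on labeled source data plus an
        # information-based penalty computed on unlabeled target features.
        ce = nn.CrossEntropyLoss()
        feat_src = feature_extractor(x_src)
        loss_sup = ce(head(feat_src), y_src)
        feat_tgt = feature_extractor(x_tgt_unlabeled)
        loss_reg = gaussian_entropy_proxy(feat_tgt)
        return loss_sup + lam * loss_reg

The exact form and sign of the penalty are what the paper's analysis derives; the sketch only shows at which point of the training loop such a regularizer would be applied.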
Related papers
- Features are fate: a theory of transfer learning in high-dimensional regression [23.840251319669907]
We show that when the target task is well represented by the feature space of the pre-trained model, transfer learning outperforms training from scratch.
For this model, we establish rigorously that when the feature space overlap between the source and target tasks is sufficiently strong, both linear transfer and fine-tuning improve performance.
arXiv Detail & Related papers (2024-10-10T17:58:26Z)
- Optimal transfer protocol by incremental layer defrosting [66.76153955485584]
Transfer learning is a powerful tool enabling model training with limited amounts of data.
The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task.
We show that this protocol is often sub-optimal and the largest performance gain may be achieved when smaller portions of the pre-trained network are kept frozen (see the freezing sketch after this list).
arXiv Detail & Related papers (2023-03-02T17:32:11Z)
- An Exploration of Data Efficiency in Intra-Dataset Task Transfer for Dialog Understanding [65.75873687351553]
This study explores the effects of varying quantities of target task training data on sequential transfer learning in the dialog domain.
Counterintuitively, our data show that the size of the target task training data often has minimal effect on how sequential transfer learning performs compared to the same model without transfer learning.
arXiv Detail & Related papers (2022-10-21T04:36:46Z)
- A Data-Based Perspective on Transfer Learning [76.30206800557411]
We take a closer look at the role of the source dataset's composition in transfer learning.
Our framework gives rise to new capabilities such as pinpointing transfer learning brittleness.
arXiv Detail & Related papers (2022-07-12T17:58:28Z)
- Initial Study into Application of Feature Density and Linguistically-backed Embedding to Improve Machine Learning-based Cyberbullying Detection [54.83707803301847]
The research was conducted on a Formspring dataset provided in a Kaggle competition on automatic cyberbullying detection.
The study confirmed the effectiveness of Neural Networks in cyberbullying detection and the correlation between classifier performance and Feature Density.
arXiv Detail & Related papers (2022-06-04T03:17:15Z)
- Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task.
The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator.
We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z)
- Probing transfer learning with a model of synthetic correlated datasets [11.53207294639557]
Transfer learning can significantly improve the sample efficiency of neural networks.
We re-think a solvable model of synthetic data as a framework for modeling correlation between data-sets.
We show that our model can capture a range of salient features of transfer learning with real data.
arXiv Detail & Related papers (2021-06-09T22:15:41Z)
- A Concise Review of Transfer Learning [1.5771347525430772]
Transfer learning aims to boost the performance of a target learner by leveraging data from a related source task.
Traditional machine learning and data mining techniques assume that the training and testing data are drawn from the same feature space and distribution.
arXiv Detail & Related papers (2021-04-05T20:34:55Z)
- Towards Accurate Knowledge Transfer via Target-awareness Representation Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED)
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer during fine-tuning of the target model.
Experiments on various real-world datasets show that our method stably improves over standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z)
- Minimax Lower Bounds for Transfer Learning with Linear and One-hidden Layer Neural Networks [27.44348371795822]
We develop a statistical minimax framework to characterize the limits of transfer learning.
We derive a lower-bound for the target generalization error achievable by any algorithm as a function of the number of labeled source and target data.
arXiv Detail & Related papers (2020-06-16T22:49:26Z)
- The Utility of Feature Reuse: Transfer Learning in Data-Starved Regimes [6.419457653976053]
We describe a transfer learning use case for a domain with a data-starved regime.
We evaluate the effectiveness of convolutional feature extraction and fine-tuning.
We conclude that transfer learning enhances the performance of CNN architectures in data-starved regimes.
arXiv Detail & Related papers (2020-02-29T18:48:58Z)
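As a companion to the "incremental layer defrosting" entry above, here is a minimal sketch of the partial-freezing protocol it refers to: load a pre-trained network, freeze its first few blocks, and fine-tune the remaining blocks together with a new head. The torchvision ResNet-18 backbone and the n_frozen parameter are illustrative assumptions, not details taken from that paper.

    import torch.nn as nn
    from torchvision import models

    def build_partially_frozen_resnet(n_frozen, num_target_classes):
        # Load an ImageNet pre-trained ResNet-18 and freeze its first
        # `n_frozen` child modules so they act as a fixed feature extractor.
        model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        for i, child in enumerate(model.children()):
            if i < n_frozen:
                for p in child.parameters():
                    p.requires_grad = False
        # Replace the source-task head with a fresh one for the target task.
        model.fc = nn.Linear(model.fc.in_features, num_target_classes)
        return model

Varying n_frozen between all blocks and none interpolates between pure feature extraction and full fine-tuning, which is the trade-off that entry studies.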