A Bayesian Approach to (Online) Transfer Learning: Theory and Algorithms
- URL: http://arxiv.org/abs/2109.01377v1
- Date: Fri, 3 Sep 2021 08:43:29 GMT
- Title: A Bayesian Approach to (Online) Transfer Learning: Theory and Algorithms
- Authors: Xuetong Wu, Jonathan H. Manton, Uwe Aickelin, Jingge Zhu
- Abstract summary: We study transfer learning from a Bayesian perspective, where a parametric statistical model is used.
Specifically, we study three variants of the transfer learning problem: instantaneous, online, and time-variant transfer learning.
For each problem, we define an appropriate objective function, and provide either exact expressions or upper bounds on the learning performance.
Examples show that the derived bounds are accurate even for small sample sizes.
- Score: 6.193838300896449
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning is a machine learning paradigm where knowledge from one
problem is utilized to solve a new but related problem. On the one hand, it is
conceivable that knowledge from one task could be useful for solving a related
task. On the other hand, it is also recognized that if not executed properly,
transfer learning algorithms can in fact impair the learning performance
instead of improving it - commonly known as negative transfer. In this paper,
we study transfer learning from a Bayesian perspective, where a parametric
statistical model is used. Specifically, we study three variants of the transfer
learning problem: instantaneous, online, and time-variant transfer learning.
For each problem, we define an appropriate objective function, and provide
either exact expressions or upper bounds on the learning performance using
information-theoretic quantities, which allow simple and explicit
characterizations when the sample size becomes large. Furthermore, examples
show that the derived bounds are accurate even for small sample sizes. The
obtained bounds give valuable insights on the effect of prior knowledge for
transfer learning in our formulation. In particular, we formally characterize
the conditions under which negative transfer occurs. Lastly, we devise two
(online) transfer learning algorithms that are amenable to practical
implementations. Specifically, one algorithm does not require the parametric
assumption, thus extending our results to more general models. We demonstrate
the effectiveness of our algorithms with real data sets, especially when the
source and target data have a strong similarity.
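To make the transfer mechanism concrete, the following toy sketch (an illustration only, not the paper's actual formulation or algorithms) uses a conjugate-Gaussian model in which the posterior learned from source data serves as the prior for the scarce target data; moving the assumed source mean far from the target parameter reproduces a simple form of negative transfer.
```python
# Illustrative sketch only: a conjugate-Gaussian toy model of Bayesian transfer,
# not the exact objective functions or algorithms proposed in the paper.
import numpy as np

rng = np.random.default_rng(0)

def posterior(mu0, tau0_sq, data, sigma_sq):
    """Posterior N(mu_n, tau_n^2) of a Gaussian mean with known noise variance
    sigma_sq, starting from the prior N(mu0, tau0_sq)."""
    n = len(data)
    tau_n_sq = 1.0 / (1.0 / tau0_sq + n / sigma_sq)
    mu_n = tau_n_sq * (mu0 / tau0_sq + np.sum(data) / sigma_sq)
    return mu_n, tau_n_sq

sigma_sq = 1.0                              # known observation noise
theta_target = 0.5                          # true target parameter
source = rng.normal(0.6, 1.0, size=200)     # plentiful source data (similar task)
target = rng.normal(theta_target, 1.0, size=5)  # scarce target data

# Target-only learning: vague prior, target samples only.
mu_plain, _ = posterior(0.0, 100.0, target, sigma_sq)

# Transfer learning: the source posterior acts as the prior for the target task.
mu_src, tau_src_sq = posterior(0.0, 100.0, source, sigma_sq)
mu_transfer, _ = posterior(mu_src, tau_src_sq, target, sigma_sq)

print("target-only estimate:", mu_plain)
print("transfer estimate:   ", mu_transfer)
# If the source mean were far from theta_target (e.g. 5.0 instead of 0.6), the
# transfer estimate would be pulled away from the truth -- a simple instance of
# negative transfer.
```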
Related papers
- Deep invariant networks with differentiable augmentation layers [87.22033101185201]
Methods for learning data augmentation policies require held-out data and are based on bilevel optimization problems.
We show that our approach is easier and faster to train than modern automatic data augmentation techniques.
arXiv Detail & Related papers (2022-02-04T14:12:31Z) - Simple Stochastic and Online Gradient DescentAlgorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z) - Online Transfer Learning: Negative Transfer and Effect of Prior
Knowledge [6.193838300896449]
We study the online transfer learning problem, where the source samples are given offline while the target samples arrive sequentially.
We define the expected regret of the online transfer learning problem and provide upper bounds on the regret using information-theoretic quantities.
Examples show that the derived bounds are accurate even for small sample sizes.
arXiv Detail & Related papers (2021-05-04T12:12:14Z) - Low-Regret Active learning [64.36270166907788]
We develop an online learning algorithm for identifying unlabeled data points that are most informative for training.
At the core of our work is an efficient algorithm for sleeping experts that is tailored to achieve low regret on predictable (easy) instances.
arXiv Detail & Related papers (2021-04-06T22:53:45Z) - Towards Accurate Knowledge Transfer via Target-awareness Representation
Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED).
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer during fine-tuning of the target model.
Experiments on various real-world datasets show that our method consistently improves standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z) - What is being transferred in transfer learning? [51.6991244438545]
We show that when training from pre-trained weights, the model stays in the same basin in the loss landscape, and different instances of such models are similar in feature space and close in parameter space.
arXiv Detail & Related papers (2020-08-26T17:23:40Z) - Limits of Transfer Learning [0.0]
We show the need to carefully select which sets of information to transfer and the need for dependence between transferred information and target problems.
These results build on the algorithmic search framework for machine learning, allowing the results to apply to a wide range of learning problems using transfer.
arXiv Detail & Related papers (2020-06-23T01:48:23Z) - On the Robustness of Active Learning [0.7340017786387767]
Active learning is concerned with identifying the most useful samples for training a machine learning algorithm.
We find that it is often applied without sufficient care and domain knowledge.
We propose the new "Sum of Squared Logits" method based on the Simpson diversity index and investigate the effect of using the confusion matrix for balancing in sample selection.
arXiv Detail & Related papers (2020-06-18T09:07:23Z) - Minimax Lower Bounds for Transfer Learning with Linear and One-hidden
Layer Neural Networks [27.44348371795822]
We develop a statistical minimax framework to characterize the limits of transfer learning.
We derive a lower bound for the target generalization error achievable by any algorithm as a function of the number of labeled source and target samples.
arXiv Detail & Related papers (2020-06-16T22:49:26Z) - Provable Meta-Learning of Linear Representations [114.656572506859]
We provide fast, sample-efficient algorithms to address the dual challenges of learning a common set of features from multiple, related tasks, and transferring this knowledge to new, unseen tasks.
We also provide information-theoretic lower bounds on the sample complexity of learning these linear features.
arXiv Detail & Related papers (2020-02-26T18:21:34Z) - Inter- and Intra-domain Knowledge Transfer for Related Tasks in Deep
Character Recognition [2.320417845168326]
Pre-training a deep neural network on the ImageNet dataset is a common practice for training deep learning models.
The technique of pre-training on one task and then retraining on a new one is called transfer learning.
In this paper we analyse the effectiveness of using deep transfer learning for character recognition tasks.
arXiv Detail & Related papers (2020-01-02T14:18:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.