An Information-Theoretic Analysis for Transfer Learning: Error Bounds
and Applications
- URL: http://arxiv.org/abs/2207.05377v1
- Date: Tue, 12 Jul 2022 08:20:41 GMT
- Title: An Information-Theoretic Analysis for Transfer Learning: Error Bounds
and Applications
- Authors: Xuetong Wu, Jonathan H. Manton, Uwe Aickelin, Jingge Zhu
- Abstract summary: We give an information-theoretic analysis on the generalization error and excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(\mu||\mu')$ plays an important role in the characterizations.
Inspired by the derived bounds, we propose the InfoBoost algorithm in which the importance weights for source and target data are adjusted adaptively.
- Score: 5.081241420920605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transfer learning, or domain adaptation, is concerned with machine learning
problems in which training and testing data come from possibly different
probability distributions. In this work, we give an information-theoretic
analysis on the generalization error and excess risk of transfer learning
algorithms, following a line of work initiated by Russo and Xu. Our results
suggest, perhaps as expected, that the Kullback-Leibler (KL) divergence
$D(\mu||\mu')$ plays an important role in the characterizations where $\mu$ and
$\mu'$ denote the distributions of the training data and the testing data,
respectively. Specifically, we provide generalization error upper bounds for
the empirical risk minimization (ERM) algorithm where data from both
distributions are available in the training phase. We further apply the
analysis to approximated ERM methods such as the Gibbs algorithm and the
stochastic gradient descent method. We then generalize the mutual information
bound with $\phi$-divergence and Wasserstein distance. These generalizations
lead to tighter bounds and can handle the case when $\mu$ is not absolutely
continuous with respect to $\mu'$. Furthermore, we apply a new set of
techniques to obtain an alternative upper bound which gives a fast (and
optimal) learning rate for some learning problems. Finally, inspired by the
derived bounds, we propose the InfoBoost algorithm in which the importance
weights for source and target data are adjusted adaptively in accordance with
information measures. The empirical results show the effectiveness of the
proposed algorithm.
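As background, in the Russo–Xu line of analysis that the paper follows, the expected generalization error of an algorithm with output hypothesis $W$ trained on $n$ i.i.d. samples $S$ is controlled by the mutual information $I(W;S)$. A representative in-distribution form of such a bound, assuming a $\sigma$-sub-Gaussian loss, is sketched below; the transfer-learning bounds of this paper additionally involve the divergence $D(\mu||\mu')$ between the training and testing distributions.

```latex
% Sketch of a representative bound from the Russo--Xu line of work
% (assumes the loss \ell(w, Z) is \sigma-sub-Gaussian for every w;
%  R_\mu(W) is the population risk, \widehat{R}_S(W) the empirical risk):
\left| \mathbb{E}\big[ R_{\mu}(W) - \widehat{R}_S(W) \big] \right|
  \;\le\; \sqrt{\frac{2\sigma^2 \, I(W;S)}{n}}
```

The abstract also describes InfoBoost as an importance-weighted combination of source and target data whose weights are adapted using information measures. The snippet below is a minimal, hypothetical sketch of that general idea for a linear regression task; the gradient-alignment reweighting rule is an illustrative placeholder, not the information-measure rule used in the paper.

```python
# Hypothetical sketch of importance-weighted ERM over source and target data,
# with the source weight adapted between steps (placeholder rule, not InfoBoost).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic source (mu) and target (mu') tasks: related but shifted linear models.
d, n_src, n_tgt = 5, 500, 40
w_src = rng.normal(size=d)
w_tgt = w_src + 0.3 * rng.normal(size=d)           # distribution shift between tasks
X_src = rng.normal(size=(n_src, d))
y_src = X_src @ w_src + 0.1 * rng.normal(size=n_src)
X_tgt = rng.normal(size=(n_tgt, d))
y_tgt = X_tgt @ w_tgt + 0.1 * rng.normal(size=n_tgt)

def risk(w, X, y):
    """Empirical squared-error risk."""
    r = X @ w - y
    return 0.5 * float(np.mean(r ** 2))

def grad(w, X, y):
    """Gradient of the empirical squared-error risk."""
    return X.T @ (X @ w - y) / len(y)

w = np.zeros(d)
alpha_src = 1.0                                    # importance weight on the source data
for _ in range(200):
    g_src, g_tgt = grad(w, X_src, y_src), grad(w, X_tgt, y_tgt)
    # Placeholder adaptation rule: trust the source gradient only to the extent
    # that it points in the same direction as the target gradient.
    cos = float(g_src @ g_tgt) / (np.linalg.norm(g_src) * np.linalg.norm(g_tgt) + 1e-12)
    alpha_src = max(cos, 0.0)
    w -= 0.1 * (alpha_src * g_src + g_tgt)         # weighted-ERM gradient step

print(f"target risk of weighted ERM: {risk(w, X_tgt, y_tgt):.4f}")
print(f"final source weight: {alpha_src:.2f}")
```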
Related papers
- On the Performance of Empirical Risk Minimization with Smoothed Data [59.3428024282545]
Empirical Risk Minimization (ERM) is able to achieve sublinear error whenever a class is learnable with iid data.
arXiv Detail & Related papers (2024-02-22T21:55:41Z) - Distributionally Robust Skeleton Learning of Discrete Bayesian Networks [9.46389554092506]
We consider the problem of learning the exact skeleton of general discrete Bayesian networks from potentially corrupted data.
We propose to optimize the most adverse risk over a family of distributions within bounded Wasserstein distance or KL divergence to the empirical distribution.
We present efficient algorithms and show the proposed methods are closely related to the standard regularized regression approach.
arXiv Detail & Related papers (2023-11-10T15:33:19Z) - Hypothesis Transfer Learning with Surrogate Classification Losses:
Generalization Bounds through Algorithmic Stability [3.908842679355255]
Hypothesis transfer learning (HTL) contrasts with domain adaptation by allowing a previous task, called the source, to be leveraged in a new one, the target.
This paper studies the learning theory of HTL through algorithmic stability, an attractive theoretical framework for machine learning algorithms analysis.
arXiv Detail & Related papers (2023-05-31T09:38:21Z) - STEERING: Stein Information Directed Exploration for Model-Based
Reinforcement Learning [111.75423966239092]
We propose an exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal one.
Based on KSD (the kernelized Stein discrepancy), we develop a novel algorithm, STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING.
arXiv Detail & Related papers (2023-01-28T00:49:28Z) - Learning to Bound Counterfactual Inference in Structural Causal Models
from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z) - Learning Algorithm Generalization Error Bounds via Auxiliary Distributions [16.44492672878356]
Generalization error bounds are essential for comprehending how well machine learning models work.
In this work, we suggest a novel method, i.e., the Auxiliary Distribution Method, that leads to new upper bounds on expected generalization errors.
arXiv Detail & Related papers (2022-10-02T10:37:04Z) - On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information-theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI).
Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation.
We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z) - Transfer Learning under High-dimensional Generalized Linear Models [7.675822266933702]
We study the transfer learning problem under high-dimensional generalized linear models.
We propose an oracle algorithm and derive its $\ell$-estimation error bounds.
When it is unknown which sources to transfer from, an algorithm-free transferable source detection approach is introduced.
arXiv Detail & Related papers (2021-05-29T15:39:43Z) - Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z) - Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z) - Information-theoretic analysis for transfer learning [5.081241420920605]
We give an information-theoretic analysis on the generalization error and the excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(\mu||\mu')$ plays an important role in characterizing the generalization error.
arXiv Detail & Related papers (2020-05-18T13:23:20Z)