Related papers: On the Generalization for Transfer Learning: An Information-Theoretic Analysis

On the Generalization for Transfer Learning: An Information-Theoretic Analysis

URL: http://arxiv.org/abs/2207.05377v2
Date: Thu, 8 Aug 2024 02:10:45 GMT
Title: On the Generalization for Transfer Learning: An Information-Theoretic Analysis
Authors: Xuetong Wu, Jonathan H. Manton, Uwe Aickelin, Jingge Zhu,
Abstract summary: We give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms. Our results suggest, perhaps as expected, that the Kullback-Leibler divergenceD(mu|mu')$ plays an important role in the characterizations. We then generalize the mutual information bound with other divergences such as $phi$-divergence and Wasserstein distance.
Score: 8.102199960821165
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transfer learning, or domain adaptation, is concerned with machine learning problems in which training and testing data come from possibly different probability distributions. In this work, we give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms. Our results suggest, perhaps as expected, that the Kullback-Leibler (KL) divergence $D(\mu\|\mu')$ plays an important role in the characterizations where $\mu$ and $\mu'$ denote the distribution of the training data and the testing data, respectively. Specifically, we provide generalization error and excess risk upper bounds for learning algorithms where data from both distributions are available in the training phase. Recognizing that the bounds could be sub-optimal in general, we provide improved excess risk upper bounds for a certain class of algorithms, including the empirical risk minimization (ERM) algorithm, by making stronger assumptions through the \textit{central condition}. To demonstrate the usefulness of the bounds, we further extend the analysis to the Gibbs algorithm and the noisy stochastic gradient descent method. We then generalize the mutual information bound with other divergences such as $\phi$-divergence and Wasserstein distance, which may lead to tighter bounds and can handle the case when $\mu$ is not absolutely continuous with respect to $\mu'$. Several numerical results are provided to demonstrate our theoretical findings. Lastly, to address the problem that the bounds are often not directly applicable in practice due to the absence of the distributional knowledge of the data, we develop an algorithm (called InfoBoost) that dynamically adjusts the importance weights for both source and target data based on certain information measures. The empirical results show the effectiveness of the proposed algorithm.

Related papers

Learning-Based TSP-Solvers Tend to Be Overly Greedy [8.79364699260219]
This study constructs a statistical measure called nearest-neighbor density to verify the properties of randomly generated instance of learning-based solvers. We validate that the performance of the learning-based solvers degenerates much on such augmented data. In short, we decipher the limitations of learning-based TSP solvers tending to be overly greedy, which may have profound implications for AI-empowered optimization solvers.
arXiv Detail & Related papers (2025-02-02T12:06:13Z)
On the Performance of Empirical Risk Minimization with Smoothed Data [59.3428024282545]
Empirical Risk Minimization (ERM) is able to achieve sublinear error whenever a class is learnable with iid data. We show that ERM is able to achieve sublinear error whenever a class is learnable with iid data.
arXiv Detail & Related papers (2024-02-22T21:55:41Z)
Distributionally Robust Skeleton Learning of Discrete Bayesian Networks [9.46389554092506]
We consider the problem of learning the exact skeleton of general discrete Bayesian networks from potentially corrupted data. We propose to optimize the most adverse risk over a family of distributions within bounded Wasserstein distance or KL divergence to the empirical distribution. We present efficient algorithms and show the proposed methods are closely related to the standard regularized regression approach.
arXiv Detail & Related papers (2023-11-10T15:33:19Z)
Hypothesis Transfer Learning with Surrogate Classification Losses: Generalization Bounds through Algorithmic Stability [3.908842679355255]
Hypothesis transfer learning (HTL) contrasts domain adaptation by allowing for a previous task leverage, named the source, into a new one, the target. This paper studies the learning theory of HTL through algorithmic stability, an attractive theoretical framework for machine learning algorithms analysis.
arXiv Detail & Related papers (2023-05-31T09:38:21Z)
STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning [111.75423966239092]
We propose an exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal. Based on KSD, we develop a novel algorithm algo: textbfSTEin information dirtextbfEcted exploration for model-based textbfReinforcement LearntextbfING.
arXiv Detail & Related papers (2023-01-28T00:49:28Z)
Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm. The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources. It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
Learning Algorithm Generalization Error Bounds via Auxiliary Distributions [16.44492672878356]
Generalization error bounds are essential for comprehending how well machine learning models work. In this work, we suggest a novel method, i.e., the Auxiliary Distribution Method, that leads to new upper bounds on expected generalization errors.
arXiv Detail & Related papers (2022-10-02T10:37:04Z)
On Leave-One-Out Conditional Mutual Information For Generalization [122.2734338600665]
We derive information theoretic generalization bounds for supervised learning algorithms based on a new measure of leave-one-out conditional mutual information (loo-CMI) Contrary to other CMI bounds, our loo-CMI bounds can be computed easily and can be interpreted in connection to other notions such as classical leave-one-out cross-validation. We empirically validate the quality of the bound by evaluating its predicted generalization gap in scenarios for deep learning.
arXiv Detail & Related papers (2022-07-01T17:58:29Z)
Transfer Learning under High-dimensional Generalized Linear Models [7.675822266933702]
We study the transfer learning problem under high-dimensional generalized linear models. We propose an oracle algorithm and derive its $ell$-estimation error bounds. When we don't know which sources to transfer, an algorithm-free transferable source detection approach is introduced.
arXiv Detail & Related papers (2021-05-29T15:39:43Z)
Graph Embedding with Data Uncertainty [113.39838145450007]
spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines. Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z)
Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model. The objective is to endow the trained model with robustness against adversarially manipulated input data. Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
Information-theoretic analysis for transfer learning [5.081241420920605]
We give an information-theoretic analysis on the generalization error and the excess risk of transfer learning algorithms. Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(mu||mu')$ plays an important role in characterizing the generalization error.
arXiv Detail & Related papers (2020-05-18T13:23:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.