Characterizing and Understanding the Generalization Error of Transfer
Learning with Gibbs Algorithm
- URL: http://arxiv.org/abs/2111.01635v1
- Date: Tue, 2 Nov 2021 14:49:48 GMT
- Title: Characterizing and Understanding the Generalization Error of Transfer
Learning with Gibbs Algorithm
- Authors: Yuheng Bu, Gholamali Aminian, Laura Toni, Miguel Rodrigues and Gregory
Wornell
- Abstract summary: We provide an information-theoretic analysis of the generalization ability of Gibbs-based transfer learning algorithms.
We focus on two popular transfer learning approaches, $\alpha$-weighted-ERM and two-stage-ERM.
- Score: 10.851348154870854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide an information-theoretic analysis of the generalization ability of
Gibbs-based transfer learning algorithms by focusing on two popular transfer
learning approaches, $\alpha$-weighted-ERM and two-stage-ERM. Our key result is
an exact characterization of the generalization behaviour using the conditional
symmetrized KL information between the output hypothesis and the target
training samples given the source samples. Our results can also be applied to
provide novel distribution-free generalization error upper bounds on these two
aforementioned Gibbs algorithms. Our approach is versatile, as it also
characterizes the generalization errors and excess risks of these two Gibbs
algorithms in the asymptotic regime, where they converge to the
$\alpha$-weighted-ERM and two-stage-ERM, respectively. Based on our theoretical
results, we show that the benefits of transfer learning can be viewed as a
bias-variance trade-off, with the bias induced by the source distribution and
the variance induced by the lack of target samples. We believe this viewpoint
can guide the choice of transfer learning algorithms in practice.
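To make the setup concrete, the following is a rough sketch of the $\alpha$-weighted Gibbs algorithm and of the form the exact characterization takes; the notation is a paraphrase, and the precise definitions and normalization constants should be checked against the paper. With target samples $S_T$, source samples $S_S$, a prior $\pi(w)$, an inverse temperature $\gamma$, and a weight $\alpha \in (0,1]$, the algorithm draws the output hypothesis $W$ from
$$P_{W \mid S_T, S_S}(w \mid s_T, s_S) \;\propto\; \pi(w)\, \exp\!\Big(-\gamma\big[\alpha\, \hat{L}_T(w, s_T) + (1-\alpha)\, \hat{L}_S(w, s_S)\big]\Big),$$
where $\hat{L}_T$ and $\hat{L}_S$ denote the empirical risks on the target and source samples; as $\gamma \to \infty$ this posterior concentrates on the $\alpha$-weighted-ERM solution. The exact characterization then ties the expected generalization error on the target distribution to the conditional symmetrized KL information, in a form such as
$$\overline{\mathrm{gen}}_T \;=\; \frac{I_{\mathrm{SKL}}(W; S_T \mid S_S)}{\gamma\,\alpha},$$
where $I_{\mathrm{SKL}}$ is the sum of the conditional mutual information and the conditional lautum information; the exact scaling (e.g., factors involving the target sample size) may differ in the paper.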
Related papers
- Understanding Transfer Learning via Mean-field Analysis [5.7150083558242075]
We consider two main transfer learning scenarios, $\alpha$-ERM and fine-tuning with the KL-regularized empirical risk minimization.
We show the benefits of transfer learning with a one-hidden-layer neural network in the mean-field regime.
arXiv Detail & Related papers (2024-10-22T16:00:44Z) - Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian
Mixture Models [59.331993845831946]
Diffusion models benefit from instillation of task-specific information into the score function to steer the sample generation towards desired properties.
This paper provides the first theoretical study towards understanding the influence of guidance on diffusion models in the context of Gaussian mixture models.
arXiv Detail & Related papers (2024-03-03T23:15:48Z) - GIT: Detecting Uncertainty, Out-Of-Distribution and Adversarial Samples
using Gradients and Invariance Transformations [77.34726150561087]
We propose a holistic approach for the detection of generalization errors in deep neural networks.
GIT combines the usage of gradient information and invariance transformations.
Our experiments demonstrate the superior performance of GIT compared to the state-of-the-art on a variety of network architectures.
arXiv Detail & Related papers (2023-07-05T22:04:38Z) - GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP,
and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z) - On the Generalization for Transfer Learning: An Information-Theoretic Analysis [8.102199960821165]
We give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(\mu \| \mu')$ plays an important role in the characterizations.
We then generalize the mutual information bound with other divergences such as $\phi$-divergence and Wasserstein distance.
arXiv Detail & Related papers (2022-07-12T08:20:41Z) - Characterizing the Generalization Error of Gibbs Algorithm with
Symmetrized KL information [18.92529916180208]
Bounding the generalization error of a supervised learning algorithm is one of the most important problems in learning theory.
Our main contribution is an exact characterization of the expected generalization error of the well-known Gibbs algorithm.
arXiv Detail & Related papers (2021-07-28T22:20:34Z) - Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z) - Learning Gaussian Mixtures with Generalised Linear Models: Precise
Asymptotics in High-dimensions [79.35722941720734]
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks.
We prove exact asymptotics characterising the estimator obtained via empirical risk minimisation in high dimensions.
We discuss how our theory can be applied beyond the scope of synthetic data.
arXiv Detail & Related papers (2021-06-07T16:53:56Z) - Towards Accurate Knowledge Transfer via Target-awareness Representation
Disentanglement [56.40587594647692]
We propose a novel transfer learning algorithm, introducing the idea of Target-awareness REpresentation Disentanglement (TRED)
TRED disentangles the knowledge relevant to the target task from the original source model and uses it as a regularizer during fine-tuning of the target model.
Experiments on various real-world datasets show that our method stably improves standard fine-tuning by more than 2% on average.
arXiv Detail & Related papers (2020-10-16T17:45:08Z) - Information-theoretic analysis for transfer learning [5.081241420920605]
We give an information-theoretic analysis on the generalization error and the excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergence $D(\mu \| \mu')$ plays an important role in characterizing the generalization error.
arXiv Detail & Related papers (2020-05-18T13:23:20Z)