Learning to Transfer with von Neumann Conditional Divergence
- URL: http://arxiv.org/abs/2108.03531v1
- Date: Sat, 7 Aug 2021 22:18:23 GMT
- Title: Learning to Transfer with von Neumann Conditional Divergence
- Authors: Ammar Shaker and Shujian Yu
- Abstract summary: We introduce the recently proposed von Neumann conditional divergence to improve the transferability across multiple domains.
We design novel learning objectives assuming those source tasks are observed either simultaneously or sequentially.
In both scenarios, we obtain favorable performance against state-of-the-art methods in terms of smaller generalization error on new tasks and less catastrophic forgetting on source tasks (in the sequential setup).
- Score: 14.926485055255942
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The similarity of feature representations plays a pivotal role in the success
of domain adaptation and generalization. Feature similarity includes both the
invariance of marginal distributions and the closeness of conditional
distributions given the desired response $y$ (e.g., class labels).
Unfortunately, traditional methods always learn such features without fully
taking into consideration the information in $y$, which in turn may lead to a
mismatch of the conditional distributions or the mix-up of discriminative
structures underlying data distributions. In this work, we introduce the
recently proposed von Neumann conditional divergence to improve the
transferability across multiple domains. We show that this new divergence is
differentiable and eligible to easily quantify the functional dependence
between features and $y$. Given multiple source tasks, we integrate this
divergence to capture discriminative information in $y$ and design novel
learning objectives assuming those source tasks are observed either
simultaneously or sequentially. In both scenarios, we obtain favorable
performance against state-of-the-art methods in terms of smaller generalization
error on new tasks and less catastrophic forgetting on source tasks (in the
sequential setup).
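As an illustration of the divergence the abstract refers to, here is a minimal NumPy sketch. The abstract gives no formula, so the sketch assumes the covariance-based definition from the work that originally proposed the divergence, as we understand it: $D_{vN}(\sigma\|\rho) = \mathrm{tr}(\sigma\log\sigma - \sigma\log\rho - \sigma + \rho)$ for symmetric positive-definite matrices, with the conditional version taken as the joint-covariance divergence minus the feature-covariance divergence. The function names, the `eps` ridge term, and the sample-covariance estimator are illustrative choices, not the authors' implementation.

```python
import numpy as np

def _logm_spd(a):
    """Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.T  # V diag(log w) V^T

def von_neumann_divergence(sigma, rho):
    """Assumed definition: tr(sigma log sigma - sigma log rho - sigma + rho)."""
    return float(np.trace(sigma @ _logm_spd(sigma) - sigma @ _logm_spd(rho) - sigma + rho))

def conditional_divergence(x1, y1, x2, y2, eps=1e-6):
    """Sketch of the divergence between p1(y|x) and p2(y|x) from samples.

    x*: arrays of shape (n, d); y*: arrays of shape (n, k), i.e. 2-D.
    eps is a hypothetical ridge term that keeps the covariances positive definite.
    """
    d = x1.shape[1]

    def cov(z):
        c = np.atleast_2d(np.cov(z, rowvar=False))
        return c + eps * np.eye(c.shape[0])

    s_xy = cov(np.hstack([x1, y1]))  # joint covariance, first domain
    r_xy = cov(np.hstack([x2, y2]))  # joint covariance, second domain
    s_x, r_x = s_xy[:d, :d], r_xy[:d, :d]  # feature-only blocks
    return von_neumann_divergence(s_xy, r_xy) - von_neumann_divergence(s_x, r_x)
```

Every step is a composition of matrix products and an eigendecomposition, which is consistent with the abstract's claim that the divergence is differentiable; a training objective would compute the same quantity in an autodiff framework rather than NumPy.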
Related papers
- Proxy Methods for Domain Adaptation [78.03254010884783]
Proxy variables allow for adaptation to distribution shift without explicitly recovering or modeling latent variables.
We develop a two-stage kernel estimation approach to adapt to complex distribution shifts in both settings.
arXiv Detail & Related papers (2024-03-12T09:32:41Z)
- Transductive conformal inference with adaptive scores [3.591224588041813]
We consider the transductive setting, where decisions are made on a test sample of $m$ new points.
We show that their joint distribution follows a Pólya urn model, and establish a concentration inequality for their empirical distribution function.
We demonstrate the usefulness of these theoretical results through uniform, in-probability guarantees for two machine learning tasks.
arXiv Detail & Related papers (2023-10-27T12:48:30Z)
- Multi-task Bias-Variance Trade-off Through Functional Constraints [102.64082402388192]
Multi-task learning aims to acquire a set of functions that perform well for diverse tasks.
In this paper we draw intuition from the two extreme learning scenarios -- a single function for all tasks, and a task-specific function that ignores the other tasks.
We introduce a constrained learning formulation that enforces domain specific solutions to a central function.
arXiv Detail & Related papers (2022-10-27T16:06:47Z)
- Function-space regularized Rényi divergences [6.221019624345409]
We propose a new family of regularized Rényi divergences parametrized by a variational function space.
We prove several properties of these new divergences, showing that they interpolate between the classical Rényi divergences and IPMs.
We show that the proposed regularized Rényi divergences inherit features from IPMs such as the ability to compare distributions that are not absolutely continuous.
arXiv Detail & Related papers (2022-10-10T19:18:04Z)
- Information Processing Equalities and the Information-Risk Bridge [10.451984251615512]
We introduce two new classes of measures of information for statistical experiments.
We derive a simple geometrical relationship between measures of information and the Bayes risk of a statistical decision problem.
arXiv Detail & Related papers (2022-07-25T08:54:36Z)
- Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that despite its simplicity DoC consistently outperforms other quantifications of distributional difference.
arXiv Detail & Related papers (2021-07-07T15:50:18Z)
- $(f,\Gamma)$-Divergences: Interpolating between $f$-Divergences and Integral Probability Metrics [6.221019624345409]
We develop a framework for constructing information-theoretic divergences that subsume both $f$-divergences and integral probability metrics (IPMs).
We show that they can be expressed as a two-stage mass-redistribution/mass-transport process.
Using statistical learning as an example, we demonstrate their advantage in training generative adversarial networks (GANs) for heavy-tailed, not-absolutely continuous sample distributions.
arXiv Detail & Related papers (2020-11-11T18:17:09Z)
- Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies [88.0813215220342]
Some modalities can more easily contribute to the classification results than others.
We develop a method based on the log-Sobolev inequality, which bounds the functional entropy with the functional-Fisher-information.
On the two challenging multi-modal datasets VQA-CPv2 and SocialIQ, we obtain state-of-the-art results while more uniformly exploiting the modalities.
arXiv Detail & Related papers (2020-10-21T07:40:33Z)
- Learning Invariant Representations and Risks for Semi-supervised Domain Adaptation [109.73983088432364]
We propose the first method that aims to simultaneously learn invariant representations and risks under the setting of semi-supervised domain adaptation (Semi-DA).
We introduce the LIRR algorithm for jointly Learning Invariant Representations and Risks.
arXiv Detail & Related papers (2020-10-09T15:42:35Z)
- Few-shot Domain Adaptation by Causal Mechanism Transfer [107.08605582020866]
We study few-shot supervised domain adaptation (DA) for regression problems, where only a few labeled target domain data and many labeled source domain data are available.
Many of the current DA methods base their transfer assumptions on either parametrized distribution shift or apparent distribution similarities.
We propose mechanism transfer, a meta-distributional scenario in which a data generating mechanism is invariant among domains.
arXiv Detail & Related papers (2020-02-10T02:16:53Z)