On Causality in Domain Adaptation and Semi-Supervised Learning: an Information-Theoretic Analysis for Parametric Models
- URL: http://arxiv.org/abs/2205.04641v2
- Date: Mon, 16 Sep 2024 13:48:17 GMT
- Title: On Causality in Domain Adaptation and Semi-Supervised Learning: an Information-Theoretic Analysis for Parametric Models
- Authors: Xuetong Wu, Mingming Gong, Jonathan H. Manton, Uwe Aickelin, Jingge Zhu,
- Abstract summary: We study the learning performance of prediction in the target domain from an information-theoretic perspective.
We show that in causal learning, the excess risk depends on the size of the source sample at a rate of $O(frac1m)$ only if the labelling distribution between the source and target domains remains unchanged.
In anti-causal learning, we show that the unlabelled data dominate the performance at a rate of typically $O(frac1n)$.
- Score: 40.97750409326622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advancements in unsupervised domain adaptation (UDA) and semi-supervised learning (SSL), particularly incorporating causality, have led to significant methodological improvements in these learning problems. However, a formal theory that explains the role of causality in the generalization performance of UDA/SSL is still lacking. In this paper, we consider the UDA/SSL scenarios where we access $m$ labelled source data and $n$ unlabelled target data as training instances under different causal settings with a parametric probabilistic model. We study the learning performance (e.g., excess risk) of prediction in the target domain from an information-theoretic perspective. Specifically, we distinguish two scenarios: the learning problem is called causal learning if the feature is the cause and the label is the effect, and is called anti-causal learning otherwise. We show that in causal learning, the excess risk depends on the size of the source sample at a rate of $O(\frac{1}{m})$ only if the labelling distribution between the source and target domains remains unchanged. In anti-causal learning, we show that the unlabelled data dominate the performance at a rate of typically $O(\frac{1}{n})$. These results bring out the relationship between the data sample size and the hardness of the learning problem with different causal mechanisms.
Related papers
- Mechanism learning: Reverse causal inference in the presence of multiple unknown confounding through front-door causal bootstrapping [0.8901073744693314]
A major limitation of machine learning (ML) prediction models is that they recover associational, rather than causal, predictive relationships between variables.
This paper proposes mechanism learning, a simple method which uses front-door causal bootstrapping to deconfound observational data.
We test our method on fully synthetic, semi-synthetic and real-world datasets, demonstrating that it can discover reliable, unbiased, causal ML predictors.
arXiv Detail & Related papers (2024-10-26T03:34:55Z) - On Characterizing and Mitigating Imbalances in Multi-Instance Partial Label Learning [57.18649648182171]
We make contributions towards addressing a problem that hasn't been studied so far in the context of MI-PLL.
We derive class-specific risk bounds for MI-PLL, while making minimal assumptions.
Our theory reveals a unique phenomenon: that $sigma$ can greatly impact learning imbalances.
arXiv Detail & Related papers (2024-07-13T20:56:34Z) - Nonparametric Identifiability of Causal Representations from Unknown
Interventions [63.1354734978244]
We study causal representation learning, the task of inferring latent causal variables and their causal relations from mixtures of the variables.
Our goal is to identify both the ground truth latents and their causal graph up to a set of ambiguities which we show to be irresolvable from interventional data.
arXiv Detail & Related papers (2023-06-01T10:51:58Z) - The Power and Limitation of Pretraining-Finetuning for Linear Regression
under Covariate Shift [127.21287240963859]
We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data.
For a large class of linear regression instances, transfer learning with $O(N2)$ source data is as effective as supervised learning with $N$ target data.
arXiv Detail & Related papers (2022-08-03T05:59:49Z) - On the Generalization for Transfer Learning: An Information-Theoretic Analysis [8.102199960821165]
We give an information-theoretic analysis of the generalization error and excess risk of transfer learning algorithms.
Our results suggest, perhaps as expected, that the Kullback-Leibler divergenceD(mu|mu')$ plays an important role in the characterizations.
We then generalize the mutual information bound with other divergences such as $phi$-divergence and Wasserstein distance.
arXiv Detail & Related papers (2022-07-12T08:20:41Z) - Equivariance Allows Handling Multiple Nuisance Variables When Analyzing
Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z) - Bounding Information Leakage in Machine Learning [26.64770573405079]
This paper investigates fundamental bounds on information leakage.
We identify and bound the success rate of the worst-case membership inference attack.
We derive bounds on the mutual information between the sensitive attributes and model parameters.
arXiv Detail & Related papers (2021-05-09T08:49:14Z) - Do We Really Need to Access the Source Data? Source Hypothesis Transfer
for Unsupervised Domain Adaptation [102.67010690592011]
Unsupervised adaptationUDA (UDA) aims to leverage the knowledge learned from a labeled source dataset to solve similar tasks in a new unlabeled domain.
Prior UDA methods typically require to access the source data when learning to adapt the model.
This work tackles a practical setting where only a trained source model is available and how we can effectively utilize such a model without source data to solve UDA problems.
arXiv Detail & Related papers (2020-02-20T03:13:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.