Related papers: Scalable Out-of-distribution Robustness in the Presence of Unobserved Confounders

Scalable Out-of-distribution Robustness in the Presence of Unobserved Confounders

URL: http://arxiv.org/abs/2411.19923v1
Date: Fri, 29 Nov 2024 18:38:17 GMT
Title: Scalable Out-of-distribution Robustness in the Presence of Unobserved Confounders
Authors: Parjanya Prashant, Seyedeh Baharan Khatami, Bruno Ribeiro, Babak Salimi,
Abstract summary: We consider the task of out-of-distribution (OOD) generalization, where the distribution shift is due to an unobserved confounder ($Z$)<n>Our work investigates a set of identifiability assumptions that tremendously simplify the predictor, whose resulting elegant simplicity outperforms existing approaches.
Score: 11.127206986936367
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: We consider the task of out-of-distribution (OOD) generalization, where the distribution shift is due to an unobserved confounder ($Z$) affecting both the covariates ($X$) and the labels ($Y$). In this setting, traditional assumptions of covariate and label shift are unsuitable due to the confounding, which introduces heterogeneity in the predictor, i.e., $\hat{Y} = f_Z(X)$. OOD generalization differs from traditional domain adaptation by not assuming access to the covariate distribution ($X^\text{te}$) of the test samples during training. These conditions create a challenging scenario for OOD robustness: (a) $Z^\text{tr}$ is an unobserved confounder during training, (b) $P^\text{te}{Z} \neq P^\text{tr}{Z}$, (c) $X^\text{te}$ is unavailable during training, and (d) the posterior predictive distribution depends on $P^\text{te}(Z)$, i.e., $\hat{Y} = E_{P^\text{te}(Z)}[f_Z(X)]$. In general, accurate predictions are unattainable in this scenario, and existing literature has proposed complex predictors based on identifiability assumptions that require multiple additional variables. Our work investigates a set of identifiability assumptions that tremendously simplify the predictor, whose resulting elegant simplicity outperforms existing approaches.

Related papers

Regression-Based Estimation of Causal Effects in the Presence of Selection Bias and Confounding [52.1068936424622]
We consider the problem of estimating the expected causal effect $E[Y|do(X)]$ for a target variable $Y$ when treatment $X$ is set by intervention. In settings without selection bias or confounding, $E[Y|do(X)] = E[Y|X]$, which can be estimated using standard regression methods. We propose a framework that incorporates both selection bias and confounding.
arXiv Detail & Related papers (2025-03-26T13:43:37Z)
Computational-Statistical Tradeoffs at the Next-Token Prediction Barrier: Autoregressive and Imitation Learning under Misspecification [50.717692060500696]
Next-token prediction with the logarithmic loss is a cornerstone of autoregressive sequence modeling. Next-token prediction can be made robust so as to achieve $C=tilde O(H)$, representing moderate error amplification. No computationally efficient algorithm can achieve sub-polynomial approximation factor $C=e(log H)1-Omega(1)$.
arXiv Detail & Related papers (2025-02-18T02:52:00Z)
Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond [28.931489333515618]
Given an unnormalized probability density $piproptomathrme-V$, estimating its normalizing constant $Z=int_mathbbRdmathrme-V(x)mathrmdx$ or free energy $F=-log Z$ is a crucial problem in Bayesian statistics, statistical mechanics, and machine learning. We propose a new normalizing constant estimation algorithm based on reverse diffusion samplers and establish a framework for analyzing its complexity.
arXiv Detail & Related papers (2025-02-07T00:05:28Z)
Cross-validating causal discovery via Leave-One-Variable-Out [11.891940572224783]
We use the "Leave-One-Variable-Out (LOVO)" prediction where $Y$ is inferred from $X$ without any joint observations of $X$ and $Y$. We demonstrate that causal models on the two subsets, in the form of Acyclic Directed Mixed Graphs (ADMGs), often entail conclusions on the dependencies between $X$ and $Y$. The prediction error can then be estimated since the joint distribution $P(X, Y)$ is assumed to be available, and $X$ and $Y$ have only been omitted for the purpose of fal
arXiv Detail & Related papers (2024-11-08T15:15:34Z)
Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation. We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
arXiv Detail & Related papers (2024-05-24T11:22:19Z)
Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification [7.869708570399577]
We consider a bi-objective prediction task of predicting both the conditional expectation $mathbbE[Y|X]$ and the conditional variance Var$(Y|X)$. Theoretically, we show that the trained Transformer reaches near Bayes-optimum, suggesting the usage of the information of the training distribution.
arXiv Detail & Related papers (2024-05-24T00:08:55Z)
TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression [109.69084997173196]
Deepscedastic regression involves jointly optimizing the mean and covariance of the predicted distribution using the negative log-likelihood. Recent works show that this may result in sub-optimal convergence due to the challenges associated with covariance estimation. We study two questions: (1) Does the predicted covariance truly capture the randomness of the predicted mean? Our results show that not only does TIC accurately learn the covariance, it additionally facilitates an improved convergence of the negative log-likelihood.
arXiv Detail & Related papers (2023-10-29T09:54:03Z)
Adaptive Data Analysis in a Balanced Adversarial Model [26.58630744414181]
In adaptive data analysis, a mechanism gets $n$ i.i.d. samples from an unknown distribution $D$, and is required to provide accurate estimations. We consider more restricted adversaries, called emphbalanced, where each such adversary consists of two separated algorithms. We show that these stronger hardness assumptions are unavoidable in the sense that any computationally bounded emphbalanced adversary implies the existence of public-key cryptography.
arXiv Detail & Related papers (2023-05-24T15:08:05Z)
Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
Ground-truth predictor is additive $mathbbE[mathbfz mid mathbfx,mathbfy] = f_star(mathbfx) +g_star(mathbfy)$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z)
Revealing Unobservables by Deep Learning: Generative Element Extraction Networks (GEEN) [5.3028918247347585]
This paper proposes a novel method for estimating realizations of a latent variable $X*$ in a random sample. To the best of our knowledge, this paper is the first to provide such identification in observation.
arXiv Detail & Related papers (2022-10-04T01:09:05Z)
CARD: Classification and Regression Diffusion Models [51.0421331214229]
We introduce classification and regression diffusion (CARD) models, which combine a conditional generative model and a pre-trained conditional mean estimator. We demonstrate the outstanding ability of CARD in conditional distribution prediction with both toy examples and real-world datasets.
arXiv Detail & Related papers (2022-06-15T03:30:38Z)
The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-epsilon) X + epsilon B$, where $X$ is a zero-mean and unknown covariance Gaussian $mathcalN(0, Sigma)$. In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples. We prove a sample complexity lower bound of $Omega(d2)$ for $epsilon$ an arbitrarily small constant and $gamma
arXiv Detail & Related papers (2020-12-31T18:24:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.