Counterfactual Maximum Likelihood Estimation for Training Deep Networks
- URL: http://arxiv.org/abs/2106.03831v1
- Date: Mon, 7 Jun 2021 17:47:16 GMT
- Title: Counterfactual Maximum Likelihood Estimation for Training Deep Networks
- Authors: Xinyi Wang, Wenhu Chen, Michael Saxon, William Yang Wang
- Abstract summary: Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
- Score: 83.44219640437657
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although deep learning models have driven state-of-the-art performance on a
wide array of tasks, they are prone to learning spurious correlations that
should not be learned as predictive clues. To mitigate this problem, we propose
a causality-based training framework to reduce the spurious correlations caused
by observable confounders. We give theoretical analysis on the underlying
general Structural Causal Model (SCM) and propose to perform Maximum Likelihood
Estimation (MLE) on the interventional distribution instead of the
observational distribution, namely Counterfactual Maximum Likelihood Estimation
(CMLE). As the interventional distribution, in general, is hidden from the
observational data, we then derive two different upper bounds of the expected
negative log-likelihood and propose two general algorithms, Implicit CMLE and
Explicit CMLE, for causal predictions of deep learning models using
observational data. We conduct experiments on two real-world tasks: Natural
Language Inference (NLI) and Image Captioning. The results show that CMLE
methods outperform the regular MLE method in terms of out-of-domain
generalization performance and reducing spurious correlations, while
maintaining comparable performance on the regular evaluations.
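To make the distinction between the observational and the interventional distribution concrete, here is a minimal sketch on a toy problem with one observed binary confounder. It is not the paper's Implicit or Explicit CMLE algorithm; it uses the standard backdoor adjustment over an observed confounder. All variable names and probability values are assumptions invented for illustration.

```python
# Toy illustration only: every number below is made up.
# X: observed binary confounder, T: binary treatment, Y: binary outcome.

p_x = {0: 0.5, 1: 0.5}                 # P(X = x)
p_t1_given_x = {0: 0.2, 1: 0.8}        # P(T = 1 | X = x): X influences T
p_y1_given_tx = {                      # P(Y = 1 | T = t, X = x), keyed by (t, x)
    (0, 0): 0.1, (1, 0): 0.3,
    (0, 1): 0.6, (1, 1): 0.8,
}


def p_t_given_x(t, x):
    """P(T = t | X = x) for binary t."""
    p1 = p_t1_given_x[x]
    return p1 if t == 1 else 1.0 - p1


def observational(t):
    """P(Y = 1 | T = t): plain conditioning, so the effect of X leaks in through T."""
    joint = sum(p_x[x] * p_t_given_x(t, x) * p_y1_given_tx[(t, x)] for x in p_x)
    marginal = sum(p_x[x] * p_t_given_x(t, x) for x in p_x)
    return joint / marginal


def interventional(t):
    """P(Y = 1 | do(T = t)) via backdoor adjustment: average over P(X), not P(X | T)."""
    return sum(p_x[x] * p_y1_given_tx[(t, x)] for x in p_x)


obs_effect = observational(1) - observational(0)    # about 0.5
int_effect = interventional(1) - interventional(0)  # about 0.2
print(f"observational gap: {obs_effect:.2f}, interventional gap: {int_effect:.2f}")
```

In this toy setting the observational gap is more than twice the interventional one, because X raises both the chance of treatment and the chance of a positive outcome. A model fit by ordinary MLE on such data can pick up that inflated association, which is the kind of spurious correlation the abstract describes.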
Related papers
- Estimating Causal Effects from Learned Causal Networks [56.14597641617531]
We propose an alternative paradigm for answering causal-effect queries over discrete observable variables.
We learn the causal Bayesian network and its confounding latent variables directly from the observational data.
We show that this model completion learning approach can be more effective than estimand approaches.
arXiv Detail & Related papers (2024-08-26T08:39:09Z)
- Ranking and Combining Latent Structured Predictive Scores without Labeled Data [2.5064967708371553]
This paper introduces a novel structured unsupervised ensemble learning model (SUEL).
It exploits the dependency among a set of predictors with continuous predictive scores, ranks the predictors without labeled data, and combines them into a weighted ensemble score.
The efficacy of the proposed methods is rigorously assessed through both simulation studies and a real-world application to risk gene discovery.
arXiv Detail & Related papers (2024-08-14T20:14:42Z)
- Revisiting Spurious Correlation in Domain Generalization [12.745076668687748]
We build a structural causal model (SCM) to describe the causality within the data generation process.
We further conduct a thorough analysis of the mechanisms underlying spurious correlation.
In this regard, we propose to control confounding bias in OOD generalization by introducing a propensity score weighted estimator.
arXiv Detail & Related papers (2024-06-17T13:22:00Z)
- MANO: Exploiting Matrix Norm for Unsupervised Accuracy Estimation Under Distribution Shifts [25.643876327918544]
Current logit-based methods are vulnerable to overconfidence, which leads to prediction bias, especially under natural shift.
We propose MaNo, which applies a data-dependent normalization on the logits to reduce prediction bias, and takes the $L_p$ norm of the matrix of normalized logits as the estimation score.
MaNo achieves state-of-the-art performance across various architectures in the presence of synthetic, natural, or subpopulation shifts.
arXiv Detail & Related papers (2024-05-29T10:45:06Z)
- A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by e.g. the combination of model, parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z)
- B-Learner: Quasi-Oracle Bounds on Heterogeneous Causal Effects Under Hidden Confounding [51.74479522965712]
We propose a meta-learner called the B-Learner, which can efficiently learn sharp bounds on the CATE function under limits on hidden confounding.
We prove its estimates are valid, sharp, efficient, and have a quasi-oracle property with respect to the constituent estimators under more general conditions than existing methods.
arXiv Detail & Related papers (2023-04-20T18:07:19Z)
- GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z)
- Distributionally Robust Causal Inference with Observational Data [4.8986598953553555]
We consider the estimation of average treatment effects in observational studies without the standard assumption of unconfoundedness.
We propose a new framework of robust causal inference under the general observational study setting with the possible existence of unobserved confounders.
arXiv Detail & Related papers (2022-10-15T16:02:33Z)
- Optimal regularizations for data generation with probabilistic graphical models [0.0]
Empirically, well-chosen regularization schemes dramatically improve the quality of the inferred models.
We consider the particular case of $L_2$ and $L_1$ regularizations in the Maximum A Posteriori (MAP) inference of generative pairwise graphical models.
arXiv Detail & Related papers (2021-12-02T14:45:16Z)
- Learning Causal Semantic Representation for Out-of-Distribution Prediction [125.38836464226092]
We propose a Causal Semantic Generative model (CSG) based on causal reasoning so that the two factors are modeled separately.
We show that CSG can identify the semantic factor by fitting training data, and this semantic-identification guarantees the boundedness of OOD generalization error.
arXiv Detail & Related papers (2020-11-03T13:16:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.