Related papers: Proximal Causal Inference With Text Data

URL: http://arxiv.org/abs/2401.06687v2
Date: Tue, 21 May 2024 21:08:54 GMT
Title: Proximal Causal Inference With Text Data
Authors: Jacob M. Chen, Rohit Bhattacharya, Katherine A. Keith
Abstract summary: Recent text-based causal methods attempt to mitigate confounding bias by estimating proxies of confounding variables that are partially or imperfectly measured from unstructured text data. We propose a new causal inference method that uses multiple instances of pre-treatment text data, infers two proxies from two zero-shot models on the separate instances, and applies these proxies in the proximal g-formula. We evaluate our method in synthetic and semi-synthetic settings and find that it produces estimates with low bias.
Score: 5.796482272333648
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent text-based causal methods attempt to mitigate confounding bias by estimating proxies of confounding variables that are partially or imperfectly measured from unstructured text data. These approaches, however, assume analysts have supervised labels of the confounders given text for a subset of instances, a constraint that is sometimes infeasible due to data privacy or annotation costs. In this work, we address settings in which an important confounding variable is completely unobserved. We propose a new causal inference method that uses multiple instances of pre-treatment text data, infers two proxies from two zero-shot models on the separate instances, and applies these proxies in the proximal g-formula. We prove that our text-based proxy method satisfies identification conditions required by the proximal g-formula while other seemingly reasonable proposals do not. We evaluate our method in synthetic and semi-synthetic settings and find that it produces estimates with low bias. To address untestable assumptions associated with the proximal g-formula, we further propose an odds ratio falsification heuristic. This new combination of proximal causal inference and zero-shot classifiers expands the set of text-specific causal methods available to practitioners.
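To make the role of the two proxies concrete, the following is a minimal sketch of the proximal g-formula in its discrete matrix form, P(y | do(a)) = P(y | Z, a) P(W | Z, a)^{-1} P(W). It is an illustration under stated assumptions, not the authors' implementation: all variables are binary, the proxies W and Z are taken as given rather than inferred from text by zero-shot models, and every probability in the data-generating process is a made-up value. The code builds the exact joint distribution, hides the confounder U, and checks that the formula recovers the true effect while the naive contrast does not.

```python
import numpy as np

# Hypothetical data-generating process; every number here is illustrative.
p_u = np.array([0.6, 0.4])                    # P(U=u), U is the unobserved confounder
p_w1_u = np.array([0.2, 0.8])                 # P(W=1 | U=u), first proxy
p_z1_u = np.array([0.3, 0.7])                 # P(Z=1 | U=u), second proxy
p_a1_u = np.array([0.25, 0.75])               # P(A=1 | U=u), treatment
p_y1_au = np.array([[0.1, 0.5], [0.3, 0.8]])  # P(Y=1 | A=a, U=u), indexed [a, u]


def bern(p1, value):
    """Probability that a Bernoulli(p1) variable equals `value`."""
    return p1 if value == 1 else 1.0 - p1


# Exact joint distribution, indexed [u, w, z, a, y]. W and Z depend on U only.
joint = np.zeros((2, 2, 2, 2, 2))
for u, w, z, a, y in np.ndindex(2, 2, 2, 2, 2):
    joint[u, w, z, a, y] = (
        p_u[u] * bern(p_w1_u[u], w) * bern(p_z1_u[u], z)
        * bern(p_a1_u[u], a) * bern(p_y1_au[a, u], y)
    )

# The analyst only sees (W, Z, A, Y); U is marginalized out.
observed = joint.sum(axis=0)  # indexed [w, z, a, y]


def proximal_g_formula(observed, a):
    """P(Y=1 | do(A=a)) = P(Y=1 | Z, a) @ inv(P(W | Z, a)) @ P(W)."""
    p_w = observed.sum(axis=(1, 2, 3))            # P(W=w)
    p_wza = observed.sum(axis=3)                  # P(W=w, Z=z, A=a)
    p_za = p_wza.sum(axis=0)                      # P(Z=z, A=a)
    p_w_given_za = p_wza[:, :, a] / p_za[:, a]    # matrix with rows w, columns z
    p_y1_given_za = observed[:, :, a, 1].sum(axis=0) / p_za[:, a]  # vector over z
    return p_y1_given_za @ np.linalg.inv(p_w_given_za) @ p_w


# Ground truth, computable only because this sketch knows U.
true_ace = float((p_y1_au[1] - p_y1_au[0]) @ p_u)

# Proximal estimate from the observed distribution alone.
proximal_ace = float(proximal_g_formula(observed, 1) - proximal_g_formula(observed, 0))

# Naive contrast P(Y=1 | A=1) - P(Y=1 | A=0), which ignores confounding by U.
p_ay = observed.sum(axis=(0, 1))  # indexed [a, y]
naive_ace = float(p_ay[1, 1] / p_ay[1].sum() - p_ay[0, 1] / p_ay[0].sum())

print(f"true={true_ace:.4f} proximal={proximal_ace:.4f} naive={naive_ace:.4f}")
```

The matrix inverse exists here only because both proxies carry information about U; if either proxy were independent of U, P(W | Z, a) would be singular and the formula would not apply.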

Related papers

Density Ratio-based Proxy Causal Learning Without Density Ratios [26.49087216375106]
We address the setting of Proxy Causal Learning (PCL), which has the goal of estimating causal effects from observed data in the presence of hidden confounding. Two approaches have been proposed to perform causal effect estimation given proxy variables. We propose a practical and effective implementation of the second approach, which bypasses explicit density ratio estimation and is suitable for continuous and high-dimensional treatments.
arXiv Detail & Related papers (2025-03-11T12:27:54Z)
Automating the Selection of Proxy Variables of Unmeasured Confounders [16.773841751009748]
We extend the existing proxy variable estimator to accommodate scenarios where multiple unmeasured confounders exist between the treatments and the outcome. We propose two data-driven methods for the selection of proxy variables and for the unbiased estimation of causal effects.
arXiv Detail & Related papers (2024-05-25T08:53:49Z)
Onboard Out-of-Calibration Detection of Deep Learning Models using Conformal Prediction [4.856998175951948]
We show that conformal prediction algorithms are related to the uncertainty of the deep learning model and that this relation can be used to detect if the deep learning model is out-of-calibration. An out-of-calibration detection procedure relating the model uncertainty and the average size of the conformal prediction set is presented.
arXiv Detail & Related papers (2024-05-04T11:05:52Z)
Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval [139.21955930418815]
Cross-modal retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space. However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts. We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from the inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z)
Interpretable Automatic Fine-grained Inconsistency Detection in Text Summarization [56.94741578760294]
We propose the task of fine-grained inconsistency detection, the goal of which is to predict the fine-grained types of factual errors in a summary. Motivated by how humans inspect factual inconsistency in summaries, we propose an interpretable fine-grained inconsistency detection model, FineGrainFact.
arXiv Detail & Related papers (2023-05-23T22:11:47Z)
Simulation-based, Finite-sample Inference for Privatized Data [14.218697973204065]
We propose a simulation-based "repro sample" approach to produce statistically valid confidence intervals and hypothesis tests. We show that this methodology is applicable to a wide variety of private inference problems.
arXiv Detail & Related papers (2023-03-09T15:19:31Z)
Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors [80.22825549235556]
Existing approaches cannot synchronously consider error position and type. We build an FG-TED model to predict the addition and omission errors. Experiments show that our model can identify both error type and position concurrently, and gives state-of-the-art results.
arXiv Detail & Related papers (2023-02-17T16:20:33Z)
ADDMU: Detection of Far-Boundary Adversarial Examples with Data and Model Uncertainty Estimation [125.52743832477404]
Adversarial Examples Detection (AED) is a crucial defense technique against adversarial attacks. We propose a new technique, ADDMU, which combines two types of uncertainty estimation for both regular and FB adversarial example detection. Our new method outperforms previous methods by 3.6 and 6.0 AUC points under each scenario.
arXiv Detail & Related papers (2022-10-22T09:11:12Z)
A Unified Evaluation of Textual Backdoor Learning: Frameworks and Benchmarks [72.7373468905418]
We develop an open-source toolkit OpenBackdoor to foster the implementations and evaluations of textual backdoor learning. We also propose CUBE, a simple yet strong clustering-based defense baseline.
arXiv Detail & Related papers (2022-06-17T02:29:23Z)
Approximate Conditional Coverage via Neural Model Approximations [0.030458514384586396]
We analyze a data-driven procedure for obtaining empirically reliable approximate conditional coverage. We demonstrate the potential for substantial (and otherwise unknowable) under-coverage with split-conformal alternatives with marginal coverage guarantees.
arXiv Detail & Related papers (2022-05-28T02:59:05Z)
Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation [109.06060143938052]
We propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset. We apply this framework to study two perturbation-based approaches that are used to analyze models' robustness and counterfactual bias in English.
arXiv Detail & Related papers (2021-04-12T06:57:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.