Mitigating Hidden Confounding by Progressive Confounder Imputation via Large Language Models
- URL: http://arxiv.org/abs/2507.02928v1
- Date: Thu, 26 Jun 2025 03:49:13 GMT
- Title: Mitigating Hidden Confounding by Progressive Confounder Imputation via Large Language Models
- Authors: Hao Yang, Haoxuan Li, Luyu Chen, Haoxiang Wang, Xu Chen, Mingming Gong,
- Abstract summary: We make the first attempt to mitigate hidden confounding using large language models (LLMs)<n>We propose ProCI, a framework that elicits the semantic and world knowledge of LLMs to iteratively generate, impute, and validate hidden confounders.<n>Extensive experiments demonstrate that ProCI uncovers meaningful confounders and significantly improves treatment effect estimation.
- Score: 46.92706900119399
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hidden confounding remains a central challenge in estimating treatment effects from observational data, as unobserved variables can lead to biased causal estimates. While recent work has explored the use of large language models (LLMs) for causal inference, most approaches still rely on the unconfoundedness assumption. In this paper, we make the first attempt to mitigate hidden confounding using LLMs. We propose ProCI (Progressive Confounder Imputation), a framework that elicits the semantic and world knowledge of LLMs to iteratively generate, impute, and validate hidden confounders. ProCI leverages two key capabilities of LLMs: their strong semantic reasoning ability, which enables the discovery of plausible confounders from both structured and unstructured inputs, and their embedded world knowledge, which supports counterfactual reasoning under latent confounding. To improve robustness, ProCI adopts a distributional reasoning strategy instead of direct value imputation to prevent the collapsed outputs. Extensive experiments demonstrate that ProCI uncovers meaningful confounders and significantly improves treatment effect estimation across various datasets and LLMs.
Related papers
- VIGOR+: Iterative Confounder Generation and Validation via LLM-CEVAE Feedback Loop [14.309475903975441]
Recent advances leverage Large Language Models to generate plausible hidden confounders based on domain knowledge.<n>We propose VIGOR+, a novel framework that closes the loop between LLM-based confounder generation and CEVAE-based statistical validation.<n>We formalize the feedback mechanism, prove convergence properties under mild assumptions, and provide a complete algorithmic framework.
arXiv Detail & Related papers (2025-12-22T12:48:29Z) - Towards Reliable LLM-based Robot Planning via Combined Uncertainty Estimation [68.106428321492]
Large language models (LLMs) demonstrate advanced reasoning abilities, enabling robots to understand natural language instructions and generate high-level plans with appropriate grounding.<n>LLMs hallucinations present a significant challenge, often leading to overconfident yet potentially misaligned or unsafe plans.<n>We present Combined Uncertainty estimation for Reliable Embodied planning (CURE), which decomposes the uncertainty into epistemic and intrinsic uncertainty, each estimated separately.
arXiv Detail & Related papers (2025-10-09T10:26:58Z) - LLM-based Agents for Automated Confounder Discovery and Subgroup Analysis in Causal Inference [1.1538255621565348]
We propose Large Language Model-based agents for automated confounder discovery and subgroup analysis.<n>Our framework systematically performs subgroup identification and confounding structure discovery.<n>Our findings suggest that LLM-based agents offer a promising path toward scalable, trustworthy, and semantically aware causal inference.
arXiv Detail & Related papers (2025-08-10T07:45:49Z) - Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts [79.1081247754018]
Large Language Models (LLMs) are widely deployed in reasoning, planning, and decision-making tasks.<n>We propose a framework based on Contact Searching Questions(CSQ) to quantify the likelihood of deception.
arXiv Detail & Related papers (2025-08-08T14:46:35Z) - Hybrid Latent Reasoning via Reinforcement Learning [51.06635386903026]
We explore latent reasoning by leveraging the capabilities of large language models (LLMs) via reinforcement learning (RL)<n>We introduce hybrid reasoning policy optimization (HRPO), an RL-based hybrid latent reasoning approach that integrates prior hidden states into sampled tokens with a learnable gating mechanism.<n>HRPO-trained LLMs remain interpretable and exhibit intriguing behaviors like cross-lingual patterns and shorter completion lengths.
arXiv Detail & Related papers (2025-05-24T01:26:16Z) - Can Large Language Models Help Experimental Design for Causal Discovery? [94.66802142727883]
Large Language Model Guided Intervention Targeting (LeGIT) is a robust framework that effectively incorporates LLMs to augment existing numerical approaches for the intervention targeting in causal discovery.<n>LeGIT demonstrates significant improvements and robustness over existing methods and even surpasses humans.
arXiv Detail & Related papers (2025-03-03T03:43:05Z) - Investigating the Robustness of Deductive Reasoning with Large Language Models [7.494617747914778]
Large Language Models (LLMs) have been shown to achieve impressive results for many reasoning-based Natural Language Processing (NLP) tasks.<n>It remains unclear to which extent LLMs, in both informal and autoformalisation methods, are robust on logical deduction tasks.
arXiv Detail & Related papers (2025-02-04T17:16:51Z) - Aligning Large Language Models for Faithful Integrity Against Opposing Argument [71.33552795870544]
Large Language Models (LLMs) have demonstrated impressive capabilities in complex reasoning tasks.<n>They can be easily misled by unfaithful arguments during conversations, even when their original statements are correct.<n>We propose a novel framework, named Alignment for Faithful Integrity with Confidence Estimation.
arXiv Detail & Related papers (2025-01-02T16:38:21Z) - Metacognitive Myopia in Large Language Models [0.0]
Large Language Models (LLMs) exhibit potentially harmful biases that reinforce culturally inherent stereotypes, cloud moral judgments, or amplify positive evaluations of majority groups.
We propose metacognitive myopia as a cognitive-ecological framework that can account for a conglomerate of established and emerging LLM biases.
Our theoretical framework posits that a lack of the two components of metacognition, monitoring and control, causes five symptoms of metacognitive myopia in LLMs.
arXiv Detail & Related papers (2024-08-10T14:43:57Z) - Evaluating Uncertainty-based Failure Detection for Closed-Loop LLM Planners [10.746821861109176]
Large Language Models (LLMs) have witnessed remarkable performance as zero-shot task planners for robotic tasks.<n>However, the open-loop nature of previous works makes LLM-based planning error-prone and fragile.<n>In this work, we introduce a framework for closed-loop LLM-based planning called KnowLoop, backed by an uncertainty-based MLLMs failure detector.
arXiv Detail & Related papers (2024-06-01T12:52:06Z) - FactCHD: Benchmarking Fact-Conflicting Hallucination Detection [64.4610684475899]
FactCHD is a benchmark designed for the detection of fact-conflicting hallucinations from LLMs.
FactCHD features a diverse dataset that spans various factuality patterns, including vanilla, multi-hop, comparison, and set operation.
We introduce Truth-Triangulator that synthesizes reflective considerations by tool-enhanced ChatGPT and LoRA-tuning based on Llama2.
arXiv Detail & Related papers (2023-10-18T16:27:49Z) - PACE-LM: Prompting and Augmentation for Calibrated Confidence Estimation
with GPT-4 in Cloud Incident Root Cause Analysis [17.362895895214344]
Large language models (LLMs) are used to help humans identify the root causes of cloud incidents.
We propose to perform confidence estimation for the predictions to help on-call engineers make decisions on whether to adopt the model prediction.
We show that our method is able to produce calibrated confidence estimates for predicted root causes, validate the usefulness of retrieved historical data and the prompting strategy.
arXiv Detail & Related papers (2023-09-11T21:24:00Z) - Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models [61.28463542324576]
Vision-language models (VLMs) have recently demonstrated strong efficacy as visual assistants that can generate human-like outputs.
We evaluate existing state-of-the-art VLMs and find that even the best-performing model is unable to demonstrate strong visual reasoning capabilities and consistency.
We propose a two-stage training framework aimed at improving both the reasoning performance and consistency of VLMs.
arXiv Detail & Related papers (2023-09-08T17:49:44Z) - Improving Open Information Extraction with Large Language Models: A
Study on Demonstration Uncertainty [52.72790059506241]
Open Information Extraction (OIE) task aims at extracting structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z) - Integrating Large Language Model for Improved Causal Discovery [25.50313039584238]
Large Language Models (LLM) have been used for causal analysis across various domain-specific scenarios.<n>We propose an error-tolerant LLM-driven causal discovery framework.
arXiv Detail & Related papers (2023-06-29T12:48:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.