LLM-based Agents for Automated Confounder Discovery and Subgroup Analysis in Causal Inference
- URL: http://arxiv.org/abs/2508.07221v1
- Date: Sun, 10 Aug 2025 07:45:49 GMT
- Title: LLM-based Agents for Automated Confounder Discovery and Subgroup Analysis in Causal Inference
- Authors: Po-Han Lee, Yu-Cheng Lin, Chan-Tung Ku, Chan Hsu, Pei-Cing Huang, Ping-Hsun Wu, Yihuang Kang
- Abstract summary: We propose Large Language Model-based agents for automated confounder discovery and subgroup analysis. Our framework systematically performs subgroup identification and confounding structure discovery. Our findings suggest that LLM-based agents offer a promising path toward scalable, trustworthy, and semantically aware causal inference.
- Score: 1.1538255621565348
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating individualized treatment effects from observational data presents a persistent challenge due to unmeasured confounding and structural bias. Causal Machine Learning (causal ML) methods, such as causal trees and doubly robust estimators, provide tools for estimating conditional average treatment effects. However, these methods have limited effectiveness in complex real-world environments because of latent confounders or confounders described only in unstructured formats. Moreover, reliance on domain experts for confounder identification and rule interpretation introduces high annotation cost and scalability concerns. In this work, we propose Large Language Model-based agents for automated confounder discovery and subgroup analysis that integrate agents into the causal ML pipeline to simulate domain expertise. Our framework systematically performs subgroup identification and confounding structure discovery by leveraging the reasoning capabilities of LLM-based agents, which reduces human dependency while preserving interpretability. Experiments on real-world medical datasets show that our proposed approach enhances the robustness of treatment effect estimation by narrowing confidence intervals and uncovering unrecognized confounding biases. Our findings suggest that LLM-based agents offer a promising path toward scalable, trustworthy, and semantically aware causal inference.
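As background on the doubly robust estimators the abstract mentions, the sketch below computes the doubly robust (AIPW) average treatment effect on synthetic data. The true nuisance functions are plugged in by hand purely for illustration; in practice both are fitted by ML models, and the doubly robust property says the estimator stays consistent if either model is correct. All names and data are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic observational data with one measured confounder X.
n = 5000
X = rng.normal(size=n)                    # confounder
p = 1 / (1 + np.exp(-X))                  # true propensity P(T=1 | X)
T = rng.binomial(1, p)                    # treatment assignment
Y = 2.0 * T + X + rng.normal(size=n)      # outcome; true ATE = 2.0

# Nuisance functions, here the true ones for illustration only.
e_hat = p                                 # propensity model
mu1_hat = 2.0 + X                         # outcome model E[Y | T=1, X]
mu0_hat = X                               # outcome model E[Y | T=0, X]

# AIPW (augmented inverse-propensity-weighted) score and ATE estimate.
psi = (mu1_hat - mu0_hat
       + T * (Y - mu1_hat) / e_hat
       - (1 - T) * (Y - mu0_hat) / (1 - e_hat))
ate_hat = psi.mean()
print(f"AIPW ATE estimate: {ate_hat:.2f}")  # close to the true ATE of 2.0
```

Averaging the per-unit scores also yields a standard error (`psi.std() / np.sqrt(n)`), which is the confidence-interval width the paper's experiments aim to narrow.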
Related papers
- Hallucination Detection and Mitigation in Large Language Models [0.0]
Large Language Models (LLMs) and Large Reasoning Models (LRMs) offer transformative potential for high-stakes domains like finance and law. Their tendency to hallucinate, generating factually incorrect or unsupported content, poses a critical reliability risk. This paper introduces a comprehensive framework for hallucination management, built on a continuous improvement cycle driven by root cause awareness.
arXiv Detail & Related papers (2026-01-14T23:19:37Z)
- SelfAI: Building a Self-Training AI System with LLM Agents [79.10991818561907]
SelfAI is a general multi-agent platform that combines a User Agent, which translates high-level research objectives into standardized experimental configurations, with an Experiment Manager, which orchestrates parallel, fault-tolerant training across heterogeneous hardware while maintaining a structured knowledge base for continuous feedback. Across regression, computer vision, scientific computing, medical imaging, and drug discovery benchmarks, SelfAI consistently achieves strong performance and reduces redundant trials.
arXiv Detail & Related papers (2025-11-29T09:18:39Z)
- Technical Report: Facilitating the Adoption of Causal Inference Methods Through LLM-Empowered Co-Pilot [44.336297829718795]
We introduce CATE-B, an open-source co-pilot system that uses large language models (LLMs) within an agentic framework to guide users through treatment effect estimation. CATE-B assists in (i) constructing a structural causal model via causal discovery and LLM-based edge orientation, (ii) identifying robust adjustment sets through a novel Minimal Uncertainty Adjustment Set criterion, and (iii) selecting appropriate regression methods tailored to the causal structure and dataset characteristics.
arXiv Detail & Related papers (2025-08-14T12:20:51Z)
- CANDLE: A Cross-Modal Agentic Knowledge Distillation Framework for Interpretable Sarcopenia Diagnosis [3.0245458192729466]
CANDLE mitigates the interpretability-performance trade-off, enhances predictive accuracy, and preserves high decision consistency. The framework offers a scalable approach to knowledge assetization of traditional machine learning (TML) models, enabling interpretable, reproducible, and clinically aligned decision support in sarcopenia and potentially broader medical domains.
arXiv Detail & Related papers (2025-07-26T15:50:08Z)
- Agentic Reinforced Policy Optimization [66.96989268893932]
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. Current RL algorithms inadequately balance the models' intrinsic long-horizon reasoning capabilities and their proficiency in multi-turn tool interactions. We propose Agentic Reinforced Policy Optimization (ARPO), a novel agentic RL algorithm tailored for training multi-turn LLM-based agents.
arXiv Detail & Related papers (2025-07-26T07:53:11Z)
- Mitigating Hidden Confounding by Progressive Confounder Imputation via Large Language Models [46.92706900119399]
We make the first attempt to mitigate hidden confounding using large language models (LLMs). We propose ProCI, a framework that elicits the semantic and world knowledge of LLMs to iteratively generate, impute, and validate hidden confounders. Extensive experiments demonstrate that ProCI uncovers meaningful confounders and significantly improves treatment effect estimation.
arXiv Detail & Related papers (2025-06-26T03:49:13Z)
- Understanding the Information Propagation Effects of Communication Topologies in LLM-based Multi-Agent Systems [58.95962217043371]
We present a causal framework to analyze how agent outputs, whether correct or erroneous, propagate under topologies with varying sparsity. Our empirical studies reveal that moderately sparse topologies, which effectively suppress error propagation while preserving beneficial information diffusion, typically achieve optimal task performance. We propose a novel topology design approach, EIB-learner, that balances error suppression and beneficial information propagation by fusing connectivity patterns from both dense and sparse graphs.
arXiv Detail & Related papers (2025-05-29T11:21:48Z)
- Can LLMs Assist Expert Elicitation for Probabilistic Causal Modeling? [0.0]
This study investigates the potential of Large Language Models (LLMs) as an alternative to human expert elicitation for extracting structured causal knowledge. LLM-generated causal structures, specifically Bayesian networks (BNs), were benchmarked against traditional statistical methods. The LLM-generated BNs demonstrated lower entropy than expert-elicited and statistically generated BNs, suggesting higher confidence and precision in predictions.
arXiv Detail & Related papers (2025-04-14T16:45:52Z)
- Hallucination Detection in LLMs with Topological Divergence on Attention Graphs [64.74977204942199]
Hallucination, i.e., generating factually incorrect content, remains a critical challenge for large language models. We introduce TOHA, a TOpology-based HAllucination detector in the RAG setting.
arXiv Detail & Related papers (2025-04-14T10:06:27Z)
- Inferring Outcome Means of Exponential Family Distributions Estimated by Deep Neural Networks [5.909780773881451]
Inference on deep neural networks (DNNs) for categorical or exponential family outcomes remains underexplored. We propose a DNN estimator under generalized nonparametric regression models (GNRMs) and develop a rigorous inference framework. We further apply the method to the electronic Intensive Care Unit (eICU) dataset to predict ICU risk and offer patient-centric insights for clinical decision-making.
arXiv Detail & Related papers (2025-04-12T21:32:42Z)
- Can Large Language Models Help Experimental Design for Causal Discovery? [94.66802142727883]
Large Language Model Guided Intervention Targeting (LeGIT) is a robust framework that effectively incorporates LLMs to augment existing numerical approaches to intervention targeting in causal discovery. LeGIT demonstrates significant improvements and robustness over existing methods and even surpasses humans.
arXiv Detail & Related papers (2025-03-03T03:43:05Z)
- Evaluating Interventional Reasoning Capabilities of Large Language Models [58.52919374786108]
Large language models (LLMs) are increasingly used to automate decision-making tasks. In this paper, we evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention. We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types. These benchmarks allow us to isolate LLMs' ability to accurately predict changes from their ability to memorize facts or exploit other shortcuts.
arXiv Detail & Related papers (2024-04-08T14:15:56Z)
- Assessing biomedical knowledge robustness in large language models by query-efficient sampling attacks [0.6282171844772422]
An increasing depth of parametric domain knowledge in large language models (LLMs) is fueling their rapid deployment in real-world applications. The recent discovery of named entities as adversarial examples in natural language processing tasks raises questions about their potential impact on the knowledge robustness of pre-trained and finetuned LLMs. We developed an embedding-space attack based on power-scaled distance-weighted sampling to assess the robustness of their biomedical knowledge.
arXiv Detail & Related papers (2024-02-16T09:29:38Z)
- Discovery of the Hidden World with Large Language Models [95.58823685009727]
This paper presents Causal representatiOn AssistanT (COAT) that introduces large language models (LLMs) to bridge the gap.
LLMs are trained on massive observations of the world and have demonstrated great capability in extracting key information from unstructured data.
COAT also adopts causal discovery (CD) methods to find causal relations among the identified variables, as well as to provide feedback to LLMs to iteratively refine the proposed factors.
arXiv Detail & Related papers (2024-02-06T12:18:54Z)
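The propose-and-validate loop described by the main paper and the ProCI entry above can be illustrated on toy data. In the sketch below, hard-coded candidate variables stand in for an LLM agent's proposals; the variable names, the confounder-validation rule, and the data are all hypothetical illustrations, not any paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setting: U confounds both treatment T and outcome Y but is unrecorded.
n = 2000
U = rng.normal(size=n)
T = (U + rng.normal(size=n) > 0).astype(float)
Y = 1.0 * T + 2.0 * U + rng.normal(size=n)   # true effect of T is 1.0

def adjusted_ate(Y, T, covariates):
    """OLS coefficient of T after adjusting for the given covariates."""
    cols = [np.ones_like(T), T] + covariates
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), Y, rcond=None)
    return beta[1]

def looks_like_confounder(Z, T, Y, tau=0.1):
    """Validate a proposal: a confounder must relate to both T and Y."""
    return (abs(np.corrcoef(Z, T)[0, 1]) > tau
            and abs(np.corrcoef(Z, Y)[0, 1]) > tau)

# Stand-in for the agent's proposals: a noisy lab measurement that proxies
# the hidden U, and an irrelevant variable that should be rejected.
proposals = {
    "lab_marker": U + 0.3 * rng.normal(size=n),
    "unrelated_code": rng.normal(size=n),
}
accepted = [Z for Z in proposals.values() if looks_like_confounder(Z, T, Y)]

naive = adjusted_ate(Y, T, [])            # ignores the hidden confounder
debiased = adjusted_ate(Y, T, accepted)   # adjusts for the accepted proxy
print(f"naive={naive:.2f}, debiased={debiased:.2f}")
```

Adjusting for the accepted proxy pulls the estimate toward the true effect of 1.0, while the naive contrast stays heavily biased; this is the kind of bias reduction and interval narrowing the papers above report, here only on a deliberately simple simulation.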
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.