CALM: A Causal Analysis Language Model for Tabular Data in Complex Systems with Local Scores, Conditional Independence Tests, and Relation Attributes
- URL: http://arxiv.org/abs/2510.09846v1
- Date: Fri, 10 Oct 2025 20:19:20 GMT
- Title: CALM: A Causal Analysis Language Model for Tabular Data in Complex Systems with Local Scores, Conditional Independence Tests, and Relation Attributes
- Authors: Zhenjiang Fan, Zengyi Qin, Yuanning Zheng, Bo Xiong, Summer Han,
- Abstract summary: Causal discovery from observational data is fundamental to scientific fields like biology.<n>Existing methods, including constraint-based and score-based approaches, face significant limitations.<n>We introduce CALM, a novel causal analysis language model specifically designed for tabular data.
- Score: 15.298086464296235
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Causal discovery from observational data is fundamental to scientific fields like biology, where controlled experiments are often impractical. However, existing methods, including constraint-based (e.g., PC, causalMGM) and score-based approaches (e.g., NOTEARS), face significant limitations. These include an inability to resolve causal direction, restrictions to linear associations, sensitivity to violations of the faithfulness assumption, and inefficiency in searching vast hypothesis spaces. While large language models (LLMs) offer powerful reasoning capabilities, their application is hindered by a fundamental discrepancy: they are designed for text, while most causal data is tabular. To address these challenges, we introduce CALM, a novel causal analysis language model specifically designed for tabular data in complex systems. CALM leverages a Mamba-based architecture to classify causal patterns from pairwise variable relationships. It integrates a comprehensive suite of evidence, including local causal scores, conditional independence tests, and relational attributes, to capture a wide spectrum of linear, nonlinear, and conditional causal mechanisms. Trained on a diverse corpus of synthetic data (from linear, mixed, and nonlinear models) and 10 real-world biological datasets with rigorously validated causal relationships, our model ensures robustness and generalizability. Empirical evaluation demonstrates that CALM significantly outperforms existing methods in both simulation studies, achieving over 91% accuracy, and in a real-world application identifying causal factors in Hepatitis C virus progression. This work represents a significant step towards accurate and generalizable causal discovery by successfully adapting the pattern recognition capabilities of language models to the intricacies of tabular data.
Related papers
- Data-Driven Information-Theoretic Causal Bounds under Unmeasured Confounding [10.590231532335691]
We develop a data-driven information-theoretic framework for partial identification of conditional causal effects under unmeasured confounding.<n>Our key theoretical contribution shows that the f-divergence between the observational distribution P(Y | A = a, X = x) and the interventional distribution P(Y | do(A = a), X = x) is upper bounded by a function of the propensity score alone.<n>This result enables sharp partial identification of conditional causal effects directly from observational data, without requiring external sensitivity parameters, auxiliary variables, full structural specifications, or outcome boundedness assumptions.
arXiv Detail & Related papers (2026-01-23T20:47:48Z) - Causal SHAP: Feature Attribution with Dependency Awareness through Causal Discovery [3.717095609283206]
Causal SHAP is a novel framework that integrates causal relationships into feature attribution.<n>This study contributes to the field of Explainable AI (XAI) by providing a practical framework for causal-aware model explanations.
arXiv Detail & Related papers (2025-08-31T13:31:34Z) - Hierarchical Variable Importance with Statistical Control for Medical Data-Based Prediction [35.94354098982828]
We introduce Hierarchical-CPI, a model-agnostic variable importance measure that frames the inference problem as the discovery of groups of variables.<n>By exploring subgroups along a hierarchical tree, it remains computationally tractable, yet also enjoys explicit family-wise error rate control.<n>Its effectiveness is demonstrated in two neuroimaging datasets.
arXiv Detail & Related papers (2025-08-12T08:10:54Z) - Retrieving Classes of Causal Orders with Inconsistent Knowledge Bases [0.8192907805418583]
Large Language Models (LLMs) have emerged as a promising alternative for extracting causal knowledge from text-based metadata.<n>LLMs tend to be unreliable and prone to hallucinations, necessitating strategies that account for their limitations.<n>We present a new method to derive a class of acyclic tournaments, which represent plausible causal orders.
arXiv Detail & Related papers (2024-12-18T16:37:51Z) - CAnDOIT: Causal Discovery with Observational and Interventional Data from Time-Series [4.008958683836471]
CAnDOIT is a causal discovery method to reconstruct causal models using both observational and interventional data.
The use of interventional data in the causal analysis is crucial for real-world applications, such as robotics.
A Python implementation of CAnDOIT has also been developed and is publicly available on GitHub.
arXiv Detail & Related papers (2024-10-03T13:57:08Z) - Discovering and Reasoning of Causality in the Hidden World with Large Language Models [109.62442253177376]
We develop a new framework termed Causal representatiOn AssistanT (COAT) to propose useful measured variables for causal discovery.<n>Instead of directly inferring causality with Large language models (LLMs), COAT constructs feedback from intermediate causal discovery results to LLMs to refine the proposed variables.
arXiv Detail & Related papers (2024-02-06T12:18:54Z) - A Neural Framework for Generalized Causal Sensitivity Analysis [78.71545648682705]
We propose NeuralCSA, a neural framework for causal sensitivity analysis.
We provide theoretical guarantees that NeuralCSA is able to infer valid bounds on the causal query of interest.
arXiv Detail & Related papers (2023-11-27T17:40:02Z) - Identifiable Latent Polynomial Causal Models Through the Lens of Change [82.14087963690561]
Causal representation learning aims to unveil latent high-level causal representations from observed low-level data.<n>One of its primary tasks is to provide reliable assurance of identifying these latent causal models, known as identifiability.
arXiv Detail & Related papers (2023-10-24T07:46:10Z) - Discovering Mixtures of Structural Causal Models from Time Series Data [23.18511951330646]
We propose a general variational inference-based framework called MCD to infer the underlying causal models.
Our approach employs an end-to-end training process that maximizes an evidence-lower bound for the data likelihood.
We demonstrate that our method surpasses state-of-the-art benchmarks in causal discovery tasks.
arXiv Detail & Related papers (2023-10-10T05:13:10Z) - A Causal Framework for Decomposing Spurious Variations [68.12191782657437]
We develop tools for decomposing spurious variations in Markovian and Semi-Markovian models.
We prove the first results that allow a non-parametric decomposition of spurious effects.
The described approach has several applications, ranging from explainable and fair AI to questions in epidemiology and medicine.
arXiv Detail & Related papers (2023-06-08T09:40:28Z) - Causality-Based Multivariate Time Series Anomaly Detection [63.799474860969156]
We formulate the anomaly detection problem from a causal perspective and view anomalies as instances that do not follow the regular causal mechanism to generate the multivariate data.
We then propose a causality-based anomaly detection approach, which first learns the causal structure from data and then infers whether an instance is an anomaly relative to the local causal mechanism.
We evaluate our approach with both simulated and public datasets as well as a case study on real-world AIOps applications.
arXiv Detail & Related papers (2022-06-30T06:00:13Z) - Amortized Inference for Causal Structure Learning [72.84105256353801]
Learning causal structure poses a search problem that typically involves evaluating structures using a score or independence test.
We train a variational inference model to predict the causal structure from observational/interventional data.
Our models exhibit robust generalization capabilities under substantial distribution shift.
arXiv Detail & Related papers (2022-05-25T17:37:08Z) - Structural Causal Models Are (Solvable by) Credal Networks [70.45873402967297]
Causal inferences can be obtained by standard algorithms for the updating of credal nets.
This contribution should be regarded as a systematic approach to represent structural causal models by credal networks.
Experiments show that approximate algorithms for credal networks can immediately be used to do causal inference in real-size problems.
arXiv Detail & Related papers (2020-08-02T11:19:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.