IRIS: An Iterative and Integrated Framework for Verifiable Causal Discovery in the Absence of Tabular Data
- URL: http://arxiv.org/abs/2510.09217v1
- Date: Fri, 10 Oct 2025 09:50:26 GMT
- Title: IRIS: An Iterative and Integrated Framework for Verifiable Causal Discovery in the Absence of Tabular Data
- Authors: Tao Feng, Lizhen Qu, Niket Tandon, Gholamreza Haffari,
- Abstract summary: We introduce IRIS (Iterative Retrieval and Integrated System for Real-Time Causal Discovery), a novel framework that addresses these limitations.<n>Our approach enables real-time causal discovery from only a set of initial variables without requiring pre-existing datasets.
- Score: 55.37714903189613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Causal discovery is fundamental to scientific research, yet traditional statistical algorithms face significant challenges, including expensive data collection, redundant computation for known relations, and unrealistic assumptions. While recent LLM-based methods excel at identifying commonly known causal relations, they fail to uncover novel relations. We introduce IRIS (Iterative Retrieval and Integrated System for Real-Time Causal Discovery), a novel framework that addresses these limitations. Starting with a set of initial variables, IRIS automatically collects relevant documents, extracts variables, and uncovers causal relations. Our hybrid causal discovery method combines statistical algorithms and LLM-based methods to discover known and novel causal relations. In addition to causal discovery on initial variables, the missing variable proposal component of IRIS identifies and incorporates missing variables to expand the causal graphs. Our approach enables real-time causal discovery from only a set of initial variables without requiring pre-existing datasets.
Related papers
- CALM: A Causal Analysis Language Model for Tabular Data in Complex Systems with Local Scores, Conditional Independence Tests, and Relation Attributes [15.298086464296235]
Causal discovery from observational data is fundamental to scientific fields like biology.<n>Existing methods, including constraint-based and score-based approaches, face significant limitations.<n>We introduce CALM, a novel causal analysis language model specifically designed for tabular data.
arXiv Detail & Related papers (2025-10-10T20:19:20Z) - Multi-View Causal Discovery without Non-Gaussianity: Identifiability and Algorithms [59.15672758767244]
Causal discovery is a difficult problem that typically relies on strong assumptions on the data-generating model, such as non-Gaussianity.<n>Here, we leverage this multi-view structure to achieve causal discovery with weak assumptions.<n>We propose several multi-view causal discovery algorithms, inspired by single-view algorithms.
arXiv Detail & Related papers (2025-02-27T14:06:14Z) - Retrieving Classes of Causal Orders with Inconsistent Knowledge Bases [0.8192907805418583]
Large Language Models (LLMs) have emerged as a promising alternative for extracting causal knowledge from text-based metadata.<n>LLMs tend to be unreliable and prone to hallucinations, necessitating strategies that account for their limitations.<n>We present a new method to derive a class of acyclic tournaments, which represent plausible causal orders.
arXiv Detail & Related papers (2024-12-18T16:37:51Z) - Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792]
Causal models seek to unravel the cause-effect relationships among variables from observed data.
This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
arXiv Detail & Related papers (2024-10-02T04:01:38Z) - Unsupervised Pairwise Causal Discovery on Heterogeneous Data using Mutual Information Measures [49.1574468325115]
Causal Discovery is a technique that tackles the challenge by analyzing the statistical properties of the constituent variables.
We question the current (possibly misleading) baseline results on the basis that they were obtained through supervised learning.
In consequence, we approach this problem in an unsupervised way, using robust Mutual Information measures.
arXiv Detail & Related papers (2024-08-01T09:11:08Z) - Causal Graph Discovery with Retrieval-Augmented Generation based Large Language Models [23.438388321411693]
Causal graph recovery is traditionally done using statistical estimation-based methods or based on individual's knowledge about variables of interests.
We propose a novel method that leverages large language models (LLMs) to deduce causal relationships in general causal graph recovery tasks.
arXiv Detail & Related papers (2024-02-23T13:02:10Z) - Discovery of the Hidden World with Large Language Models [95.58823685009727]
This paper presents Causal representatiOn AssistanT (COAT) that introduces large language models (LLMs) to bridge the gap.
LLMs are trained on massive observations of the world and have demonstrated great capability in extracting key information from unstructured data.
COAT also adopts CDs to find causal relations among the identified variables as well as to provide feedback to LLMs to iteratively refine the proposed factors.
arXiv Detail & Related papers (2024-02-06T12:18:54Z) - Multi-modal Causal Structure Learning and Root Cause Analysis [67.67578590390907]
We propose Mulan, a unified multi-modal causal structure learning method for root cause localization.
We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data.
We also introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph.
arXiv Detail & Related papers (2024-02-04T05:50:38Z) - $\ exttt{causalAssembly}$: Generating Realistic Production Data for
Benchmarking Causal Discovery [1.3048920509133808]
We build a system for generation of semisynthetic manufacturing data that supports benchmarking of causal discovery methods.
We employ distributional random forests to flexibly estimate and represent conditional distributions.
Using the library, we showcase how to benchmark several well-known causal discovery algorithms.
arXiv Detail & Related papers (2023-06-19T10:05:54Z) - Causal discovery under a confounder blanket [9.196779204457059]
Inferring causal relationships from observational data is rarely straightforward, but the problem is especially difficult in high dimensions.
We relax these assumptions and focus on an important but more specialized problem, namely recovering a directed acyclic subgraph.
We derive a complete algorithm for identifying causal relationships under these conditions and implement testing procedures.
arXiv Detail & Related papers (2022-05-11T18:10:45Z) - Improving Efficiency and Accuracy of Causal Discovery Using a
Hierarchical Wrapper [7.570246812206772]
Causal discovery from observational data is an important tool in many branches of science.
In the large sample limit, sound and complete causal discovery algorithms have been previously introduced.
However, only finite training data is available, which limits the power of statistical tests used by these algorithms.
arXiv Detail & Related papers (2021-07-11T09:24:49Z) - Disentangling Observed Causal Effects from Latent Confounders using
Method of Moments [67.27068846108047]
We provide guarantees on identifiability and learnability under mild assumptions.
We develop efficient algorithms based on coupled tensor decomposition with linear constraints to obtain scalable and guaranteed solutions.
arXiv Detail & Related papers (2021-01-17T07:48:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.