Paths to Causality: Finding Informative Subgraphs Within Knowledge Graphs for Knowledge-Based Causal Discovery
- URL: http://arxiv.org/abs/2506.08771v1
- Date: Tue, 10 Jun 2025 13:13:55 GMT
- Title: Paths to Causality: Finding Informative Subgraphs Within Knowledge Graphs for Knowledge-Based Causal Discovery
- Authors: Yuni Susanti, Michael Färber
- Abstract summary: We introduce a novel approach that integrates Knowledge Graphs (KGs) with Large Language Models (LLMs) to enhance knowledge-based causal discovery. Our approach identifies informative metapath-based subgraphs within KGs and further refines the selection of these subgraphs using Learning-to-Rank-based models. Our method outperforms most baselines by up to 44.4 points in F1 scores, evaluated across diverse LLMs and KGs.
- Score: 10.573861741540853
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inferring causal relationships between variable pairs is crucial for understanding multivariate interactions in complex systems. Knowledge-based causal discovery -- which involves inferring causal relationships by reasoning over the metadata of variables (e.g., names or textual context) -- offers a compelling alternative to traditional methods that rely on observational data. However, existing methods using Large Language Models (LLMs) often produce unstable and inconsistent results, compromising their reliability for causal inference. To address this, we introduce a novel approach that integrates Knowledge Graphs (KGs) with LLMs to enhance knowledge-based causal discovery. Our approach identifies informative metapath-based subgraphs within KGs and further refines the selection of these subgraphs using Learning-to-Rank-based models. The top-ranked subgraphs are then incorporated into zero-shot prompts, improving the effectiveness of LLMs in inferring the causal relationship. Extensive experiments on biomedical and open-domain datasets demonstrate that our method outperforms most baselines by up to 44.4 points in F1 scores, evaluated across diverse LLMs and KGs. Our code and datasets are available on GitHub: https://github.com/susantiyuni/path-to-causality
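As a rough illustration of the pipeline described in the abstract, the sketch below is a minimal, assumed example rather than the authors' released code: helper names such as `rank_subgraphs`, the toy graph, and the prompt wording are hypothetical. It enumerates metapath-based subgraphs connecting two variables in a small KG, ranks them with a placeholder scorer standing in for the Learning-to-Rank model, and folds the top-ranked evidence into a zero-shot prompt.

```python
# Minimal sketch (not the authors' implementation) of the described pipeline:
# metapath-based subgraph retrieval -> ranking -> zero-shot prompt construction.
import networkx as nx

# Toy knowledge graph: nodes carry a `type`, edges carry a `relation`.
kg = nx.DiGraph()
kg.add_node("smoking", type="Exposure")
kg.add_node("tar_deposits", type="Biomarker")
kg.add_node("lung_cancer", type="Disease")
kg.add_edge("smoking", "tar_deposits", relation="produces")
kg.add_edge("tar_deposits", "lung_cancer", relation="associated_with")
kg.add_edge("smoking", "lung_cancer", relation="risk_factor_for")

def metapath_subgraphs(graph, source, target, max_len=3):
    """Enumerate paths of up to `max_len` hops between two variables and return
    (metapath signature, path) pairs; the signature is the alternating sequence
    of node types and edge relations along the path."""
    results = []
    for path in nx.all_simple_paths(graph, source, target, cutoff=max_len):
        signature = []
        for u, v in zip(path, path[1:]):
            signature.append(graph.nodes[u]["type"])
            signature.append(graph.edges[u, v]["relation"])
        signature.append(graph.nodes[path[-1]]["type"])
        results.append((" -> ".join(signature), path))
    return results

def rank_subgraphs(candidates):
    """Stand-in ranker that simply prefers shorter paths; the paper instead
    refines the selection with a trained Learning-to-Rank model."""
    return sorted(candidates, key=lambda item: len(item[1]))

def build_zero_shot_prompt(var_a, var_b, top_subgraphs):
    """Fold the top-ranked metapath subgraphs into a zero-shot prompt."""
    evidence = "\n".join(f"- {sig}" for sig, _ in top_subgraphs)
    return (
        f"Knowledge graph evidence connecting '{var_a}' and '{var_b}':\n"
        f"{evidence}\n"
        f"Question: does '{var_a}' cause '{var_b}'? Answer yes or no."
    )

candidates = metapath_subgraphs(kg, "smoking", "lung_cancer")
top_k = rank_subgraphs(candidates)[:2]
print(build_zero_shot_prompt("smoking", "lung_cancer", top_k))
```

In the actual method the ranking step is learned from data and the resulting prompt is passed to an LLM; the trivial length-based ranker and printed prompt here only keep the sketch self-contained.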
Related papers
- Uncovering Bias Paths with LLM-guided Causal Discovery: An Active Learning and Dynamic Scoring Approach [1.5498930424110338]
Large Language Models (LLMs) offer a promising complement to statistical Causal Discovery (CD) approaches. Ensuring fairness in machine learning requires understanding how sensitive attributes causally influence outcomes. We propose a hybrid LLM-based framework for CD that extends a breadth-first search (BFS) strategy with active learning and dynamic scoring.
arXiv Detail & Related papers (2025-06-13T21:04:03Z) - Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering [75.12322966980003]
Large Language Models (LLMs) have shown strong inductive reasoning ability across various domains. Most existing RAG pipelines rely on unstructured text, limiting interpretability and structured reasoning. Recent studies have explored integrating knowledge graphs with LLMs for knowledge graph question answering. We propose RAPL, a novel framework for efficient and effective graph retrieval in KGQA.
arXiv Detail & Related papers (2025-06-11T12:03:52Z) - ADKGD: Anomaly Detection in Knowledge Graphs with Dual-Channel Training [38.3788247358102]
This paper presents an anomaly detection algorithm in knowledge graphs with dual-channel learning (ADKGD). We introduce a Kullback-Leibler (KL) loss component to improve the accuracy of the scoring function between the dual channels; a generic sketch of such a consistency loss appears after this list. Experimental results demonstrate that ADKGD outperforms state-of-the-art anomaly detection algorithms.
arXiv Detail & Related papers (2025-01-13T06:22:52Z) - Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework. This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings. Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z) - Discovery of Maximally Consistent Causal Orders with Large Language Models [0.8192907805418583]
Causal discovery is essential for understanding complex systems. Traditional methods often rely on strong, untestable assumptions. We propose a novel method to derive a class of acyclic tournaments.
arXiv Detail & Related papers (2024-12-18T16:37:51Z) - Knowledge Graph Structure as Prompt: Improving Small Language Models Capabilities for Knowledge-based Causal Discovery [10.573861741540853]
KG Structure as Prompt is a novel approach for integrating structural information from a knowledge graph, such as common neighbor nodes and metapaths, into prompt-based learning.
Experimental results on three types of biomedical and open-domain datasets under few-shot settings demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-07-26T14:07:00Z) - Causal Graph Discovery with Retrieval-Augmented Generation based Large Language Models [23.438388321411693]
Causal graph recovery is traditionally done using statistical estimation-based methods or based on an individual's knowledge about the variables of interest.
We propose a novel method that leverages large language models (LLMs) to deduce causal relationships in general causal graph recovery tasks.
arXiv Detail & Related papers (2024-02-23T13:02:10Z) - Discovery of the Hidden World with Large Language Models [95.58823685009727]
This paper presents Causal representatiOn AssistanT (COAT) that introduces large language models (LLMs) to bridge the gap.
LLMs are trained on massive observations of the world and have demonstrated great capability in extracting key information from unstructured data.
COAT also adopts causal discovery methods to find causal relations among the identified variables, as well as to provide feedback to LLMs to iteratively refine the proposed factors.
arXiv Detail & Related papers (2024-02-06T12:18:54Z) - Multi-modal Causal Structure Learning and Root Cause Analysis [67.67578590390907]
We propose Mulan, a unified multi-modal causal structure learning method for root cause localization.
We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data.
We also introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph.
arXiv Detail & Related papers (2024-02-04T05:50:38Z) - DCID: Deep Canonical Information Decomposition [84.59396326810085]
We consider the problem of identifying the signal shared between two one-dimensional target variables.
We propose ICM, an evaluation metric which can be used in the presence of ground-truth labels.
We also propose Deep Canonical Information Decomposition (DCID) - a simple, yet effective approach for learning the shared variables.
arXiv Detail & Related papers (2023-06-27T16:59:06Z) - Normalizing Flow-based Neural Process for Few-Shot Knowledge Graph Completion [69.55700751102376]
Few-shot knowledge graph completion (FKGC) aims to predict missing facts for unseen relations with few-shot associated facts.
Existing FKGC methods are based on metric learning or meta-learning, which often suffer from the out-of-distribution and overfitting problems.
In this paper, we propose a normalizing flow-based neural process for few-shot knowledge graph completion (NP-FKGC).
arXiv Detail & Related papers (2023-04-17T11:42:28Z) - Inferential Text Generation with Multiple Knowledge Sources and Meta-Learning [117.23425857240679]
We study the problem of generating inferential texts of events for a variety of commonsense relations, such as if-else relations.
Existing approaches typically use limited evidence from training examples and learn for each relation individually.
In this work, we use multiple knowledge sources as fuels for the model.
arXiv Detail & Related papers (2020-04-07T01:49:18Z)
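The dual-channel KL objective mentioned in the ADKGD entry above can be illustrated generically. The following sketch is not that paper's implementation; the channel scores, helper names, and the symmetric form are assumptions, and it only shows how a KL-divergence consistency term between two score distributions can be computed.

```python
# Generic sketch of a KL-based consistency loss between two scoring channels,
# in the spirit of the dual-channel objective mentioned for ADKGD above; the
# anomaly scores below are made up and the symmetric form is an assumption.
import math

def softmax(scores):
    # Numerically stable softmax over a list of raw scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), with a small epsilon for stability."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Scores for the same candidate triples from two hypothetical channels.
channel_a = softmax([2.1, 0.3, -1.0, 0.8])
channel_b = softmax([1.9, 0.5, -0.7, 0.6])

# A symmetric KL term keeps the two channels' score distributions aligned.
consistency_loss = 0.5 * (kl_divergence(channel_a, channel_b)
                          + kl_divergence(channel_b, channel_a))
print(f"dual-channel consistency loss: {consistency_loss:.4f}")
```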