Related papers: What Breaks Knowledge Graph based RAG? Empirical Insights into Reasoning under Incomplete Knowledge

URL: http://arxiv.org/abs/2508.08344v2
Date: Fri, 29 Aug 2025 16:43:48 GMT
Title: What Breaks Knowledge Graph based RAG? Empirical Insights into Reasoning under Incomplete Knowledge
Authors: Dongzhuoran Zhou, Yuqicheng Zhu, Xiaxia Wang, Hongkuan Zhou, Yuan He, Jiaoyan Chen, Steffen Staab, Evgeny Kharlamov
Abstract summary: Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG) is an increasingly explored approach for combining the reasoning capabilities of large language models with the structured evidence of knowledge graphs. Existing benchmarks often include questions that can be directly answered using existing triples in KG. In this work, we introduce a general method for constructing benchmarks, together with an evaluation protocol, to systematically assess KG-RAG methods under knowledge incompleteness.
Score: 26.260367028968385
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Knowledge Graph-based Retrieval-Augmented Generation (KG-RAG) is an increasingly explored approach for combining the reasoning capabilities of large language models with the structured evidence of knowledge graphs. However, current evaluation practices fall short: existing benchmarks often include questions that can be directly answered using existing triples in KG, making it unclear whether models perform reasoning or simply retrieve answers directly. Moreover, inconsistent evaluation metrics and lenient answer matching criteria further obscure meaningful comparisons. In this work, we introduce a general method for constructing benchmarks, together with an evaluation protocol, to systematically assess KG-RAG methods under knowledge incompleteness. Our empirical results show that current KG-RAG methods have limited reasoning ability under missing knowledge, often rely on internal memorization, and exhibit varying degrees of generalization depending on their design.
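The abstract's concern, questions that a direct triple already answers, can be illustrated with a small sketch. It is not the authors' construction method: the function names `path_exists` and `drop_direct_triple`, the `max_hops` limit, the undirected treatment of edges, and the toy graph are all assumptions made for illustration. The idea shown is to delete the triple that directly answers a question and keep the example only if some other path still links the question entity to the answer.

```python
from collections import deque

Triple = tuple[str, str, str]  # (head, relation, tail); hypothetical representation


def path_exists(triples: set[Triple], source: str, target: str, max_hops: int = 3) -> bool:
    """Breadth-first search over the triples, treated as undirected edges."""
    neighbours: dict[str, set[str]] = {}
    for head, _, tail in triples:
        neighbours.setdefault(head, set()).add(tail)
        neighbours.setdefault(tail, set()).add(head)

    frontier = deque([(source, 0)])
    seen = {source}
    while frontier:
        node, depth = frontier.popleft()
        if node == target:
            return True
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


def drop_direct_triple(triples: set[Triple], direct: Triple, max_hops: int = 3):
    """Remove the direct supporting triple; return the reduced graph only if
    the answer is still reachable through some alternative path, else None."""
    remaining = set(triples) - {direct}
    head, _, tail = direct
    return remaining if path_exists(remaining, head, tail, max_hops) else None


# Toy graph (invented data, not from the paper).
kg = {
    ("alice", "mother_of", "bob"),
    ("alice", "married_to", "carl"),
    ("carl", "father_of", "bob"),
    ("dave", "friend_of", "erin"),
}
kept = drop_direct_triple(kg, ("alice", "mother_of", "bob"))      # alternative path via carl remains
rejected = drop_direct_triple(kg, ("dave", "friend_of", "erin"))  # no alternative path, so None
```

A reachability check like this only confirms that some path survives; whether that path actually supports inferring the answer would need a stronger criterion, such as a logical rule over relation types, which this sketch does not attempt.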

Related papers

GR-Agent: Adaptive Graph Reasoning Agent under Incomplete Knowledge [26.26036702896838]
Most benchmarks assume complete knowledge graphs (KGs) where direct supporting triples exist. This reduces evaluation to shallow retrieval and overlooks the reality of incomplete KGs, where many facts are missing and answers must be inferred from existing facts. We propose a methodology for constructing benchmarks under KG incompleteness, which removes direct supporting triples while ensuring that alternative reasoning paths required to infer the answer remain.
arXiv Detail & Related papers (2025-12-16T06:11:30Z)
Enrich-on-Graph: Query-Graph Alignment for Complex Reasoning with LLM Enriching [61.824094419641575]
Large Language Models (LLMs) struggle with hallucinations and factual errors in knowledge-intensive scenarios like knowledge graph question answering (KGQA). We attribute this to the semantic gap between structured knowledge graphs (KGs) and unstructured queries, caused by inherent differences in their focuses and structures. Existing methods usually employ resource-intensive, non-scalable reasoning on vanilla KGs, but overlook this gap. We propose a flexible framework, Enrich-on-Graph (EoG), which leverages LLMs' prior knowledge to enrich KGs and bridge the semantic gap between graphs and queries.
arXiv Detail & Related papers (2025-09-25T06:48:52Z)
KG-Infused RAG: Augmenting Corpus-Based RAG with External Knowledge Graphs [66.35046942874737]
KG-Infused RAG is a framework that integrates KGs into RAG systems to implement spreading activation. KG-Infused RAG retrieves KG facts, expands the query accordingly, and enhances generation by combining corpus passages with structured facts.
arXiv Detail & Related papers (2025-06-11T09:20:02Z)
Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking [56.27361644734853]
Knowledge Graph Question Answering systems rely on high-quality benchmarks to evaluate complex multi-hop reasoning. Despite their widespread use, popular datasets such as WebQSP and CWQ suffer from critical quality issues. We introduce KGQAGen, an LLM-in-the-loop framework that systematically resolves these pitfalls. Our findings advocate for more rigorous benchmark construction and position KGQAGen as a scalable framework for advancing KGQA evaluation.
arXiv Detail & Related papers (2025-05-29T14:44:52Z)
Evaluating Knowledge Graph Based Retrieval Augmented Generation Methods under Knowledge Incompleteness [25.74411097212245]
Knowledge Graph based Retrieval-Augmented Generation (KG-RAG) is a technique that enhances Large Language Model (LLM) inference in tasks like Question Answering (QA). Existing benchmarks do not adequately capture the impact of KG incompleteness on KG-RAG performance. We demonstrate that KG-RAG methods are sensitive to KG incompleteness, highlighting the need for more robust approaches in realistic settings.
arXiv Detail & Related papers (2025-04-07T15:08:03Z)
MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search [61.11836311160951]
We introduce MCTS-RAG, a novel approach that enhances the reasoning capabilities of small language models on knowledge-intensive tasks. Unlike standard RAG methods, which typically retrieve information independently from reasoning, MCTS-RAG combines structured reasoning with adaptive retrieval. This integrated approach enhances decision-making, reduces hallucinations, and ensures improved factual accuracy and response consistency.
arXiv Detail & Related papers (2025-03-26T17:46:08Z)
Context Graph [8.02985792541121]
We present a context graph reasoning CGR$^3$ paradigm that leverages large language models (LLMs) to retrieve candidate entities and related contexts. Our experimental results demonstrate that CGR$^3$ significantly improves performance on KG completion (KGC) and KG question answering (KGQA) tasks.
arXiv Detail & Related papers (2024-06-17T02:59:19Z)
History repeats Itself: A Baseline for Temporal Knowledge Graph Forecasting [10.396081172890025]
Temporal Knowledge Graph (TKG) Forecasting aims at predicting links in Knowledge Graphs for future timesteps based on a history of Knowledge Graphs. We propose to design an intuitive baseline for TKG Forecasting based on predicting recurring facts.
arXiv Detail & Related papers (2024-04-25T16:39:32Z)
Systematic Assessment of Factual Knowledge in Large Language Models [48.75961313441549]
This paper proposes a framework to assess the factual knowledge of large language models (LLMs) by leveraging knowledge graphs (KGs). Our framework automatically generates a set of questions and expected answers from the facts stored in a given KG, and then evaluates the accuracy of LLMs in answering these questions.
arXiv Detail & Related papers (2023-10-18T00:20:50Z)
Normalizing Flow-based Neural Process for Few-Shot Knowledge Graph Completion [69.55700751102376]
Few-shot knowledge graph completion (FKGC) aims to predict missing facts for unseen relations with few-shot associated facts. Existing FKGC methods are based on metric learning or meta-learning, which often suffer from the out-of-distribution and overfitting problems. In this paper, we propose a normalizing flow-based neural process for few-shot knowledge graph completion (NP-FKGC).
arXiv Detail & Related papers (2023-04-17T11:42:28Z)
KGxBoard: Explainable and Interactive Leaderboard for Evaluation of Knowledge Graph Completion Models [76.01814380927507]
KGxBoard is an interactive framework for performing fine-grained evaluation on meaningful subsets of the data. In our experiments, we highlight the findings with the use of KGxBoard, which would have been impossible to detect with standard averaged single-score metrics.
arXiv Detail & Related papers (2022-08-23T15:11:45Z)
MPLR: a novel model for multi-target learning of logical rules for knowledge graph reasoning [5.499688003232003]
We study the problem of learning logic rules for reasoning on knowledge graphs for completing missing factual triplets. We propose a model called MPLR that improves the existing models to fully use training data and multi-target scenarios are considered. Experimental results empirically demonstrate that our MPLR model outperforms state-of-the-art methods on five benchmark datasets.
arXiv Detail & Related papers (2021-12-12T09:16:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.