Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study Over Open-ended Question Answering
- URL: http://arxiv.org/abs/2410.08085v3
- Date: Wed, 19 Feb 2025 08:25:48 GMT
- Title: Can Knowledge Graphs Make Large Language Models More Trustworthy? An Empirical Study Over Open-ended Question Answering
- Authors: Yuan Sui, Yufei He, Zifeng Ding, Bryan Hooi,
- Abstract summary: This study explores whether Knowledge Graphs can make Large Language Models (LLMs) more trustworthy in an open-ended setting.<n> OKGQA is a benchmark specifically designed to assess LLMs enhanced with Knowledge Graphs under open-ended, real-world question answering scenarios.<n> OKGQA-P is a benchmark variant to assess model performance when the semantics and structure of KGs are deliberately perturbed and contaminated.
- Score: 30.12049172634714
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent works integrating Knowledge Graphs (KGs) have led to promising improvements in enhancing the reasoning accuracy of Large Language Models (LLMs). However, current benchmarks focus mainly on closed-ended tasks, leaving a gap in the assessment of more complex real-world scenarios. This gap has also obscured the evaluation of KGs' potential to mitigate the problem of hallucination in LLMs. To fill the gap, we introduce OKGQA, a new benchmark specifically designed to assess LLMs enhanced with KGs under open-ended, real-world question answering scenarios. OKGQA is designed to closely reflect the complexities of practical applications using questions from different types, and incorporates specific metrics to measure both hallucination ratio and the enhancement in reasoning capabilities. To consider the scenario in which KGs may have varying levels of mistakes, we propose another benchmark variant OKGQA-P to assess model performance when the semantics and structure of KGs are deliberately perturbed and contaminated. OKGQA aims to (1) explore whether KGs can make LLMs more trustworthy in an open-ended setting, and (2) conduct a comparative analysis to shed light on method design. We believe that this study can facilitate a more complete performance comparison and encourage continuous improvement in integrating KGs with LLMs to reduce hallucination.
Related papers
- Verifying the Verifiers: Unveiling Pitfalls and Potentials in Fact Verifiers [59.168391398830515]
We evaluate 12 pre-trained LLMs and one specialized fact-verifier, using a collection of examples from 14 fact-checking benchmarks.<n>We highlight the importance of addressing annotation errors and ambiguity in datasets.<n> frontier LLMs with few-shot in-context examples, often overlooked in previous works, achieve top-tier performance.
arXiv Detail & Related papers (2025-06-16T10:32:10Z) - ClaimPKG: Enhancing Claim Verification via Pseudo-Subgraph Generation with Lightweight Specialized LLM [3.864321514889099]
ClaimPKG is an end-to-end framework that seamlessly integrates LLM reasoning with structured knowledge from knowledge graphs (KGs)<n>ClaimPKG achieves state-of-the-art performance, outperforming strong baselines in this research field by 9%-12% accuracy points across multiple categories.
arXiv Detail & Related papers (2025-05-28T16:34:14Z) - Knowledge Graph-extended Retrieval Augmented Generation for Question Answering [10.49712834719005]
This paper proposes a system that integrates Large Language Models (LLMs) and Knowledge Graphs (KGs) without requiring training.<n>The resulting approach can be classified as a specific form of a Retrieval Augmented Generation (RAG) with a KG.<n>It includes a question decomposition module to enhance multi-hop information retrieval and answerability.
arXiv Detail & Related papers (2025-04-11T18:03:02Z) - LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph [57.382255728234064]
Large Language Models (LLMs) have impressive capabilities in text understanding and zero-shot reasoning.<n> Knowledge Graphs (KGs) provide rich and reliable contextual information for the reasoning process of LLMs.<n>We propose a novel Lightweight and efficient Prompt learning-ReasOning Framework for KGQA (LightPROF)
arXiv Detail & Related papers (2025-04-04T03:03:47Z) - OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models [68.17018458283651]
This work focuses on the offline evaluation of the chain-of-thought capabilities of LLMs.
We use knowledge graphs (e.g., Wikidata5m) to provide feedback on the generated chain of thoughts.
We show how to optimize LLMs based on the proposed evaluation method.
arXiv Detail & Related papers (2024-10-31T07:48:44Z) - Simple Is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation [9.844598565914055]
Large Language Models (LLMs) demonstrate strong reasoning abilities but face limitations such as hallucinations and outdated knowledge.
We introduce SubgraphRAG, extending the Knowledge Graph (KG)-based Retrieval-Augmented Generation (RAG) framework that retrieves subgraphs.
Our approach innovatively integrates a lightweight multilayer perceptron with a parallel triple-scoring mechanism for efficient and flexible subgraph retrieval.
arXiv Detail & Related papers (2024-10-28T04:39:32Z) - Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models [83.28737898989694]
Large language models (LLMs) struggle with faithful reasoning due to knowledge gaps and hallucinations.
We introduce graph-constrained reasoning (GCR), a novel framework that bridges structured knowledge in KGs with unstructured reasoning in LLMs.
GCR achieves state-of-the-art performance and exhibits strong zero-shot generalizability to unseen KGs without additional training.
arXiv Detail & Related papers (2024-10-16T22:55:17Z) - Tree-of-Traversals: A Zero-Shot Reasoning Algorithm for Augmenting Black-box Language Models with Knowledge Graphs [72.89652710634051]
Knowledge graphs (KGs) complement Large Language Models (LLMs) by providing reliable, structured, domain-specific, and up-to-date external knowledge.
We introduce Tree-of-Traversals, a novel zero-shot reasoning algorithm that enables augmentation of black-box LLMs with one or more KGs.
arXiv Detail & Related papers (2024-07-31T06:01:24Z) - EffiQA: Efficient Question-Answering with Strategic Multi-Model Collaboration on Knowledge Graphs [11.323661062578799]
EffiQA consists of three stages: global planning, efficient KG exploration, and self-reflection.
Empirical evidence on multiple KBQA benchmarks shows EffiQA's effectiveness.
We hope the proposed new framework will pave the way for efficient, knowledge-intensive querying.
arXiv Detail & Related papers (2024-06-03T11:56:07Z) - Generate-on-Graph: Treat LLM as both Agent and KG in Incomplete Knowledge Graph Question Answering [87.67177556994525]
We propose a training-free method called Generate-on-Graph (GoG) to generate new factual triples while exploring Knowledge Graphs (KGs)
GoG performs reasoning through a Thinking-Searching-Generating framework, which treats LLM as both Agent and KG in IKGQA.
arXiv Detail & Related papers (2024-04-23T04:47:22Z) - Large Language Models Can Better Understand Knowledge Graphs Than We Thought [13.336418752729987]
knowledge graph (KG) embeddings with model parameters become increasingly costly.
Current prompting methods often rely on a trial-and-error approach.
We show that unordered linearized triples are more effective for LLMs' understanding of KGs compared to fluent NL text.
arXiv Detail & Related papers (2024-02-18T10:44:03Z) - HyKGE: A Hypothesis Knowledge Graph Enhanced Framework for Accurate and Reliable Medical LLMs Responses [20.635793525894872]
We develop a Hypothesis Knowledge Graph Enhanced (HyKGE) framework to improve the accuracy and reliability of Large Language Models (LLMs)
Specifically, HyKGE explores the zero-shot capability and the rich knowledge of LLMs with Hypothesis Outputs to extend feasible exploration directions in the KGs.
Experiments on two Chinese medical multiple-choice question datasets and one Chinese open-domain medical Q&A dataset with two LLM turbos demonstrate the superiority of HyKGE in terms of accuracy and explainability.
arXiv Detail & Related papers (2023-12-26T04:49:56Z) - Mitigating Large Language Model Hallucinations via Autonomous Knowledge
Graph-based Retrofitting [51.7049140329611]
This paper proposes Knowledge Graph-based Retrofitting (KGR) to mitigate factual hallucination during the reasoning process.
Experiments show that KGR can significantly improve the performance of LLMs on factual QA benchmarks.
arXiv Detail & Related papers (2023-11-22T11:08:38Z) - Reasoning on Graphs: Faithful and Interpretable Large Language Model
Reasoning [104.92384929827776]
Large language models (LLMs) have demonstrated impressive reasoning abilities in complex tasks.
They lack up-to-date knowledge and experience hallucinations during reasoning.
Knowledge graphs (KGs) offer a reliable source of knowledge for reasoning.
arXiv Detail & Related papers (2023-10-02T10:14:43Z) - Retrieve-Rewrite-Answer: A KG-to-Text Enhanced LLMs Framework for
Knowledge Graph Question Answering [16.434098552925427]
We study the KG-augmented language model approach for solving the knowledge graph question answering (KGQA) task.
We propose an answer-sensitive KG-to-Text approach that can transform KG knowledge into well-textualized statements.
arXiv Detail & Related papers (2023-09-20T10:42:08Z) - Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph [29.447300472617826]
Think-on-Graph (ToG) is a new approach for external knowledge graphs (KG) in large language models (LLMs)
ToG iteratively executes beam search on KG, discovers the most promising reasoning paths, and returns the most likely reasoning results.
ToG achieves overall SOTA in 6 out of 9 datasets where most previous SOTAs rely on additional training.
arXiv Detail & Related papers (2023-07-15T03:31:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.