KG-FPQ: Evaluating Factuality Hallucination in LLMs with Knowledge Graph-based False Premise Questions
- URL: http://arxiv.org/abs/2407.05868v1
- Date: Mon, 8 Jul 2024 12:31:03 GMT
- Title: KG-FPQ: Evaluating Factuality Hallucination in LLMs with Knowledge Graph-based False Premise Questions
- Authors: Yanxu Zhu, Jinlin Xiao, Yuhang Wang, Jitao Sang,
- Abstract summary: Large language models (LLMs) are susceptible to being misled by false premise questions (FPQs)
We introduce an automated, scalable pipeline to create FPQs based on knowledge graphs (KGs)
We present a benchmark, the Knowledge Graph-based False Premise Questions (KG-FPQ), which contains approximately 178k FPQs across three knowledge domains, at six levels of confusability, and in two task formats.
- Score: 19.246385485678104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have demonstrated that large language models (LLMs) are susceptible to being misled by false premise questions (FPQs), leading to errors in factual knowledge, know as factuality hallucination. Existing benchmarks that assess this vulnerability primarily rely on manual construction, resulting in limited scale and lack of scalability. In this work, we introduce an automated, scalable pipeline to create FPQs based on knowledge graphs (KGs). The first step is modifying true triplets extracted from KGs to create false premises. Subsequently, utilizing the state-of-the-art capabilities of GPTs, we generate semantically rich FPQs. Based on the proposed method, we present a comprehensive benchmark, the Knowledge Graph-based False Premise Questions (KG-FPQ), which contains approximately 178k FPQs across three knowledge domains, at six levels of confusability, and in two task formats. Using KG-FPQ, we conduct extensive evaluations on several representative LLMs and provide valuable insights. The KG-FPQ dataset and code are available at~https://github.com/yanxuzhu/KG-FPQ.
Related papers
- LinkQ: An LLM-Assisted Visual Interface for Knowledge Graph Question-Answering [1.5238808518078564]
LinkQ is a system that leverages a large language model (LLM) to facilitate knowledge graph (KG) query construction through natural language question-answering.
Our results indicate that practitioners find LinkQ effective for KG question-answering.
arXiv Detail & Related papers (2024-06-07T15:28:31Z) - Generate-on-Graph: Treat LLM as both Agent and KG in Incomplete Knowledge Graph Question Answering [90.30473970040362]
We propose a training-free method called Generate-on-Graph (GoG) that can generate new factual triples while exploring on Knowledge Graphs (KGs)
Specifically, we propose a selecting-generating-answering framework, which not only treat the LLM as an agent to explore on KGs, but also treat it as a KG to generate new facts based on the explored subgraph.
arXiv Detail & Related papers (2024-04-23T04:47:22Z) - CR-LT-KGQA: A Knowledge Graph Question Answering Dataset Requiring
Commonsense Reasoning and Long-Tail Knowledge [21.73770363188049]
We create a novel Commonsense Reasoning (CR) and Long-Tail (LT) KGQA dataset with two subtasks -- question answering and claim verification.
While existing KGQA methods are not applicable due to their lack of commonsense inference support, baseline evaluation of LLMs on CR-LT KGQA demonstrate a high rate of hallucination.
arXiv Detail & Related papers (2024-03-03T04:47:01Z) - ReasoningLM: Enabling Structural Subgraph Reasoning in Pre-trained
Language Models for Question Answering over Knowledge Graph [142.42275983201978]
We propose a subgraph-aware self-attention mechanism to imitate the GNN for performing structured reasoning.
We also adopt an adaptation tuning strategy to adapt the model parameters with 20,000 subgraphs with synthesized questions.
Experiments show that ReasoningLM surpasses state-of-the-art models by a large margin, even with fewer updated parameters and less training data.
arXiv Detail & Related papers (2023-12-30T07:18:54Z) - Mitigating Large Language Model Hallucinations via Autonomous Knowledge
Graph-based Retrofitting [51.7049140329611]
This paper proposes Knowledge Graph-based Retrofitting (KGR) to mitigate factual hallucination during the reasoning process.
Experiments show that KGR can significantly improve the performance of LLMs on factual QA benchmarks.
arXiv Detail & Related papers (2023-11-22T11:08:38Z) - Systematic Assessment of Factual Knowledge in Large Language Models [48.75961313441549]
This paper proposes a framework to assess the factual knowledge of large language models (LLMs) by leveraging knowledge graphs (KGs)
Our framework automatically generates a set of questions and expected answers from the facts stored in a given KG, and then evaluates the accuracy of LLMs in answering these questions.
arXiv Detail & Related papers (2023-10-18T00:20:50Z) - Reasoning on Graphs: Faithful and Interpretable Large Language Model
Reasoning [104.92384929827776]
Large language models (LLMs) have demonstrated impressive reasoning abilities in complex tasks.
They lack up-to-date knowledge and experience hallucinations during reasoning.
Knowledge graphs (KGs) offer a reliable source of knowledge for reasoning.
arXiv Detail & Related papers (2023-10-02T10:14:43Z) - Won't Get Fooled Again: Answering Questions with False Premises [79.8761549830075]
Pre-trained language models (PLMs) have shown unprecedented potential in various fields.
PLMs tend to be easily deceived by tricky questions such as "How many eyes does the sun have?"
We find that the PLMs already possess the knowledge required to rebut such questions.
arXiv Detail & Related papers (2023-07-05T16:09:21Z) - An Empirical Study of Pre-trained Language Models in Simple Knowledge
Graph Question Answering [28.31377197194905]
Large-scale pre-trained language models (PLMs) have recently achieved great success and become a milestone in natural language processing (NLP)
In recent works on knowledge graph question answering (KGQA), BERT or its variants have become necessary in their KGQA models.
We compare the performance of different PLMs in KGQA and present three benchmarks for larger-scale KGs.
arXiv Detail & Related papers (2023-03-18T08:57:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.