Systematic Assessment of Factual Knowledge in Large Language Models
- URL: http://arxiv.org/abs/2310.11638v3
- Date: Mon, 30 Oct 2023 12:05:31 GMT
- Title: Systematic Assessment of Factual Knowledge in Large Language Models
- Authors: Linhao Luo, Thuy-Trang Vu, Dinh Phung, Gholamreza Haffari
- Abstract summary: This paper proposes a framework to assess the factual knowledge of large language models (LLMs) by leveraging knowledge graphs (KGs).
Our framework automatically generates a set of questions and expected answers from the facts stored in a given KG, and then evaluates the accuracy of LLMs in answering these questions.
- Score: 48.75961313441549
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous studies have relied on existing question-answering benchmarks to
evaluate the knowledge stored in large language models (LLMs). However, this
approach has limitations regarding factual knowledge coverage, as it mostly
focuses on generic domains which may overlap with the pretraining data. This
paper proposes a framework to systematically assess the factual knowledge of
LLMs by leveraging knowledge graphs (KGs). Our framework automatically
generates a set of questions and expected answers from the facts stored in a
given KG, and then evaluates the accuracy of LLMs in answering these questions.
We systematically evaluate state-of-the-art LLMs with KGs in generic and specific domains. The experiments show that ChatGPT is consistently the top
performer across all domains. We also find that LLM performance depends on
instruction finetuning, domain, and question complexity, and that the models
are prone to adversarial context.
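To make the framework concrete, here is a minimal sketch of the KG-to-QA evaluation loop the abstract describes. It is illustrative only, not the authors' implementation: the relation templates, the `Triple` layout, and the `ask_llm` callable are all assumptions.

```python
# Sketch of KG-based factual-knowledge probing (illustrative only).
# Each KG fact (subject, relation, object) becomes a question whose
# expected answer is the object; LLM accuracy is the score.

from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

# Hypothetical hand-written question templates, one per relation.
TEMPLATES = {
    "capital_of": "What is the capital of {subject}?",
    "author_of": "Who wrote {subject}?",
}

def make_qa(triples: List[Triple]) -> List[Tuple[str, str]]:
    """Generate (question, expected_answer) pairs from KG facts."""
    qa = []
    for subj, rel, obj in triples:
        template = TEMPLATES.get(rel)
        if template:  # skip relations without a template
            qa.append((template.format(subject=subj), obj))
    return qa

def evaluate(ask_llm: Callable[[str], str], triples: List[Triple]) -> float:
    """Fraction of KG-derived questions the model answers correctly."""
    qa = make_qa(triples)
    correct = sum(
        expected.lower() in ask_llm(question).lower()
        for question, expected in qa
    )
    return correct / len(qa) if qa else 0.0

# Example with a stubbed model that always answers "Paris":
facts = [("France", "capital_of", "Paris"),
         ("Hamlet", "author_of", "William Shakespeare")]
print(evaluate(lambda q: "Paris", facts))  # -> 0.5
```

In practice a string-containment check like this is crude; the point is only the pipeline shape: triples in, templated questions out, answers scored against the KG objects.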
Related papers
- Domain-Specific Retrieval-Augmented Generation Using Vector Stores, Knowledge Graphs, and Tensor Factorization [7.522493227357079]
Large Language Models (LLMs) are pre-trained on large-scale corpora.
LLMs suffer from hallucinations, knowledge cut-offs, and lack of knowledge attributions.
We introduce SMART-SLIC, a highly domain-specific LLM framework.
arXiv Detail & Related papers (2024-10-03T17:40:55Z)
- DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation [19.907074685082]
Retrieval-Augmented Generation offers a promising solution to address various limitations of Large Language Models.
Current studies often rely on general knowledge sources like Wikipedia to assess the models' abilities in solving common-sense problems.
We identify six required abilities for RAG models, including the ability to handle conversational RAG.
arXiv Detail & Related papers (2024-06-09T05:33:51Z)
- Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts [50.06633829833144]
Large Language Models (LLMs) are effective in performing various NLP tasks, but struggle to handle tasks that require extensive, real-world knowledge.
We propose a benchmark whose questions require knowledge of long-tail facts to answer.
Our experiments show that LLMs alone struggle with answering these questions, especially when the long-tail level is high or rich knowledge is required.
arXiv Detail & Related papers (2024-05-10T15:10:20Z)
- Generate-on-Graph: Treat LLM as both Agent and KG in Incomplete Knowledge Graph Question Answering [87.67177556994525]
We propose a training-free method called Generate-on-Graph (GoG) to generate new factual triples while exploring Knowledge Graphs (KGs).
GoG performs reasoning through a Thinking-Searching-Generating framework, which treats the LLM as both an agent and a KG in Incomplete KGQA (IKGQA).
arXiv Detail & Related papers (2024-04-23T04:47:22Z)
- Context Matters: Pushing the Boundaries of Open-Ended Answer Generation with Graph-Structured Knowledge Context [4.1229332722825]
This paper introduces a novel framework that combines graph-driven context retrieval with knowledge-graph-based enhancement.
We conduct experiments on various Large Language Models (LLMs) with different parameter sizes to evaluate their ability to ground knowledge and determine factual accuracy in answers to open-ended questions.
Our methodology, GraphContextGen, consistently outperforms dominant text-based retrieval systems, demonstrating its robustness and adaptability across a wide range of use cases.
arXiv Detail & Related papers (2024-01-23T11:25:34Z)
- Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way (a toy sketch of these steps follows this entry).
arXiv Detail & Related papers (2023-11-16T07:09:38Z)
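The three-step DOKE paradigm maps naturally onto a small pipeline. The sketch below is a hypothetical rendering, not the paper's code: the knowledge store, the keyword-based selection, and the verbalization format are all assumed for illustration.

```python
# Hypothetical sketch of a DOKE-style knowledge extractor:
# 1) prepare task knowledge, 2) select knowledge per sample,
# 3) express it in a form the LLM can consume in its prompt.

from typing import Dict, List

def prepare_knowledge() -> Dict[str, List[str]]:
    """Step 1: collect task-relevant facts, keyed by item (toy data)."""
    return {
        "hiking boots": ["waterproof", "often bought with wool socks"],
        "trail map": ["useful for day hikes"],
    }

def select_for_sample(store: Dict[str, List[str]], query: str) -> List[str]:
    """Step 2: keep only the facts relevant to this sample (keyword match here)."""
    return [f for item, facts in store.items() if item in query for f in facts]

def express(facts: List[str]) -> str:
    """Step 3: verbalize the selected facts for the LLM prompt."""
    return "Relevant knowledge:\n" + "\n".join(f"- {f}" for f in facts)

store = prepare_knowledge()
query = "Recommend accessories for hiking boots."
print(express(select_for_sample(store, query)) + f"\n\nTask: {query}")
```

A real extractor would replace the keyword match with retrieval or a learned selector, but the prepare/select/express split is the part the paradigm names.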
- Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation [109.8527403904657]
We show that large language models (LLMs) possess unwavering confidence in their knowledge and cannot handle the conflict between internal and external knowledge well.
Retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries.
We propose a simple method to dynamically utilize supporting documents with our judgement strategy.
arXiv Detail & Related papers (2023-07-20T16:46:10Z)
- KoLA: Carefully Benchmarking World Knowledge of Large Language Models [87.96683299084788]
We construct a Knowledge-oriented LLM Assessment benchmark (KoLA).
We mimic human cognition to form a four-level taxonomy of knowledge-related abilities, covering 19 tasks.
We use both Wikipedia, a corpus on which LLMs are commonly pre-trained, and continuously collected emerging corpora to evaluate the capacity to handle unseen data and evolving knowledge.
arXiv Detail & Related papers (2023-06-15T17:20:46Z)
- Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering [7.888547093390469]
Large Language Models (LLMs) are capable of performing zero-shot closed-book question answering tasks.
We propose to augment the knowledge directly in the input of LLMs.
Our framework, Knowledge-Augmented language model PromptING (KAPING), requires no model training and is thus completely zero-shot (a rough sketch of the prompting step follows this entry).
arXiv Detail & Related papers (2023-06-07T04:15:21Z)
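As a rough illustration of this knowledge-augmented prompting idea (a sketch under assumptions, not the KAPING implementation), retrieved KG triples can simply be verbalized and prepended to the question:

```python
# Illustrative knowledge-augmented prompting: verbalize KG triples
# relevant to a question and prepend them to the prompt, so an LLM can
# answer zero-shot with no additional training.

from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

def verbalize(triples: List[Triple]) -> str:
    """Render each triple as a plain-text fact line (assumed format)."""
    return "\n".join(f"({s}, {r}, {o})" for s, r, o in triples)

def build_prompt(question: str, retrieved: List[Triple]) -> str:
    """Prepend retrieved facts to the question as in-context knowledge."""
    return (
        "Below are facts that might be relevant to the question.\n"
        f"{verbalize(retrieved)}\n"
        f"Question: {question}\nAnswer:"
    )

facts = [("Barack Obama", "spouse", "Michelle Obama")]
print(build_prompt("Who is Barack Obama married to?", facts))
```

The retrieval step (which triples to fetch for a given question) is the part such methods vary on; the prompt-assembly shown here is the training-free core.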