Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation
- URL: http://arxiv.org/abs/2307.11019v2
- Date: Sun, 23 Jul 2023 16:52:59 GMT
- Title: Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation
- Authors: Ruiyang Ren, Yuhao Wang, Yingqi Qu, Wayne Xin Zhao, Jing Liu, Hao Tian, Hua Wu, Ji-Rong Wen, Haifeng Wang
- Abstract summary: We show that large language models (LLMs) possess unwavering confidence in their capabilities to respond to questions.
Retrieval augmentation proves to be an effective approach in enhancing LLMs' awareness of knowledge boundaries.
We also find that LLMs have a propensity to rely on the provided retrieval results when formulating answers.
- Score: 91.30946119104111
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Knowledge-intensive tasks (e.g., open-domain question answering (QA)) require
a substantial amount of factual knowledge and often rely on external
information for assistance. Recently, large language models (LLMs) (e.g.,
ChatGPT) have demonstrated impressive prowess in solving a wide range of tasks
with world knowledge, including knowledge-intensive tasks. However, it remains
unclear how well LLMs are able to perceive their factual knowledge boundaries,
particularly how they behave when incorporating retrieval augmentation. In this
study, we present an initial analysis of the factual knowledge boundaries of
LLMs and how retrieval augmentation affects LLMs on open-domain QA. Specifically,
we focus on three primary research questions and analyze them by examining the
QA performance, priori judgement, and posteriori judgement of LLMs. We show
evidence that LLMs possess unwavering confidence in their capabilities to
respond to questions and the accuracy of their responses. Furthermore,
retrieval augmentation proves to be an effective approach in enhancing LLMs'
awareness of knowledge boundaries, thereby improving their judgemental
abilities. Additionally, we find that LLMs have a propensity to rely on
the provided retrieval results when formulating answers, while the quality of
these results significantly impacts their reliance. The code to reproduce this
work is available at https://github.com/RUCAIBox/LLM-Knowledge-Boundary.
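The abstract evaluates LLMs along three axes: QA performance, priori judgement (read here as the model saying, before answering, whether it can answer) and posteriori judgement (read here as the model saying, after answering, whether its answer is correct). The following is a minimal sketch of how such an evaluation could be scored, assuming SQuAD-style exact match (EM) as the correctness signal. The names `Record` and `summarize`, and the alignment and overconfidence rates, are hypothetical illustrations, not the paper's exact metric definitions or its released code.

```python
import re
import string
from dataclasses import dataclass

_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


@dataclass
class Record:
    # Hypothetical per-question record; the field names are illustrative only.
    prediction: str
    gold_answers: list[str]
    claims_can_answer: bool  # priori: before answering, the model says it can answer
    claims_correct: bool     # posteriori: after answering, the model says it is right


def summarize(records: list[Record]) -> dict[str, float]:
    """Score QA accuracy and how well the self-judgements track actual correctness."""
    n = len(records)
    correct = [exact_match(r.prediction, r.gold_answers) for r in records]
    pairs = list(zip(records, correct))
    return {
        "em": sum(correct) / n,
        # Share of questions where the priori judgement agrees with EM correctness.
        "priori_alignment": sum(r.claims_can_answer == c for r, c in pairs) / n,
        # Share of questions where the posteriori judgement agrees with EM correctness.
        "posteriori_alignment": sum(r.claims_correct == c for r, c in pairs) / n,
        # Answers the model calls correct that EM marks wrong (overconfidence).
        "overconfident_rate": sum(r.claims_correct and not c for r, c in pairs) / n,
    }
```

Under this sketch, a record whose prediction fails EM while `claims_correct` is True counts toward `overconfident_rate`, which is one simple way to quantify the "unwavering confidence" the abstract reports. Comparing these rates with and without retrieved passages in the prompt would be one way to probe whether retrieval augmentation sharpens the model's sense of its knowledge boundary.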
Related papers
- Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts [50.06633829833144]
Large Language Models (LLMs) are effective in performing various NLP tasks, but struggle to handle tasks that require extensive, real-world knowledge.
We propose a benchmark that requires knowledge of long-tail facts for answering the involved questions.
Our experiments show that LLMs alone struggle with answering these questions, especially when the long-tail level is high or rich knowledge is required.
(2024-05-10T15:10:20Z)
- Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs [60.40396361115776]
This paper introduces a novel collaborative approach, namely SlimPLM, that detects missing knowledge in large language models (LLMs) with a slim proxy model.
We employ a proxy model with far fewer parameters and take its answers as heuristic answers.
Heuristic answers are then utilized to predict the knowledge required to answer the user question, as well as the known and unknown knowledge within the LLM.
(2024-02-19T11:11:08Z)
- When Do LLMs Need Retrieval Augmentation? Mitigating LLMs' Overconfidence Helps Retrieval Augmentation [66.01754585188739]
Large Language Models (LLMs) have been found to have difficulty knowing they do not possess certain knowledge.
Retrieval Augmentation (RA) has been extensively studied to mitigate LLMs' hallucinations.
We propose several methods to enhance LLMs' perception of knowledge boundaries and show that they are effective in reducing overconfidence.
(2024-02-18T04:57:19Z)
- Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration [39.603649838876294]
We study approaches to identify LLM knowledge gaps and abstain from answering questions when knowledge gaps are present.
Motivated by the failures of existing approaches in self-reflection and their over-reliance on held-out sets, we propose two novel approaches.
(2024-02-01T06:11:49Z)
- KnowledgeNavigator: Leveraging Large Language Models for Enhanced Reasoning over Knowledge Graph [11.808990571175269]
Large language models (LLMs) have achieved outstanding performance on various downstream tasks thanks to their powerful natural language understanding and zero-shot capabilities, but they still suffer from knowledge limitations.
We propose a novel framework KnowledgeNavigator to address these challenges by efficiently and accurately retrieving external knowledge from knowledge graph.
We evaluate KnowledgeNavigator on multiple public KGQA benchmarks, and the experiments show that the framework is effective and generalizes well.
(2023-12-26T04:22:56Z)
- RECALL: A Benchmark for LLMs Robustness against External Counterfactual Knowledge [69.79676144482792]
This study aims to evaluate the ability of LLMs to distinguish reliable information from external knowledge.
Our benchmark consists of two tasks, Question Answering and Text Generation, and for each task, we provide models with a context containing counterfactual information.
(2023-11-14T13:24:19Z)
- Self-Knowledge Guided Retrieval Augmentation for Large Language Models [59.771098292611846]
Large language models (LLMs) have shown superior performance without task-specific fine-tuning.
Retrieval-based methods can offer non-parametric world knowledge and improve the performance on tasks such as question answering.
Self-Knowledge guided Retrieval augmentation (SKR) is a simple yet effective method which can let LLMs refer to the questions they have previously encountered.
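SKR is summarized as letting an LLM refer to questions it has previously encountered when deciding whether to retrieve. One way to read that is a nearest-neighbour vote: embed the new question, find the k most similar past questions, and retrieve only if most of them needed retrieval. The sketch below assumes exactly that reading; the hand-made 2-d vectors stand in for real sentence embeddings, and `should_retrieve`, the `memory` layout, and its True/False labels are hypothetical, not SKR's released code.

```python
import math
from collections import Counter


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def should_retrieve(
    query_vec: list[float],
    memory: list[tuple[list[float], bool]],
    k: int = 3,
) -> bool:
    """Majority vote over the k most similar past questions.

    Each memory item is (embedding, needed_retrieval). A tie counts as
    "do not retrieve", so an odd k avoids ambiguous votes.
    """
    ranked = sorted(memory, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes[True] > votes[False]


# Toy memory: questions near the x-axis were answerable without retrieval,
# questions near the y-axis needed it (hypothetical labels).
memory = [
    ([1.0, 0.0], False),
    ([0.9, 0.1], False),
    ([0.0, 1.0], True),
    ([0.1, 0.9], True),
    ([0.2, 0.8], True),
]
```

With this toy memory, a query close to the y-axis such as `[0.05, 1.0]` has three retrieval-needed neighbours and triggers retrieval, while `[1.0, 0.05]` is outvoted 2 to 1 and does not. Skipping retrieval when the model likely already knows the answer is the design point: it saves retrieval calls and avoids the over-reliance on retrieved text that the main paper observes.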
(2023-10-08T04:22:33Z)
- When Giant Language Brains Just Aren't Enough! Domain Pizzazz with Knowledge Sparkle Dust [15.484175299150904]
This paper presents an empirical analysis aimed at bridging the gap in adapting large language models to practical use cases.
We select insurance question answering (QA) as a case study because of the reasoning it demands.
Based on this task, we design a new model built on LLMs and empowered by additional knowledge extracted from insurance policy rulebooks and DBPedia.
(2023-05-12T03:49:59Z)