The Perils & Promises of Fact-checking with Large Language Models
- URL: http://arxiv.org/abs/2310.13549v2
- Date: Wed, 7 Feb 2024 12:01:49 GMT
- Title: The Perils & Promises of Fact-checking with Large Language Models
- Authors: Dorian Quelle, Alexandre Bovet
- Abstract summary: Large Language Models (LLMs) are increasingly trusted to write academic papers, lawsuits, and news articles.
We evaluate the use of LLM agents in fact-checking by having them phrase queries, retrieve contextual data, and make decisions.
Our results show the enhanced prowess of LLMs when equipped with contextual information.
While LLMs show promise in fact-checking, caution is essential due to inconsistent accuracy.
- Score: 55.869584426820715
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated fact-checking, using machine learning to verify claims, has grown
vital as misinformation spreads beyond human fact-checking capacity. Large
Language Models (LLMs) like GPT-4 are increasingly trusted to write academic
papers, lawsuits, and news articles and to verify information, emphasizing
their role in discerning truth from falsehood and the importance of being able
to verify their outputs. Understanding the capacities and limitations of LLMs
in fact-checking tasks is therefore essential for ensuring the health of our
information ecosystem. Here, we evaluate the use of LLM agents in fact-checking
by having them phrase queries, retrieve contextual data, and make decisions.
Importantly, in our framework, agents explain their reasoning and cite the
relevant sources from the retrieved context. Our results show the enhanced
prowess of LLMs when equipped with contextual information. GPT-4 outperforms
GPT-3, but accuracy varies based on query language and claim veracity. While
LLMs show promise in fact-checking, caution is essential due to inconsistent
accuracy. Our investigation calls for further research, fostering a deeper
comprehension of when agents succeed and when they fail.
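The agent workflow described in the abstract (phrasing a query, retrieving contextual data, and issuing a verdict that explains its reasoning and cites sources) can be sketched as follows. This is a minimal illustration only: the `llm` and `search` callables, the prompts, and the JSON verdict schema are assumptions, not the authors' released code.

```python
# Minimal sketch of an agentic fact-checking loop: the agent phrases a search
# query, retrieves context, and returns a verdict with reasoning and cited
# sources. `llm` and `search` are hypothetical callables standing in for a
# chat model (e.g. GPT-4) and a retrieval backend.
import json
from typing import Callable, Dict, List


def fact_check(claim: str,
               llm: Callable[[str], str],
               search: Callable[[str], List[Dict[str, str]]]) -> Dict:
    # 1. Agent phrases a search query for the claim.
    query = llm(f"Write a short search query to verify this claim:\n{claim}")

    # 2. Retrieve contextual evidence; each item is assumed to look like
    #    {"url": ..., "text": ...}.
    evidence = search(query)
    context = "\n\n".join(f"[{i}] {doc['url']}\n{doc['text']}"
                          for i, doc in enumerate(evidence))

    # 3. Agent decides, explaining its reasoning and citing sources by index.
    prompt = (
        "Using only the numbered sources below, label the claim as 'true', "
        "'false', or 'not enough information'. Answer in JSON with keys "
        "'verdict', 'reasoning', and 'cited_sources' (list of source indices).\n\n"
        f"Claim: {claim}\n\nSources:\n{context}"
    )
    return json.loads(llm(prompt))
```

Returning the cited source indices alongside the verdict is what makes the agent's decision auditable, which is the point the abstract emphasizes.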
Related papers
- To Know or Not To Know? Analyzing Self-Consistency of Large Language Models under Ambiguity [27.10502683001428]
This paper focuses on entity type ambiguity, analyzing the proficiency and consistency of state-of-the-art LLMs in applying factual knowledge when prompted with ambiguous entities.
Experiments reveal that LLMs struggle with choosing the correct entity reading, achieving an average accuracy of only 85%, and as low as 75% with underspecified prompts.
arXiv Detail & Related papers (2024-07-24T09:48:48Z)
- Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation [128.01050030936028]
We propose an information refinement training method named InFO-RAG.
InFO-RAG is low-cost and general across various tasks.
It improves the performance of LLaMA2 by an average of 9.39% relative points.
arXiv Detail & Related papers (2024-02-28T08:24:38Z)
- UFO: a Unified and Flexible Framework for Evaluating Factuality of Large Language Models [73.73303148524398]
Large language models (LLMs) may generate text that lacks consistency with human knowledge, leading to factual inaccuracies or hallucination.
We propose UFO, an LLM-based unified and flexible evaluation framework to verify facts against plug-and-play fact sources.
arXiv Detail & Related papers (2024-02-22T16:45:32Z)
- Can LLMs Produce Faithful Explanations For Fact-checking? Towards Faithful Explainable Fact-Checking via Multi-Agent Debate [75.10515686215177]
Large Language Models (LLMs) excel in text generation, but their capability for producing faithful explanations in fact-checking remains underexamined.
We propose the Multi-Agent Debate Refinement (MADR) framework, leveraging multiple LLMs as agents with diverse roles.
MADR ensures that the final explanation undergoes rigorous validation, significantly reducing the likelihood of unfaithful elements and aligning closely with the provided evidence.
arXiv Detail & Related papers (2024-02-12T04:32:33Z)
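The debate-style refinement described in the MADR entry above can be illustrated with a minimal sketch. The agent roles, function signatures, and stopping rule below are assumptions for illustration, not the authors' MADR implementation.

```python
# Minimal sketch of a multi-agent debate refinement loop in the spirit of MADR:
# a drafting agent writes an explanation, critic agents flag unfaithful parts,
# and a reviser addresses the objections until no critic objects. `draft`,
# `critics`, and `revise` are hypothetical LLM-backed callables.
from typing import Callable, List


def debate_refine(claim: str, evidence: str,
                  draft: Callable[[str, str], str],
                  critics: List[Callable[[str, str, str], str]],
                  revise: Callable[[str, str, str, List[str]], str],
                  max_rounds: int = 3) -> str:
    # A drafting agent proposes an explanation of the verdict on `claim`.
    explanation = draft(claim, evidence)
    for _ in range(max_rounds):
        # Each critic agent returns "" if the explanation is faithful to the
        # evidence, otherwise a description of the unsupported element.
        objections = [c(claim, evidence, explanation) for c in critics]
        objections = [o for o in objections if o]
        if not objections:
            break  # no agent objects: accept the explanation
        # A reviser agent rewrites the explanation to address every objection.
        explanation = revise(claim, evidence, explanation, objections)
    return explanation
```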
- Language Models Hallucinate, but May Excel at Fact Verification [89.0833981569957]
Large language models (LLMs) frequently "hallucinate," resulting in non-factual outputs.
Even GPT-3.5 produces factual outputs less than 25% of the time.
This underscores the importance of fact verifiers in order to measure and incentivize progress.
arXiv Detail & Related papers (2023-10-23T04:39:01Z)
- Large Language Models Help Humans Verify Truthfulness -- Except When They Are Convincingly Wrong [35.64962031447787]
Large Language Models (LLMs) are increasingly used for accessing information on the web.
Our experiments with 80 crowdworkers compare language models with search engines (information retrieval systems) at facilitating fact-checking.
Users reading LLM explanations are significantly more efficient than those using search engines while achieving similar accuracy.
arXiv Detail & Related papers (2023-10-19T08:09:58Z)
- FELM: Benchmarking Factuality Evaluation of Large Language Models [40.78878196872095]
We introduce a benchmark for Factuality Evaluation of large Language Models, referred to as felm.
We collect responses generated from large language models and annotate factuality labels in a fine-grained manner.
Our findings reveal that while retrieval aids factuality evaluation, current LLMs still fall short of faithfully detecting factual errors.
arXiv Detail & Related papers (2023-10-01T17:37:31Z)
- FactLLaMA: Optimizing Instruction-Following Language Models with External Knowledge for Automated Fact-Checking [10.046323978189847]
We propose combining the power of instruction-following language models with external evidence retrieval to enhance fact-checking performance.
Our approach involves leveraging search engines to retrieve relevant evidence for a given input claim.
Then, we instruct-tune an open-sourced language model, called LLaMA, using this evidence, enabling it to predict the veracity of the input claim more accurately.
arXiv Detail & Related papers (2023-09-01T04:14:39Z)
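The FactLLaMA-style setup summarized above pairs each claim with search-engine evidence before instruction tuning. The sketch below shows one hypothetical way to format such training examples; the prompt template, label names, and `search` callable are assumptions, not the released FactLLaMA data format.

```python
# Minimal sketch of building an instruction-tuning example that pairs a claim
# with retrieved evidence, so an open LLM (e.g. LLaMA) can be tuned to predict
# claim veracity. The field names and label set are illustrative assumptions.
from typing import Callable, Dict, List


def build_example(claim: str, label: str,
                  search: Callable[[str], List[str]],
                  k: int = 5) -> Dict[str, str]:
    # Top-k evidence snippets retrieved from a search engine for the claim.
    evidence = "\n".join(search(claim)[:k])
    return {
        "instruction": "Decide whether the claim is SUPPORTED, REFUTED, or "
                       "NOT ENOUGH INFO, given the evidence.",
        "input": f"Claim: {claim}\nEvidence:\n{evidence}",
        "output": label,  # gold veracity label used as the tuning target
    }
```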
- Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models [75.75038268227554]
Self-Checker is a framework comprising a set of plug-and-play modules that facilitate fact-checking.
This framework provides a fast and efficient way to construct fact-checking systems in low-resource environments.
arXiv Detail & Related papers (2023-05-24T01:46:07Z)