Are Large Language Models Good Fact Checkers: A Preliminary Study
- URL: http://arxiv.org/abs/2311.17355v1
- Date: Wed, 29 Nov 2023 05:04:52 GMT
- Title: Are Large Language Models Good Fact Checkers: A Preliminary Study
- Authors: Han Cao, Lingwei Wei, Mengyang Chen, Wei Zhou, Songlin Hu
- Abstract summary: Large Language Models (LLMs) have drawn significant attention due to their outstanding reasoning capabilities and extensive knowledge repository.
This study aims to comprehensively evaluate various LLMs in tackling specific fact-checking subtasks.
- Score: 26.023148371263012
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Large Language Models (LLMs) have drawn significant attention due
to their outstanding reasoning capabilities and extensive knowledge repository,
positioning them as superior in handling various natural language processing
tasks compared to other language models. In this paper, we present a
preliminary investigation into the potential of LLMs in fact-checking. This
study aims to comprehensively evaluate various LLMs in tackling specific
fact-checking subtasks, systematically evaluating their capabilities, and
conducting a comparative analysis of their performance against pre-trained and
state-of-the-art low-parameter models. Experiments demonstrate that LLMs
achieve competitive performance compared to other small models in most
scenarios. However, they encounter challenges in effectively handling Chinese
fact verification and the entirety of the fact-checking pipeline due to
language inconsistencies and hallucinations. These findings underscore the need
for further exploration and research to enhance the proficiency of LLMs as
reliable fact-checkers, unveiling the potential capability of LLMs and the
possible challenges in fact-checking tasks.
Related papers
- Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning [15.919493497867567]
This study aims to evaluate the performance of Multimodal Large Language Models (MLLMs) on the VALSE benchmark.
We conducted a comprehensive assessment of state-of-the-art MLLMs, varying in model size and pretraining datasets.
arXiv Detail & Related papers (2024-07-17T11:26:47Z) - Analyzing and Adapting Large Language Models for Few-Shot Multilingual
NLU: Are We There Yet? [82.02076369811402]
Supervised fine-tuning (SFT), supervised instruction tuning (SIT) and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning.
We present an extensive and systematic comparison of the three approaches, testing them on 6 high- and low-resource languages, three different NLU tasks, and a myriad of language and domain setups.
Our observations show that supervised instruction tuning has the best trade-off between performance and resource requirements.
arXiv Detail & Related papers (2024-03-04T10:48:13Z) - Linguistic Intelligence in Large Language Models for Telecommunications [5.06945923921948]
Large Language Models (LLMs) have emerged as a significant advancement in the field of Natural Language Processing (NLP)
This study seeks to evaluate the knowledge and understanding capabilities of LLMs within the telecommunications domain.
Our evaluation reveals that zero-shot LLMs can achieve performance levels comparable to the current state-of-the-art fine-tuned models.
arXiv Detail & Related papers (2024-02-24T14:01:07Z) - Supervised Knowledge Makes Large Language Models Better In-context Learners [94.89301696512776]
Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.
The challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored.
We propose a framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks.
arXiv Detail & Related papers (2023-12-26T07:24:46Z) - Survey on Factuality in Large Language Models: Knowledge, Retrieval and
Domain-Specificity [61.54815512469125]
This survey addresses the crucial issue of factuality in Large Language Models (LLMs)
As LLMs find applications across diverse domains, the reliability and accuracy of their outputs become vital.
arXiv Detail & Related papers (2023-10-11T14:18:03Z) - Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations manifest the comprehensive grasp of language models in terms of their proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z) - Through the Lens of Core Competency: Survey on Evaluation of Large
Language Models [27.271533306818732]
Large language model (LLM) has excellent performance and wide practical uses.
Existing evaluation tasks are difficult to keep up with the wide range of applications in real-world scenarios.
We summarize 4 core competencies of LLM, including reasoning, knowledge, reliability, and safety.
Under this competency architecture, similar tasks are combined to reflect corresponding ability, while new tasks can also be easily added into the system.
arXiv Detail & Related papers (2023-08-15T17:40:34Z) - Evaluating the Capability of Large-scale Language Models on Chinese
Grammatical Error Correction Task [10.597024796304016]
Large-scale language models (LLMs) has shown remarkable capability in various of Natural Language Processing (NLP) tasks.
This report explores the how large language models perform on Chinese grammatical error correction tasks.
arXiv Detail & Related papers (2023-07-08T13:10:59Z) - CMMLU: Measuring massive multitask language understanding in Chinese [133.70911295934746]
This paper introduces a comprehensive Chinese benchmark that covers various subjects, including natural science, social sciences, engineering, and humanities.
CMMLU fills the gap in evaluating the knowledge and reasoning capabilities of large language models within the Chinese context.
arXiv Detail & Related papers (2023-06-15T15:49:51Z) - Large Language Models Are Not Strong Abstract Reasoners [12.354660792999269]
Large Language Models have shown tremendous performance on a variety of natural language processing tasks.
It is unclear whether LLMs can achieve human-like cognitive capabilities or whether these models are still fundamentally circumscribed.
We introduce a new benchmark for evaluating language models beyond memorization on abstract reasoning tasks.
arXiv Detail & Related papers (2023-05-31T04:50:29Z) - Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models [75.75038268227554]
Self-Checker is a framework comprising a set of plug-and-play modules that facilitate fact-checking.
This framework provides a fast and efficient way to construct fact-checking systems in low-resource environments.
arXiv Detail & Related papers (2023-05-24T01:46:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.