Can We Catch the Elephant? A Survey of the Evolvement of Hallucination Evaluation on Natural Language Generation
- URL: http://arxiv.org/abs/2404.12041v2
- Date: Sat, 15 Jun 2024 22:57:20 GMT
- Title: Can We Catch the Elephant? A Survey of the Evolvement of Hallucination Evaluation on Natural Language Generation
- Authors: Siya Qi, Yulan He, Zheng Yuan
- Abstract summary: The evaluation system for hallucination is complex and diverse, lacking clear organization.
This survey aims to help researchers identify current limitations in hallucination evaluation and highlight future research directions.
- Score: 15.67906403625006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hallucination in Natural Language Generation (NLG) is like the elephant in the room, obvious but often overlooked until recent achievements significantly improved the fluency and grammaticality of generated text. As the capabilities of text generation models have improved, researchers have begun to pay more attention to the phenomenon of hallucination. Despite significant progress in this field in recent years, the evaluation system for hallucination is complex and diverse, lacking clear organization. We are the first to comprehensively survey how various evaluation methods have evolved with the development of text generation models from three dimensions, including hallucinated fact granularity, evaluator design principles, and assessment facets. This survey aims to help researchers identify current limitations in hallucination evaluation and highlight future research directions.
Related papers
- H-POPE: Hierarchical Polling-based Probing Evaluation of Hallucinations in Large Vision-Language Models [0.0]
We propose H-POPE, a coarse-to-fine-grained benchmark that assesses hallucinations in object existence and attributes.
Our evaluation shows that models are prone to hallucinations on object existence, and even more so on fine-grained attributes.
arXiv Detail & Related papers (2024-11-06T17:55:37Z) - The Hallucinations Leaderboard -- An Open Effort to Measure Hallucinations in Large Language Models [24.11077502209129]
Large Language Models (LLMs) have transformed the Natural Language Processing (NLP) landscape with their remarkable ability to understand and generate human-like text.
However, these models are prone to "hallucinations" -- outputs that do not align with factual reality or the input context.
This paper introduces the Hallucinations Leaderboard, an open initiative to quantitatively measure and compare the tendency of each model to produce hallucinations.
arXiv Detail & Related papers (2024-04-08T23:16:22Z) - Fine-grained Hallucination Detection and Editing for Language Models [109.56911670376932]
Large language models (LMs) are prone to generating factual errors, which are often called hallucinations.
We introduce a comprehensive taxonomy of hallucinations and argue that hallucinations manifest in diverse forms.
We propose a novel task of automatic fine-grained hallucination detection and construct a new evaluation benchmark, FavaBench.
arXiv Detail & Related papers (2024-01-12T19:02:48Z) - Alleviating Hallucinations of Large Language Models through Induced Hallucinations [67.35512483340837]
Large language models (LLMs) have been observed to generate responses that include inaccurate or fabricated information.
We propose a simple Induce-then-Contrast Decoding (ICD) strategy to alleviate hallucinations.
arXiv Detail & Related papers (2023-12-25T12:32:49Z) - On Early Detection of Hallucinations in Factual Question Answering [4.76359068115052]
Hallucinations remain a major impediment to gaining user trust.
In this work, we explore if the artifacts associated with the model generations can provide hints that the generation will contain hallucinations.
Our results show that the distributions of these artifacts tend to differ between hallucinated and non-hallucinated generations.
arXiv Detail & Related papers (2023-12-19T14:35:04Z) - Towards Mitigating Hallucination in Large Language Models via Self-Reflection [63.2543947174318]
Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks, including question answering (QA).
This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets.
arXiv Detail & Related papers (2023-10-10T03:05:44Z) - AutoHall: Automated Hallucination Dataset Generation for Large Language Models [56.92068213969036]
This paper introduces AutoHall, a method for automatically constructing model-specific hallucination datasets from existing fact-checking datasets.
We also propose a zero-resource and black-box hallucination detection method based on self-contradiction.
arXiv Detail & Related papers (2023-09-30T05:20:02Z) - Cognitive Mirage: A Review of Hallucinations in Large Language Models [10.86850565303067]
We present a novel taxonomy of hallucinations from various text generation tasks.
We provide theoretical insights, detection methods and improvement approaches.
As hallucinations garner significant attention, we will maintain updates on relevant research progress.
arXiv Detail & Related papers (2023-09-13T08:33:09Z) - Plausible May Not Be Faithful: Probing Object Hallucination in Vision-Language Pre-training [66.0036211069513]
Large-scale vision-language pre-trained models are prone to hallucinate non-existent visual objects when generating text.
We show that models achieving better scores on standard metrics could hallucinate objects more frequently.
Surprisingly, we find that patch-based features perform the best and smaller patch resolution yields a non-trivial reduction in object hallucination.
arXiv Detail & Related papers (2022-10-14T10:27:22Z) - Survey of Hallucination in Natural Language Generation [69.9926849848132]
Natural Language Generation (NLG) has improved exponentially in recent years thanks to the development of sequence-to-sequence deep learning technologies.
Deep learning-based generation is prone to hallucinating unintended text, which degrades system performance.
This survey serves to facilitate collaborative efforts among researchers in tackling the challenge of hallucinated texts in NLG.
arXiv Detail & Related papers (2022-02-08T03:55:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.