AI Hallucinations: A Misnomer Worth Clarifying
- URL: http://arxiv.org/abs/2401.06796v1
- Date: Tue, 9 Jan 2024 01:49:41 GMT
- Title: AI Hallucinations: A Misnomer Worth Clarifying
- Authors: Negar Maleki, Balaji Padmanabhan, Kaushik Dutta
- Abstract summary: We present and analyze definitions obtained across all databases, categorize them based on their applications, and extract key points within each category.
Our results highlight a lack of consistency in how the term is used, but also help identify several alternative terms in the literature.
- Score: 4.880243880711163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As large language models continue to advance in Artificial Intelligence (AI),
text generation systems have been shown to suffer from a problematic phenomenon
often termed "hallucination." However, with AI's increasing presence across
various domains including medicine, concerns have arisen regarding the use of
the term itself. In this study, we conducted a systematic review to identify
papers defining "AI hallucination" across fourteen databases. We present and
analyze definitions obtained across all databases, categorize them based on
their applications, and extract key points within each category. Our results
highlight a lack of consistency in how the term is used, but also help identify
several alternative terms in the literature. We discuss the implications of these
findings and call for a more unified effort to bring consistency to an important
contemporary AI issue that can significantly affect multiple domains.
Related papers
- Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models [13.48296910438554]
Hallucination issues persistently plague current multimodal large language models (MLLMs).
We introduce Reefknot, a benchmark specifically targeting relation hallucinations, consisting of over 20,000 samples derived from real-world scenarios.
Our comparative evaluation across three distinct tasks revealed a substantial shortcoming in the capabilities of current MLLMs to mitigate relation hallucinations.
arXiv Detail & Related papers (2024-08-18T10:07:02Z)
- Data Set Terminology of Deep Learning in Medicine: A Historical Review and Recommendation [0.7897552065199818]
Medicine and deep learning-based artificial intelligence engineering represent two distinct fields, each with decades of published history.
With that history comes a set of terminology that is applied in specific, established ways.
This review aims to give historical context for these terms and to accentuate the importance of clarity when they are used in medical AI contexts.
arXiv Detail & Related papers (2024-04-30T07:07:45Z)
- Comparing Hallucination Detection Metrics for Multilingual Generation [62.97224994631494]
This paper assesses how well various factual hallucination detection metrics identify hallucinations in generated biographical summaries across languages.
We compare how well automatic metrics correlate to each other and whether they agree with human judgments of factuality.
Our analysis reveals that while the lexical metrics are ineffective, NLI-based metrics perform well, correlating with human annotations in many settings and often outperforming supervised models.
arXiv Detail & Related papers (2024-02-16T08:10:34Z)
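As a rough illustration of the NLI-based scoring this entry reports as effective, the sketch below uses an off-the-shelf MNLI model to estimate whether a source passage entails a generated claim. The model checkpoint, the example texts, and the 0.5 flagging threshold are illustrative assumptions, not the paper's evaluation setup.
```python
# Minimal sketch: score a generated claim against its source with an NLI model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"  # any MNLI-style checkpoint; this choice is an assumption
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def entailment_score(source: str, claim: str) -> float:
    """Probability that the source passage (premise) entails the claim (hypothesis)."""
    inputs = tokenizer(source, claim, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    # Read the entailment class index from the model config rather than hard-coding it.
    label2id = {label.lower(): idx for label, idx in model.config.label2id.items()}
    return probs[label2id["entailment"]].item()

source = "Marie Curie was born in Warsaw in 1867 and won two Nobel Prizes."
claim = "Marie Curie was born in Paris."
score = entailment_score(source, claim)
print(f"entailment probability: {score:.3f}")
if score < 0.5:  # threshold is an illustrative assumption, not a calibrated value
    print("claim is not supported by the source (possible hallucination)")
```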
- HALO: An Ontology for Representing and Categorizing Hallucinations in Large Language Models [2.9312156642007294]
The Hallucination Ontology (HALO) is written in OWL and supports six different types of hallucinations known to arise in large language models (LLMs).
We publish a dataset containing hallucinations that we inductively gathered across multiple independent Web sources, and show that HALO can be successfully used to model this dataset and answer competency questions.
arXiv Detail & Related papers (2023-12-08T17:57:20Z)
- Representing visual classification as a linear combination of words [0.0]
We present an explainability strategy that uses a vision-language model to identify language-based descriptors of a visual classification task.
By leveraging a pre-trained joint embedding space between images and text, our approach estimates a new classification task as a linear combination of words.
We find that the resulting descriptors largely align with clinical knowledge despite a lack of domain-specific language training.
arXiv Detail & Related papers (2023-11-18T02:00:20Z)
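The sketch below illustrates the general idea described in the entry above: expressing a classification direction in a CLIP-style joint embedding space as a linear combination of word embeddings. The checkpoint, the candidate word list, the random stand-in classifier, and the least-squares fit are all assumptions made for illustration, not the paper's procedure.
```python
# Minimal sketch: approximate a classifier direction with a mixture of word embeddings.
import torch
from transformers import CLIPModel, CLIPTokenizer

CHECKPOINT = "openai/clip-vit-base-patch32"  # checkpoint choice is an assumption
model = CLIPModel.from_pretrained(CHECKPOINT)
tokenizer = CLIPTokenizer.from_pretrained(CHECKPOINT)

# Hypothetical descriptor vocabulary; the paper's actual word list is not reproduced here.
candidate_words = ["round", "smooth", "spiculated", "dense", "calcified", "irregular"]

with torch.no_grad():
    tokens = tokenizer(candidate_words, padding=True, return_tensors="pt")
    word_vecs = model.get_text_features(**tokens)              # shape: (n_words, d)
word_vecs = word_vecs / word_vecs.norm(dim=-1, keepdim=True)   # unit-normalize each word vector

# Stand-in for a linear probe trained on image embeddings for the task of interest;
# it is random here only so the sketch runs end to end.
classifier_direction = torch.randn(word_vecs.shape[1])
classifier_direction = classifier_direction / classifier_direction.norm()

# Least-squares weights w such that (word matrix)^T w approximates the classifier direction.
weights = torch.linalg.pinv(word_vecs.T) @ classifier_direction  # shape: (n_words,)

for word, w in sorted(zip(candidate_words, weights.tolist()), key=lambda p: -abs(p[1])):
    print(f"{word:12s} {w:+.3f}")
```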
- Insights into Classifying and Mitigating LLMs' Hallucinations [48.04565928175536]
This paper delves into the underlying causes of AI hallucination and elucidates its significance in artificial intelligence.
We explore potential strategies to mitigate hallucinations, aiming to enhance the overall reliability of large language models.
arXiv Detail & Related papers (2023-11-14T12:30:28Z)
- Towards Mitigating Hallucination in Large Language Models via Self-Reflection [63.2543947174318]
Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks.
This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets.
arXiv Detail & Related papers (2023-10-10T03:05:44Z)
- AutoHall: Automated Hallucination Dataset Generation for Large Language Models [56.92068213969036]
This paper introduces AutoHall, a method for automatically constructing model-specific hallucination datasets from existing fact-checking datasets.
We also propose a zero-resource and black-box hallucination detection method based on self-contradiction.
arXiv Detail & Related papers (2023-09-30T05:20:02Z)
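A minimal sketch of the general self-contradiction idea mentioned in the entry above: re-sample answers from the black-box model and flag the original answer when the samples contradict it. The `generate` and `contradicts` callables are hypothetical placeholders (for example, an LLM API call and an NLI contradiction check); this is not AutoHall's exact procedure.
```python
# Minimal sketch: zero-resource hallucination flagging via self-contradiction.
from typing import Callable, List

def self_contradiction_rate(
    question: str,
    answer: str,
    generate: Callable[[str], str],           # black-box LLM call (hypothetical placeholder)
    contradicts: Callable[[str, str], bool],  # e.g., an NLI contradiction check (hypothetical placeholder)
    n_samples: int = 5,
) -> float:
    """Fraction of re-sampled answers that contradict the original answer."""
    samples: List[str] = [generate(question) for _ in range(n_samples)]
    flags = [contradicts(answer, sample) for sample in samples]
    return sum(flags) / len(flags)

# Stub wiring so the sketch runs on its own; replace the stubs with real model calls.
def fake_generate(question: str) -> str:
    return "The Eiffel Tower is in Paris."

def fake_contradicts(a: str, b: str) -> bool:
    return a.strip().lower() != b.strip().lower()

rate = self_contradiction_rate(
    "Where is the Eiffel Tower?", "The Eiffel Tower is in Berlin.",
    fake_generate, fake_contradicts,
)
print(f"self-contradiction rate: {rate:.2f}  (higher -> more likely hallucinated)")
```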
- A Linguistic Investigation of Machine Learning based Contradiction Detection Models: An Empirical Analysis and Future Perspectives [0.34998703934432673]
We analyze two Natural Language Inference data sets with respect to their linguistic features.
The goal is to identify those syntactic and semantic properties that are particularly hard to comprehend for a machine learning model.
arXiv Detail & Related papers (2022-10-19T10:06:03Z)
- MGD-GAN: Text-to-Pedestrian generation through Multi-Grained Discrimination [96.91091607251526]
We propose the Multi-Grained Discrimination enhanced Generative Adversarial Network (MGD-GAN), which capitalizes on a Human-Part-based Discriminator (HPD) and a self-cross-attended Discriminator.
A fine-grained word-level attention mechanism is employed in the HPD module to enforce diversified appearance and vivid details.
The substantial improvement over the various metrics demonstrates the efficacy of MGD-GAN on the text-to-pedestrian synthesis scenario.
arXiv Detail & Related papers (2020-10-02T12:24:48Z)
- Mechanisms for Handling Nested Dependencies in Neural-Network Language Models and Humans [75.15855405318855]
We studied whether a modern artificial neural network trained with "deep learning" methods mimics a central aspect of human sentence processing.
Although the network was solely trained to predict the next word in a large corpus, analysis showed the emergence of specialized units that successfully handled local and long-distance syntactic agreement.
We tested the model's predictions in a behavioral experiment where humans detected violations in number agreement in sentences with systematic variations in the singular/plural status of multiple nouns.
arXiv Detail & Related papers (2020-06-19T12:00:05Z)
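As a small illustration of the kind of agreement probing described in the entry above, the sketch below compares a language model's log-probability for the grammatical versus ungrammatical verb after a prefix containing distractor nouns. GPT-2, the example sentence, and the scoring routine are assumptions made for illustration and are not necessarily the architecture or stimuli studied in the paper.
```python
# Minimal sketch: probe long-distance number agreement with a language model.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def continuation_logprob(prefix: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to `continuation` given `prefix`."""
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, return_tensors="pt").input_ids
    ids = torch.cat([prefix_ids, cont_ids], dim=1)
    with torch.no_grad():
        log_probs = torch.log_softmax(model(ids).logits, dim=-1)
    total = 0.0
    for i in range(cont_ids.shape[1]):
        pos = prefix_ids.shape[1] + i - 1  # logits at position t predict the token at t + 1
        total += log_probs[0, pos, cont_ids[0, i]].item()
    return total

# Plural head noun ("keys") with intervening singular distractors.
prefix = "The keys that the man next to the cabinet"
print("grammatical  (' are'):", continuation_logprob(prefix, " are"))
print("ungrammatical (' is'):", continuation_logprob(prefix, " is"))
```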
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.