Seeing Through the Fog: A Cost-Effectiveness Analysis of Hallucination Detection Systems
- URL: http://arxiv.org/abs/2411.05270v1
- Date: Fri, 08 Nov 2024 02:06:41 GMT
- Title: Seeing Through the Fog: A Cost-Effectiveness Analysis of Hallucination Detection Systems
- Authors: Alexander Thomas, Seth Rosen, Vishnu Vettrivel
- Abstract summary: We evaluate hallucination detection systems using the diagnostic odds ratio (DOR) and cost-effectiveness metrics.
Our results indicate that although advanced models can perform better, they come at a much higher cost.
- Abstract: This paper presents a comparative analysis of hallucination detection systems for AI, focusing on automatic summarization and question answering tasks for Large Language Models (LLMs). We evaluate different hallucination detection systems using the diagnostic odds ratio (DOR) and cost-effectiveness metrics. Our results indicate that although advanced models can perform better, they come at a much higher cost. We also demonstrate how an ideal hallucination detection system needs to maintain performance across different model sizes. Our findings highlight the importance of choosing a detection system aligned with specific application needs and resource constraints. Future research will explore hybrid systems and automated identification of underperforming components to enhance AI reliability and efficiency in detecting and mitigating hallucinations.
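As a rough illustration of the diagnostic odds ratio mentioned in the abstract, the sketch below computes DOR from a detector's confusion-matrix counts using the standard definition DOR = (TP x TN) / (FP x FN). The function name, the example counts, and the use of the Haldane-Anscombe 0.5 correction for zero cells are assumptions made for this example; the listing does not say how the paper handles zero cells or how it defines its cost-effectiveness metrics, so cost is not modelled here.

```python
def diagnostic_odds_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    """Return DOR = (TP * TN) / (FP * FN) for a binary detector.

    Here a "positive" means the detector flagged a hallucination.
    The function name and the zero-cell handling are illustrative
    assumptions, not taken from the paper.
    """
    cells = (tp, fp, fn, tn)
    if any(c < 0 for c in cells):
        raise ValueError("confusion-matrix counts must be non-negative")
    if any(c == 0 for c in cells):
        # Haldane-Anscombe correction: add 0.5 to every cell so the
        # ratio stays finite when a cell is empty.
        tp, fp, fn, tn = (c + 0.5 for c in cells)
    return (tp * tn) / (fp * fn)


if __name__ == "__main__":
    # Made-up counts for a hypothetical detector on 200 labelled outputs.
    print(diagnostic_odds_ratio(tp=80, fp=10, fn=20, tn=90))  # 36.0
```

A DOR of 1 means the detector does no better than chance, and larger values indicate better discrimination between hallucinated and faithful outputs.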
Related papers
- Generating on Generated: An Approach Towards Self-Evolving Diffusion Models [58.05857658085845]
Recursive Self-Improvement (RSI) enables intelligence systems to autonomously refine their capabilities.
This paper explores the application of RSI in text-to-image diffusion models, addressing the challenge of training collapse caused by synthetic data.
arXiv Detail & Related papers (2025-02-14T07:41:47Z) - Active inference and deep generative modeling for cognitive ultrasound [20.383444113659476]
We show that US imaging systems can be recast as information-seeking agents that engage in reciprocal interactions with their anatomical environment.
Such agents autonomously adapt their transmit-receive sequences to fully personalize imaging and actively maximize information gain in-situ.
We then equip systems with a mechanism to actively reduce uncertainty and maximize diagnostic value across a sequence of experiments.
arXiv Detail & Related papers (2024-10-17T08:09:14Z) - Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback [40.930238150365795]
We propose detecting and mitigating hallucinations in Large Vision Language Models (LVLMs) via fine-grained AI feedback.
We generate a small hallucination annotation dataset using proprietary models.
Then, we propose a detect-then-rewrite pipeline to automatically construct a preference dataset for training a hallucination-mitigating model.
arXiv Detail & Related papers (2024-04-22T14:46:10Z) - Towards Mitigating Hallucination in Large Language Models via
Self-Reflection [63.2543947174318]
Large language models (LLMs) have shown promise for generative and knowledge-intensive tasks including question-answering (QA) tasks.
This paper analyses the phenomenon of hallucination in medical generative QA systems using widely adopted LLMs and datasets.
arXiv Detail & Related papers (2023-10-10T03:05:44Z) - AutoHall: Automated Hallucination Dataset Generation for Large Language Models [56.92068213969036]
This paper introduces AutoHall, a method for automatically constructing model-specific hallucination datasets based on existing fact-checking datasets.
We also propose a zero-resource and black-box hallucination detection method based on self-contradiction.
arXiv Detail & Related papers (2023-09-30T05:20:02Z) - Scalable Online Disease Diagnosis via Multi-Model-Fused Actor-Critic Reinforcement Learning [9.274138493400436]
For those seeking healthcare advice online, AI-based dialogue agents capable of interacting with patients to perform automatic disease diagnosis are a viable option.
This can be formulated as a problem of sequential feature (symptom) selection and classification for which reinforcement learning (RL) approaches have been proposed as a natural solution.
We propose a Multi-Model-Fused Actor-Critic (MMF-AC) RL framework that consists of a generative actor network and a diagnostic critic network.
arXiv Detail & Related papers (2022-06-08T03:06:16Z) - Low to High Dimensional Modality Hallucination using Aggregated Fields of View [48.32515709424962]
We argue that modality hallucination is one effective way to ensure consistent modality availability.
We present a novel hallucination architecture that aggregates information from multiple fields of view of the local neighborhood.
We also conduct extensive classification and segmentation experiments on UWRGBD and NYUD datasets and demonstrate that hallucination allays the negative effects of the modality loss.
arXiv Detail & Related papers (2020-07-13T03:13:48Z) - SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection [63.253850875265115]
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples.
We propose a modular acceleration system, called SUOD, to address it.
arXiv Detail & Related papers (2020-03-11T00:22:50Z)