Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation
- URL: http://arxiv.org/abs/2510.18439v1
- Date: Tue, 21 Oct 2025 09:13:46 GMT
- Title: Grounding or Guessing? Visual Signals for Detecting Hallucinations in Sign Language Translation
- Authors: Yasser Hamidullah, Koel Dutta Chowdury, Yusser Al-Ghussin, Shakib Yazdani, Cennet Oguz, Josef van Genabith, Cristina España-Bonet
- Abstract summary: Hallucination is a major flaw in vision-language models and is particularly critical in sign language translation. We propose a token-level reliability measure that quantifies how much the decoder uses visual information. Our results show that reliability predicts hallucination rates, generalizes across datasets and architectures, and decreases under visual degradations.
- Score: 13.03365340564181
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hallucination, where models generate fluent text unsupported by visual evidence, remains a major flaw in vision-language models and is particularly critical in sign language translation (SLT). In SLT, meaning depends on precise grounding in video, and gloss-free models are especially vulnerable because they map continuous signer movements directly into natural language without the intermediate gloss supervision that would otherwise serve as an alignment signal. We argue that hallucinations arise when models rely on language priors rather than visual input. To capture this, we propose a token-level reliability measure that quantifies how much the decoder uses visual information. Our method combines feature-based sensitivity, which measures internal changes when video is masked, with counterfactual signals, which capture probability differences between clean and altered video inputs. These signals are aggregated into a sentence-level reliability score, providing a compact and interpretable measure of visual grounding. We evaluate the proposed measure on two SLT benchmarks (PHOENIX-2014T and CSL-Daily) with both gloss-based and gloss-free models. Our results show that reliability predicts hallucination rates, generalizes across datasets and architectures, and decreases under visual degradations. Beyond these quantitative trends, we also find that reliability distinguishes grounded tokens from guessed ones, allowing risk estimation without references; when combined with text-based signals (confidence, perplexity, or entropy), it further improves hallucination risk estimation. Qualitative analysis highlights why gloss-free models are more susceptible to hallucinations. Taken together, our findings establish reliability as a practical and reusable tool for diagnosing hallucinations in SLT, and lay the groundwork for more robust hallucination detection in multimodal generation.
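The combination described in the abstract can be made concrete with a short sketch. The following minimal illustration assumes per-token decoder hidden states and log-probabilities have already been extracted for a clean video and for a masked/altered counterfactual; the function names, the max-normalization, and the convex-combination weight `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def token_reliability(h_clean: np.ndarray,      # (T, D) decoder states, clean video
                      h_masked: np.ndarray,     # (T, D) decoder states, video masked
                      logp_clean: np.ndarray,   # (T,) token log-probs, clean video
                      logp_altered: np.ndarray, # (T,) token log-probs, altered video
                      alpha: float = 0.5) -> np.ndarray:
    """Per-token reliability: high when the token depends on the video."""
    # Feature-based sensitivity: how much each token's hidden state moves
    # when the visual input is removed.
    sensitivity = np.linalg.norm(h_clean - h_masked, axis=-1)
    sensitivity /= sensitivity.max() + 1e-8          # scale to [0, 1]

    # Counterfactual signal: drop in token probability under altered video.
    counterfactual = np.clip(logp_clean - logp_altered, 0.0, None)
    counterfactual /= counterfactual.max() + 1e-8    # scale to [0, 1]

    # Convex combination of the two signals (alpha is a free hyperparameter).
    return alpha * sensitivity + (1.0 - alpha) * counterfactual

def sentence_reliability(token_scores: np.ndarray) -> float:
    """Aggregate token scores into a single sentence-level score (mean here;
    the paper may use a different aggregation)."""
    return float(token_scores.mean())

# Toy usage with random arrays standing in for real model outputs.
rng = np.random.default_rng(0)
T, D = 12, 256
scores = token_reliability(rng.normal(size=(T, D)), rng.normal(size=(T, D)),
                           rng.normal(size=T), rng.normal(size=T))
print(sentence_reliability(scores))
```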
Related papers
- SAVE: Sparse Autoencoder-Driven Visual Information Enhancement for Mitigating Object Hallucination [48.601385640941935]
We propose SAVE, a framework that mitigates hallucination by steering the model along Sparse Autoencoder latent features. A binary object-presence question-answering probe identifies the SAE features most indicative of the model's visual information processing. With its simple design, SAVE outperforms state-of-the-art training-free methods on standard benchmarks.
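As a rough, hypothetical sketch of what "steering along an SAE latent feature" can look like: add the chosen feature's decoder direction to a hidden activation. The probe that selects the feature is omitted, and the names, unit-normalization, and additive update rule are assumptions, not SAVE's actual method.

```python
import numpy as np

def steer_hidden_state(hidden: np.ndarray,       # (D,) residual-stream activation
                       sae_decoder: np.ndarray,  # (F, D) SAE decoder directions
                       feature_idx: int,         # feature chosen by the probe
                       strength: float = 2.0) -> np.ndarray:
    direction = sae_decoder[feature_idx]
    direction = direction / (np.linalg.norm(direction) + 1e-8)  # unit vector
    return hidden + strength * direction  # push the activation along the feature

# Toy usage: steer a random activation along feature 3 of a random SAE.
rng = np.random.default_rng(0)
print(steer_hidden_state(rng.normal(size=64), rng.normal(size=(16, 64)), 3)[:5])
```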
arXiv Detail & Related papers (2025-12-08T17:20:07Z)
- Measuring the Impact of Lexical Training Data Coverage on Hallucination Detection in Large Language Models [26.89705770151822]
Hallucination in large language models (LLMs) is a fundamental challenge, particularly in open-domain question answering. Prior work attempts to detect hallucination with model-internal signals such as token-level entropy or generation consistency. We investigate whether data coverage itself can serve as a detection signal.
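For context on the model-internal baselines this summary mentions, token-level entropy is a standard computation over raw logits; nothing model-specific is assumed in this sketch.

```python
import numpy as np

def token_entropies(logits: np.ndarray) -> np.ndarray:
    """Shannon entropy (nats) of the next-token distribution, per position.

    logits: (T, V) array of unnormalized scores over a vocabulary of size V.
    """
    z = logits - logits.max(axis=-1, keepdims=True)   # stabilize softmax
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)      # (T,)

# High-entropy positions are weakly constrained by the model and are often
# flagged as more likely to be hallucinated.
print(token_entropies(np.random.default_rng(0).normal(size=(5, 100))))
```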
arXiv Detail & Related papers (2025-11-22T06:59:55Z)
- SHALE: A Scalable Benchmark for Fine-grained Hallucination Evaluation in LVLMs [52.03164192840023]
Large Vision-Language Models (LVLMs) still suffer from hallucinations, i.e., generating content inconsistent with the input or established world knowledge. We propose an automated data construction pipeline that produces scalable, controllable, and diverse evaluation data. We construct SHALE, a benchmark designed to assess both faithfulness and factuality hallucinations.
arXiv Detail & Related papers (2025-08-13T07:58:01Z)
- Mitigating Object Hallucinations via Sentence-Level Early Intervention [10.642552315531404]
Multimodal large language models (MLLMs) have revolutionized cross-modal understanding but continue to struggle with hallucinations. We propose SENTINEL, a framework that eliminates dependency on human annotations. Sentence-level Early iNtervention Through IN-domain prEference Learning can reduce hallucinations by over 90% compared to the original model.
arXiv Detail & Related papers (2025-07-16T17:55:43Z)
- HalLoc: Token-level Localization of Hallucinations for Vision Language Models [36.12465376767014]
Hallucinations pose a significant challenge to the reliability of large vision-language models. HalLoc is a dataset designed for efficient, probabilistic hallucination detection.
arXiv Detail & Related papers (2025-06-12T01:50:35Z)
- OViP: Online Vision-Language Preference Learning for VLM Hallucination [44.14029765850719]
Large vision-language models (LVLMs) remain vulnerable to hallucination, often generating content misaligned with visual inputs. We propose an Online Vision-language Preference Learning framework that dynamically constructs contrastive training data based on the model's own hallucinations.
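The summary does not give OViP's objective; as a rough orientation, online preference frameworks of this kind typically optimize a DPO-style loss over (grounded, hallucinated) response pairs. The generic loss below is an assumption for illustration, not OViP's actual formulation.

```python
import numpy as np

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """-log sigmoid(beta * margin) over policy-vs-reference log-prob ratios."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return float(-np.log(1.0 / (1.0 + np.exp(-margin))))

# Toy usage: the grounded response is favored relative to the reference model,
# so the loss is small and shrinks as the margin grows.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```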
arXiv Detail & Related papers (2025-05-21T19:26:09Z)
- HalluLens: LLM Hallucination Benchmark [49.170128733508335]
Large language models (LLMs) often generate responses that deviate from user input or training data, a phenomenon known as "hallucination". This paper introduces a comprehensive hallucination benchmark, incorporating both new extrinsic and existing intrinsic evaluation tasks.
arXiv Detail & Related papers (2025-04-24T13:40:27Z)
- Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling [78.78822033285938]
Vision-Language Models (VLMs) excel at visual understanding but often suffer from visual hallucinations. In this work, we introduce REVERSE, a unified framework that integrates hallucination-aware training with on-the-fly self-verification.
arXiv Detail & Related papers (2025-04-17T17:59:22Z)
- Mitigating Hallucinations in Large Vision-Language Models via Summary-Guided Decoding [14.701135083174918]
Large Vision-Language Models (LVLMs) generate detailed and coherent responses from visual inputs. However, they are prone to hallucinations due to an over-reliance on language priors. We propose a novel method, Summary-Guided Decoding (SumGD).
arXiv Detail & Related papers (2024-10-17T08:24:27Z)
- Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback [40.930238150365795]
We propose detecting and mitigating hallucinations in Large Vision Language Models (LVLMs) via fine-grained AI feedback. We generate a small-size hallucination annotation dataset with proprietary models. Then, we propose a detect-then-rewrite pipeline to automatically construct a preference dataset for training a hallucination-mitigating model.
arXiv Detail & Related papers (2024-04-22T14:46:10Z)
- Hallucination Augmented Contrastive Learning for Multimodal Large Language Model [53.65682783591723]
Multi-modal large language models (MLLMs) have been shown to efficiently integrate natural language with visual information to handle multi-modal tasks.
However, MLLMs still face a fundamental limitation of hallucinations, where they tend to generate erroneous or fabricated information.
In this paper, we address hallucinations in MLLMs from a novel perspective of representation learning.
arXiv Detail & Related papers (2023-12-12T04:05:15Z)
- Detecting Hallucinated Content in Conditional Neural Sequence Generation [165.68948078624499]
We propose a task to predict whether each token in the output sequence is hallucinated (not contained in the input). We also introduce a method for learning to detect hallucinations using pretrained language models fine-tuned on synthetic data.
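A hedged sketch of the synthetic-data idea in this entry: corrupting a reference output with unsupported tokens yields free token-level hallucination labels for fine-tuning a detector. The insertion noise model and toy vocabulary below are illustrative assumptions, not the authors' exact recipe.

```python
import random

def make_synthetic_example(reference, distractor_vocab, p_insert=0.2, seed=0):
    """Insert random distractor tokens; label inserted tokens 1, originals 0."""
    rng = random.Random(seed)
    tokens, labels = [], []
    for tok in reference:
        if rng.random() < p_insert:
            tokens.append(rng.choice(distractor_vocab))  # hallucinated token
            labels.append(1)
        tokens.append(tok)                               # grounded token
        labels.append(0)
    return tokens, labels

ref = "the signer says good morning".split()
print(make_synthetic_example(ref, ["weather", "hospital", "train"]))
```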
arXiv Detail & Related papers (2020-11-05T00:18:53Z)