Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation
- URL: http://arxiv.org/abs/2503.03106v1
- Date: Wed, 05 Mar 2025 01:51:03 GMT
- Title: Monitoring Decoding: Mitigating Hallucination via Evaluating the Factuality of Partial Response during Generation
- Authors: Yurui Chang, Bochuan Cao, Lu Lin
- Abstract summary: Large language models are susceptible to hallucinations, generating plausible yet factually incorrect content. Existing methods for mitigating such risk often rely on sampling multiple full-length generations. We introduce Monitoring Decoding, a novel framework that dynamically monitors the generation process.
- Score: 9.137042895376343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While large language models have demonstrated exceptional performance across a wide range of tasks, they remain susceptible to hallucinations -- generating plausible yet factually incorrect content. Existing methods for mitigating such risk often rely on sampling multiple full-length generations, which introduces significant response latency and becomes ineffective when the model consistently produces hallucinated outputs with high confidence. To address these limitations, we introduce Monitoring Decoding (MD), a novel framework that dynamically monitors the generation process and selectively applies in-process interventions, focusing on revising crucial tokens responsible for hallucinations. Instead of waiting until completion of multiple full-length generations, we identify hallucination-prone tokens during generation using a monitor function, and further refine these tokens through a tree-based decoding strategy. This approach ensures enhanced factual accuracy and coherence in the generated output while maintaining efficiency. Experimental results demonstrate that MD consistently outperforms self-consistency-based approaches in both effectiveness and efficiency, achieving higher factual accuracy while significantly reducing computational overhead.
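As a rough, hedged sketch of this idea (not the authors' implementation), the loop below keeps the greedy token whenever an external monitor function scores it as factual, and only scores a handful of alternatives when a token looks hallucination-prone; `step_fn`, `monitor_fn`, the threshold, and the fan-out are placeholder assumptions, and the single-level candidate scoring merely stands in for the paper's tree-based decoding.

```python
# Illustrative sketch only: the real MD monitor function, thresholds, and
# tree search are more involved than this greedy-with-fallback loop.
from typing import Callable, List, Tuple

def monitored_decode(
    step_fn: Callable[[List[int]], List[Tuple[int, float]]],   # next-token candidates as (token_id, prob)
    monitor_fn: Callable[[List[int], int], float],              # higher = more factual (assumed convention)
    prompt_ids: List[int],
    max_new_tokens: int = 128,
    risk_threshold: float = 0.5,   # assumed: below this, a token counts as hallucination-prone
    fanout: int = 4,               # assumed branching factor for the local expansion
) -> List[int]:
    seq = list(prompt_ids)
    for _ in range(max_new_tokens):
        candidates = step_fn(seq)            # e.g. top-k candidates from the base model
        best_id, _ = candidates[0]
        if monitor_fn(seq, best_id) >= risk_threshold:
            seq.append(best_id)              # token looks factual: keep the greedy choice
            continue
        # Hallucination-prone step: score a small set of alternatives and keep
        # the candidate the monitor prefers (a one-level stand-in for tree decoding).
        scored = [(monitor_fn(seq, tok), tok) for tok, _ in candidates[:fanout]]
        seq.append(max(scored)[1])
    return seq
```

In this framing, efficiency comes from branching only at flagged positions rather than sampling multiple full-length generations.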
Related papers
- Robust Hallucination Detection in LLMs via Adaptive Token Selection [25.21763722332831]
Hallucinations in large language models (LLMs) pose significant safety concerns that impede their broader deployment.
We propose HaMI, a novel approach that enables robust detection of hallucinations through adaptive selection and learning of critical tokens.
We achieve this robustness by an innovative formulation of the Hallucination detection task as Multiple Instance (HaMI) learning over token-level representations within a sequence.
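As a minimal sketch of that multiple-instance framing (the hidden size, linear scorer, and max pooling are assumptions, not the HaMI architecture), token representations form a bag and the most suspicious tokens drive the sequence-level score:

```python
import torch
import torch.nn as nn

class TokenMILDetector(nn.Module):
    """Toy multiple-instance-learning detector: each token representation is
    scored individually, and the sequence-level hallucination score is the max
    over tokens, so the most suspicious tokens determine the prediction."""
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.token_scorer = nn.Linear(hidden_dim, 1)

    def forward(self, token_reps: torch.Tensor) -> torch.Tensor:
        # token_reps: (batch, seq_len, hidden_dim), e.g. hidden states from a frozen LLM
        token_scores = self.token_scorer(token_reps).squeeze(-1)  # (batch, seq_len)
        return torch.sigmoid(token_scores.max(dim=-1).values)     # (batch,)
```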
arXiv Detail & Related papers (2025-04-10T15:39:10Z) - TPC: Cross-Temporal Prediction Connection for Vision-Language Model Hallucination Reduction [5.925383490825323]
Vision-language models (VLMs) have achieved remarkable advancements, capitalizing on the impressive capabilities of large language models (LLMs).
Despite this, a critical challenge known as hallucination occurs when models overconfidently describe objects or attributes absent from the image.
This limitation reduces model reliability in high-stakes applications.
arXiv Detail & Related papers (2025-03-06T14:11:00Z) - Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models [66.71616369573715]
Large Vision-Language Models (LVLMs) are prone to generating hallucinatory text responses that do not align with the given visual input.
We introduce self-correcting Decoding with Generative Feedback (DeGF), a novel training-free algorithm that incorporates feedback from text-to-image generative models into the decoding process.
arXiv Detail & Related papers (2025-02-10T03:43:55Z) - Optimizing Resource Consumption in Diffusion Models through Hallucination Early Detection [87.22082662250999]
We introduce HEaD (Hallucination Early Detection), a new paradigm designed to swiftly detect incorrect generations at the beginning of the diffusion process.
We demonstrate that using HEaD saves computational resources and accelerates the generation process to get a complete image.
Our findings reveal that HEaD can save up to 12% of the generation time in a two-object scenario.
arXiv Detail & Related papers (2024-09-16T18:00:00Z) - Enhancing Hallucination Detection through Perturbation-Based Synthetic Data Generation in System Responses [25.692970375644435]
We introduce an approach that automatically generates both faithful and hallucinated outputs by rewriting system responses.
Experimental findings demonstrate that a T5-base model, fine-tuned on our generated dataset, surpasses state-of-the-art zero-shot detectors.
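A toy illustration of such rewriting (purely an assumed perturbation; the paper's rewrites are more varied) is to corrupt one detail of an otherwise faithful response:

```python
import random
import re

def perturb_response(response: str) -> str:
    """Toy perturbation: alter one number in the response to fabricate a
    'hallucinated' variant; real pipelines use richer rewrites such as
    entity swaps, negations, or LLM-based paraphrases."""
    numbers = re.findall(r"\d+", response)
    if not numbers:
        return response
    target = random.choice(numbers)
    return response.replace(target, str(int(target) + random.randint(1, 9)), 1)

faithful = "The flight departs at 10 and arrives at 14."
hallucinated = perturb_response(faithful)  # e.g. "The flight departs at 13 and arrives at 14."
```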
arXiv Detail & Related papers (2024-07-07T19:19:32Z) - Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation [96.78845113346809]
Retrieval-augmented language models (RALMs) have shown strong performance and wide applicability in knowledge-intensive tasks.
This paper proposes SynCheck, a lightweight monitor that leverages fine-grained decoding dynamics to detect unfaithful sentences.
We also introduce FOD, a faithfulness-oriented decoding algorithm guided by beam search for long-form retrieval-augmented generation.
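As a hedged sketch of what "decoding dynamics" features could look like (token probabilities and entropies over the decoding steps; SynCheck's actual signals are richer), a lightweight monitor might aggregate statistics like these:

```python
import math
from typing import Dict, List

def decoding_dynamics_features(token_probs: List[List[float]]) -> Dict[str, float]:
    """Compute simple per-sentence faithfulness signals from per-step
    next-token distributions (each inner list is one probability vector)."""
    chosen_probs = [max(p) for p in token_probs]            # prob of the greedy token at each step
    entropies = [-sum(q * math.log(q + 1e-12) for q in p)   # distribution entropy per step
                 for p in token_probs]
    return {
        "min_token_prob": min(chosen_probs),
        "mean_token_prob": sum(chosen_probs) / len(chosen_probs),
        "mean_entropy": sum(entropies) / len(entropies),
    }
```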
arXiv Detail & Related papers (2024-06-19T16:42:57Z) - Accelerating LLaMA Inference by Enabling Intermediate Layer Decoding via Instruction Tuning with LITE [62.13435256279566]
Large Language Models (LLMs) have achieved remarkable performance across a wide variety of natural language tasks.
However, their large size makes their inference slow and computationally expensive.
We show that instruction tuning with LITE enables intermediate layers to acquire 'good' generation ability without affecting the generation ability of the final layer.
arXiv Detail & Related papers (2023-10-28T04:07:58Z) - RARE: Robust Masked Graph Autoencoder [45.485891794905946]
Masked graph autoencoder (MGAE) has emerged as a promising self-supervised graph pre-training (SGP) paradigm.
We propose a novel SGP method termed Robust mAsked gRaph autoEncoder (RARE) to improve the certainty in inferring masked data.
arXiv Detail & Related papers (2023-04-04T03:35:29Z) - Beyond the Spectrum: Detecting Deepfakes via Re-Synthesis [69.09526348527203]
Deep generative models have led to highly realistic media, known as deepfakes, that are often indistinguishable from real media to the human eye.
We propose a novel fake detection method designed to re-synthesize test images and extract visual cues for detection.
We demonstrate the improved effectiveness, cross-GAN generalization, and robustness against perturbations of our approach in a variety of detection scenarios.
arXiv Detail & Related papers (2021-05-29T21:22:24Z) - Detecting Hallucinated Content in Conditional Neural Sequence Generation [165.68948078624499]
We propose a task to predict whether each token in the output sequence is hallucinated (not contained in the input).
We also introduce a method for learning to detect hallucinations using pretrained language models fine-tuned on synthetic data.
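Framed as token classification, such a detector can be set up with a standard sequence-labeling head; the checkpoint and binary label scheme below are placeholders rather than the paper's exact configuration, and the head is untrained here:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Placeholder checkpoint and labels; the paper fine-tunes pretrained LMs on
# synthetic hallucination data, which this freshly initialized head does not reflect.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForTokenClassification.from_pretrained("xlm-roberta-base", num_labels=2)

inputs = tokenizer("The Eiffel Tower is located in Berlin.", return_tensors="pt")
logits = model(**inputs).logits                    # (1, seq_len, num_labels)
pred = logits.argmax(dim=-1)[0].tolist()           # label 1 = hallucinated token in this toy scheme
```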
arXiv Detail & Related papers (2020-11-05T00:18:53Z)