Related papers: Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework

URL: http://arxiv.org/abs/2406.03075v1
Date: Wed, 5 Jun 2024 08:59:45 GMT
Title: Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework
Authors: Xiaoxi Sun, Jinpeng Li, Yan Zhong, Dongyan Zhao, Rui Yan
Abstract summary: We propose a Markov Chain-based multi-agent debate verification framework to enhance hallucination detection accuracy in concise claims. Our method integrates the fact-checking process, including claim detection, evidence retrieval, and multi-agent verification.
Score: 41.47029501736853
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The advent of large language models (LLMs) has facilitated the development of natural language text generation. It also poses unprecedented challenges, with content hallucination emerging as a significant concern. Existing solutions often involve expensive and complex interventions during the training process. Moreover, some approaches emphasize problem disassembly while neglecting the crucial validation process, leading to performance degradation or limited applications. To overcome these limitations, we propose a Markov Chain-based multi-agent debate verification framework to enhance hallucination detection accuracy in concise claims. Our method integrates the fact-checking process, including claim detection, evidence retrieval, and multi-agent verification. In the verification stage, we deploy multiple agents through flexible Markov Chain-based debates to validate individual claims, ensuring meticulous verification outcomes. Experimental results across three generative tasks demonstrate that our approach achieves significant improvements over baselines.
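The verification stage described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the agent roles, the transition rules, or the stopping condition, so everything named here (the `supporter` and `skeptic` agents, the `TRANSITIONS` table, the `debate` function, and the conservative fallback on non-convergence) is a hypothetical stand-in rather than the authors' method. The agents are simple string-matching stubs where a real system would presumably call an LLM, and claim detection and evidence retrieval are assumed to have already happened upstream. The only property the sketch tries to show is the Markov one: the next speaker depends solely on the current state (who just spoke and whether they agreed with the previous verdict).

```python
from typing import Callable, Dict, List, Tuple

# True = claim looks supported by the evidence, False = hallucination suspected.
Verdict = bool
# Hypothetical agent signature: (claim, evidence snippets, previous verdict) -> verdict.
Agent = Callable[[str, List[str], Verdict], Verdict]


def supporter(claim: str, evidence: List[str], prev: Verdict) -> Verdict:
    """Stub agent: accepts the claim if any snippet contains it verbatim."""
    return any(claim.lower() in e.lower() for e in evidence)


def skeptic(claim: str, evidence: List[str], prev: Verdict) -> Verdict:
    """Stub agent: stricter, requires at least two supporting snippets."""
    return sum(claim.lower() in e.lower() for e in evidence) >= 2


AGENTS: Dict[str, Agent] = {"supporter": supporter, "skeptic": skeptic}

# Assumed transition table: (current agent, agreed with previous verdict?) -> next state.
# The next state depends only on the current one, which is the Markov property.
TRANSITIONS: Dict[Tuple[str, bool], str] = {
    ("supporter", True): "skeptic",
    ("supporter", False): "skeptic",
    ("skeptic", True): "done",        # consensus reached, stop the debate
    ("skeptic", False): "supporter",  # disagreement, run another round
}


def debate(claim: str, evidence: List[str], max_steps: int = 6) -> Tuple[Verdict, List[str]]:
    """Run the debate until consensus or until max_steps agent turns have passed."""
    state = "supporter"
    verdict = True  # assumed optimistic starting point
    trace: List[str] = []
    steps = 0
    while state != "done" and steps < max_steps:
        new_verdict = AGENTS[state](claim, evidence, verdict)
        agreed = new_verdict == verdict
        trace.append(state)
        verdict = new_verdict
        state = TRANSITIONS[(state, agreed)]
        steps += 1
    if state != "done":
        # Assumption: without consensus, conservatively flag the claim.
        return False, trace
    return verdict, trace


if __name__ == "__main__":
    snippets = [
        "Paris is the capital of France.",
        "Everyone agrees that Paris is the capital of France.",
    ]
    print(debate("Paris is the capital of France", snippets))
```

With two supporting snippets both stub agents agree and the debate ends after one round; with a single snippet the two agents keep disagreeing until `max_steps` is exhausted, at which point the sketch falls back to flagging the claim.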

Related papers

MAS-ProVe: Understanding the Process Verification of Multi-Agent Systems [59.20800753428596]
We present MAS-ProVe, a systematic empirical study of process verification for multi-agent systems (MAS). Our study spans three verification paradigms (LLM-as-a-Judge, reward models, and process reward models). We find that process-level verification does not consistently improve performance and frequently exhibits high variance.
arXiv Detail & Related papers (2026-02-03T03:30:36Z)
Robust Uncertainty Quantification for Factual Generation of Large Language Models [22.060021788289202]
Large language model (LLM) technology has facilitated its integration into various domains of professional and daily life. The persistent challenge of LLM hallucination has emerged as a critical limitation, significantly compromising the reliability and trustworthiness of AI-generated content. This study proposes an uncertainty quantification scenario in the task of generating with multiple facts.
arXiv Detail & Related papers (2026-01-01T14:06:58Z)
Exploring Health Misinformation Detection with Multi-Agent Debate [0.11470070927586014]
We propose a two-stage framework for health misinformation detection. In the first stage, we employ large language models (LLMs) to independently evaluate retrieved articles. When this score indicates insufficient consensus, falling below a predefined threshold, the system proceeds to a second stage. Multiple agents engage in structured debate to synthesize conflicting evidence and generate well-reasoned verdicts with explicit justifications.
arXiv Detail & Related papers (2025-11-29T12:39:30Z)
Multi-stage Prompt Refinement for Mitigating Hallucinations in Large Language Models [49.435669307386156]
Multi-stage Prompt Refinement (MPR) is a framework designed to systematically improve ill-formed prompts across multiple stages. MPR iteratively enhances the clarity of prompts with additional context and employs a self-reflection mechanism with ranking to prioritize the most relevant input. Results on hallucination benchmarks show that prompts refined by MPR achieve over an 85% win rate compared to their original forms.
arXiv Detail & Related papers (2025-10-14T00:31:36Z)
Preemptive Hallucination Reduction: An Input-Level Approach for Multimodal Language Model [1.124958340749622]
This study presents a novel ensemble-based preprocessing framework that adaptively selects the most appropriate filtering approach. The method achieves a 44.3% reduction in hallucination rates, as measured by Natural Language Inference (NLI) scores. The findings highlight the importance of adaptive preprocessing techniques in mitigating hallucinations, paving the way for more reliable multimodal systems.
arXiv Detail & Related papers (2025-05-29T21:09:34Z)
Towards General Visual-Linguistic Face Forgery Detection(V2) [90.6600794602029]
Face manipulation techniques have achieved significant advances, presenting serious challenges to security and social trust. Recent works demonstrate that leveraging multimodal models can enhance the generalization and interpretability of face forgery detection. We propose Face Forgery Text Generator (FFTG), a novel annotation pipeline that generates accurate text descriptions by leveraging forgery masks for initial region and type identification.
arXiv Detail & Related papers (2025-02-28T04:15:36Z)
Agentic Verification for Ambiguous Query Disambiguation [42.238086712267396]
We tackle the challenge of disambiguating queries in retrieval-augmented generation (RAG) to diverse yet answerable interpretations. We propose a joint approach to unify diversification with verification by incorporating feedback from retriever and generator early on. We validate the efficiency and effectiveness of our method on the widely adopted ASQA benchmark to achieve diverse yet verifiable interpretations.
arXiv Detail & Related papers (2025-02-14T18:31:39Z)
Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo [55.452453947359736]
We introduce a novel verification method based on Twisted Sequential Monte Carlo (TSMC). We apply TSMC to Large Language Models by estimating the expected future rewards at partial solutions. This approach results in a more straightforward training target that eliminates the need for step-wise human annotations.
arXiv Detail & Related papers (2024-10-02T18:17:54Z)
Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation [49.27250832754313]
We present AgentCOT, an LLM-based autonomous agent framework. At each step, AgentCOT selects an action and executes it to yield an intermediate result with supporting evidence. We introduce two new strategies to enhance the performance of AgentCOT.
arXiv Detail & Related papers (2024-09-19T02:20:06Z)
CoT Rerailer: Enhancing the Reliability of Large Language Models in Complex Reasoning Tasks through Error Detection and Correction [9.44858963874474]
Chain-of-Thought (CoT) prompting enhances the complex reasoning abilities of Large Language Models (LLMs). We propose the CoT Rerailer to address these challenges, employing self-consistency and multi-agent debate systems. We demonstrate the effectiveness of our approach across diverse question-answering datasets in various knowledge domains.
arXiv Detail & Related papers (2024-08-25T21:20:17Z)
Drowzee: Metamorphic Testing for Fact-Conflicting Hallucination Detection in Large Language Models [11.138489774712163]
We propose an innovative approach leveraging logic programming to enhance metamorphic testing for detecting Fact-Conflicting Hallucinations (FCH). Our method generates test cases and detects hallucinations across six different large language models spanning nine domains, revealing rates ranging from 24.7% to 59.8%.
arXiv Detail & Related papers (2024-05-01T17:24:42Z)
KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking [55.2155025063668]
KnowHalu is a novel approach for detecting hallucinations in text generated by large language models (LLMs). It uses step-wise reasoning, multi-formulation query, multi-form knowledge for factual checking, and a fusion-based detection mechanism. Our evaluations demonstrate that KnowHalu significantly outperforms SOTA baselines in detecting hallucinations across diverse tasks.
arXiv Detail & Related papers (2024-04-03T02:52:07Z)
A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection [63.56136319976554]
Large Language Models (LLMs) generate hallucinations, which can cause significant damage when deployed for mission-critical tasks. We propose a self-check approach based on reverse validation to detect factual errors automatically in a zero-resource fashion. We empirically evaluate our method and existing zero-resource detection methods on two datasets.
arXiv Detail & Related papers (2023-10-10T10:14:59Z)
MGTBench: Benchmarking Machine-Generated Text Detection [54.81446366272403]
This paper proposes the first benchmark framework for MGT detection against powerful large language models (LLMs). We show that a larger number of words in general leads to better performance and most detection methods can achieve similar performance with much fewer training samples. Our findings indicate that the model-based detection methods still perform well in the text attribution task.
arXiv Detail & Related papers (2023-03-26T21:12:36Z)
