Fine-Refine: Iterative Fine-grained Refinement for Mitigating Dialogue Hallucination
- URL: http://arxiv.org/abs/2602.15509v1
- Date: Tue, 17 Feb 2026 11:33:23 GMT
- Title: Fine-Refine: Iterative Fine-grained Refinement for Mitigating Dialogue Hallucination
- Authors: Xiangyan Chen, Yujian Gan, Matthew Purver
- Abstract summary: Hallucinations produce factually incorrect responses that may mislead users and undermine system trust. Existing refinement methods for dialogue systems typically operate at the response level, overlooking the fact that a single response may contain multiple verifiable or unverifiable facts. We propose Fine-Refine, a fine-grained refinement framework that decomposes responses into atomic units, verifies each unit using external knowledge, assesses fluency via perplexity, and iteratively corrects granular errors.
- Score: 6.907950142408847
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The tendency for hallucination in current large language models (LLMs) negatively impacts dialogue systems. Such hallucinations produce factually incorrect responses that may mislead users and undermine system trust. Existing refinement methods for dialogue systems typically operate at the response level, overlooking the fact that a single response may contain multiple verifiable or unverifiable facts. To address this gap, we propose Fine-Refine, a fine-grained refinement framework that decomposes responses into atomic units, verifies each unit using external knowledge, assesses fluency via perplexity, and iteratively corrects granular errors. We evaluate factuality across the HybriDialogue and OpendialKG datasets in terms of factual accuracy (fact score) and coverage (Not Enough Information Proportion), and experiments show that Fine-Refine substantially improves factuality, achieving up to a 7.63-point gain in dialogue fact score, with a small trade-off in dialogue quality.
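The abstract sketches a four-stage loop: decompose into atomic units, verify each unit, score fluency, and correct. A minimal illustration of that control flow follows; the helper functions, word-overlap verifier, and fluency budget are toy assumptions standing in for the paper's LLM-based components, not the authors' implementation.

```python
def decompose(response):
    # Atomic units ~ sentences in this toy version; the paper uses an LLM.
    return [s.strip() for s in response.split(".") if s.strip()]

def verify(unit, knowledge):
    # Toy verifier: "supported" if most of the unit's words occur in some
    # knowledge snippet. A real verifier would also emit "not_enough_info".
    words = set(unit.lower().split())
    best = max(len(words & set(k.lower().split())) / len(words) for k in knowledge)
    return "supported" if best >= 0.6 else "refuted"

def perplexity(text):
    # Placeholder fluency score; the paper uses LM perplexity here.
    return 10.0 if text else float("inf")

def correct(units, verdicts):
    # Toy correction: drop refuted units; the paper rewrites them instead.
    return ". ".join(u for u, v in zip(units, verdicts) if v == "supported") + "."

def fine_refine(response, knowledge, max_iters=3, ppl_budget=40.0):
    for _ in range(max_iters):
        units = decompose(response)
        verdicts = [verify(u, knowledge) for u in units]
        if all(v == "supported" for v in verdicts):
            break                                  # every atomic fact checks out
        candidate = correct(units, verdicts)
        if perplexity(candidate) > ppl_budget:     # reject edits that hurt fluency
            break
        response = candidate
    return response

print(fine_refine("Paris is in France. Paris is in Spain.",
                  ["Paris is the capital of France"]))  # -> "Paris is in France."
```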
Related papers
- VISTA Score: Verification In Sequential Turn-based Assessment [18.318681275086902]
We introduce VISTA, a framework for evaluating conversational factuality through claim-level verification and sequential consistency tracking. VISTA decomposes each assistant turn into atomic factual claims, verifies them against trusted sources and dialogue history, and categorizes unverifiable statements. Human evaluation confirms that VISTA's decomposition improves annotator agreement and reveals inconsistencies in existing benchmarks.
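The turn-by-turn bookkeeping this implies might look like the following sketch, where claims accepted earlier in the dialogue are carried forward as trusted context; the labels, ledger, and toy contradiction test are assumptions rather than VISTA's actual procedure.

```python
def contradicts(a, b):
    # Toy contradiction test: same sentence up to an inserted "not".
    return a.replace(" not ", " ") == b or b.replace(" not ", " ") == a

def check_turn(claims, sources, ledger):
    # `ledger` holds claims already accepted earlier in the dialogue.
    results = []
    for claim in claims:
        if claim in sources or claim in ledger:
            label = "supported"                    # trusted source or history
        elif any(contradicts(claim, prev) for prev in ledger):
            label = "contradicted"                 # inconsistent with history
        else:
            label = "unverifiable"                 # categorized, not guessed
        results.append((claim, label))
        if label == "supported":
            ledger.add(claim)                      # carry verified claims forward
    return results

ledger = set()
print(check_turn(["Oslo is in Norway"], {"Oslo is in Norway"}, ledger))
print(check_turn(["Oslo is not in Norway"], set(), ledger))  # -> contradicted
```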
arXiv Detail & Related papers (2025-10-30T23:45:13Z)
- FineDialFact: A benchmark for Fine-grained Dialogue Fact Verification [45.2458418225596]
Large Language Models (LLMs) are known to produce hallucinations - factually incorrect or fabricated information. Current approaches to hallucination detection in dialogue systems primarily focus on verifying the factual consistency of generated responses. We introduce a benchmark, FineDialFact, for fine-grained dialogue fact verification.
arXiv Detail & Related papers (2025-08-07T18:51:03Z)
- Improving Factuality for Dialogue Response Generation via Graph-Based Knowledge Augmentation [8.423723358002539]
Large Language Models (LLMs) generate plausible but inconsistent or factually incorrect text. We propose two novel graph knowledge-augmented frameworks, Dialogue Response Generation via Textualised Graphs (TG-DRG) and Graph-Aware Dialogue Response Generation (GA-DRG). TG-DRG combines reasoning-guided dialogue reformulation, dialogue sense knowledge selection, and graph-enhanced response generation to improve the factuality of dialogue responses.
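One way to picture the textualised-graph step: serialize selected KG triples into plain sentences and prepend them to the dialogue context before generation. The template below is a hypothetical illustration, not the paper's prompt.

```python
def textualise(triples):
    # (head, relation, tail) -> "head relation tail."
    return " ".join(f"{h} {r.replace('_', ' ')} {t}." for h, r, t in triples)

def build_prompt(triples, history, user_turn):
    return ("Knowledge: " + textualise(triples) + "\n"
            "Dialogue:\n" + "\n".join(history) + "\n"
            "User: " + user_turn + "\n"
            "Respond using only the knowledge above.")

print(build_prompt([("Inception", "directed_by", "Christopher Nolan")],
                   ["User: Seen any good films?", "Bot: Inception is great."],
                   "Who directed it?"))
```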
arXiv Detail & Related papers (2025-06-14T13:17:27Z)
- Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models [70.19081534515371]
Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks.
They generate unfaithful or inconsistent content that deviates from the input source, leading to severe consequences.
We propose a robust discriminator named RelD to effectively detect hallucination in LLMs' generated answers.
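The summary does not detail RelD's architecture; as a rough stand-in, answer-level hallucination detection is often approximated with an off-the-shelf NLI model, as in this hedged sketch (the checkpoint and threshold are assumptions, and RelD itself is a purpose-trained discriminator, not an NLI model).

```python
from transformers import pipeline  # pip install transformers

nli = pipeline("text-classification", model="microsoft/deberta-large-mnli")

def looks_reliable(source: str, answer: str, threshold: float = 0.5) -> bool:
    # Probability that the source text entails the generated answer.
    scores = nli({"text": source, "text_pair": answer}, top_k=None)
    p_entail = next(s["score"] for s in scores if s["label"] == "ENTAILMENT")
    return p_entail >= threshold
```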
arXiv Detail & Related papers (2024-07-04T18:47:42Z)
- A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation [51.53917938874146]
We propose a possible solution for alleviating hallucination in knowledge-grounded dialogue generation (KGD) by exploiting the dialogue-knowledge interaction.
Experimental results of our example implementation show that this method can reduce hallucination without disrupting other dialogue performance.
arXiv Detail & Related papers (2024-04-04T14:45:26Z)
- Improving the Robustness of Knowledge-Grounded Dialogue via Contrastive Learning [71.8876256714229]
We propose an entity-based contrastive learning framework for improving the robustness of knowledge-grounded dialogue systems.
Our method achieves new state-of-the-art performance in terms of automatic evaluation scores.
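Entity-based contrastive objectives are commonly built by swapping entities in the gold response to create hard negatives and training with an InfoNCE-style loss; the sketch below shows that generic pattern, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def entity_swap(response, entity, replacement):
    # Hard negative: corrupt the gold response by swapping a grounded entity.
    return response.replace(entity, replacement)

def info_nce(anchor, positive, negatives, tau=0.07):
    # anchor, positive: (d,) embeddings; negatives: (k, d) embeddings.
    pos = F.cosine_similarity(anchor, positive, dim=0) / tau
    neg = F.cosine_similarity(anchor.unsqueeze(0), negatives, dim=1) / tau
    logits = torch.cat([pos.unsqueeze(0), neg])
    return -F.log_softmax(logits, dim=0)[0]  # pull gold close, push swaps away

neg_text = entity_swap("Nolan directed Inception.", "Inception", "Titanic")
d = 8
loss = info_nce(torch.randn(d), torch.randn(d), torch.randn(4, d))
```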
arXiv Detail & Related papers (2024-01-09T05:16:52Z)
- FactCHD: Benchmarking Fact-Conflicting Hallucination Detection [64.4610684475899]
FactCHD is a benchmark designed for the detection of fact-conflicting hallucinations from LLMs.
FactCHD features a diverse dataset that spans various factuality patterns, including vanilla, multi-hop, comparison, and set operation.
We introduce Truth-Triangulator, which synthesizes reflective considerations from a tool-enhanced ChatGPT and a LoRA-tuned Llama2.
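One plausible reading of the triangulation idea: two independent judges (a tool-augmented API model and a LoRA-tuned local model) each label a claim, and a simple arbiter merges their verdicts. The merge rule below is an assumption, not the paper's procedure.

```python
def triangulate(verdict_a, verdict_b):
    # verdicts: "factual" | "non_factual" | "not_enough_info"
    if verdict_a == verdict_b:
        return verdict_a, "both judges agree"      # high-confidence verdict
    if "not_enough_info" in (verdict_a, verdict_b):
        other = verdict_b if verdict_a == "not_enough_info" else verdict_a
        return other, "one judge abstained"
    return "conflict", "judges disagree; escalate"

print(triangulate("factual", "not_enough_info"))  # -> ('factual', 'one judge abstained')
```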
arXiv Detail & Related papers (2023-10-18T16:27:49Z)
- PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems [59.1250765143521]
Current knowledge-grounded dialogue systems often fail to align the generated responses with human-preferred qualities.
We propose Polished & Informed Candidate Scoring (PICK), a generation re-scoring framework.
We demonstrate the effectiveness of PICK in generating responses that are more faithful while keeping them relevant to the dialogue history.
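Re-scoring frameworks of this kind typically sample several candidate responses, score each for faithfulness to the knowledge and relevance to the history, and keep the best. A minimal sketch under that reading, with toy word-overlap scorers (the weighting and scorers are assumptions):

```python
import re

def overlap(a, b):
    # Toy scorer: fraction of a's words that also appear in b.
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    return len(wa & wb) / max(len(wa), 1)

def pick_response(candidates, knowledge, history, alpha=0.5):
    # Combined score: faithfulness to knowledge + relevance to history.
    def score(c):
        return alpha * overlap(c, knowledge) + (1 - alpha) * overlap(c, history)
    return max(candidates, key=score)

print(pick_response(
    ["The Eiffel Tower is in Berlin.", "The Eiffel Tower is in Paris."],
    knowledge="The Eiffel Tower stands in Paris, France.",
    history="Where is the Eiffel Tower?"))  # -> the Paris candidate
```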
arXiv Detail & Related papers (2023-09-19T08:27:09Z)
- Unleashing Potential of Evidence in Knowledge-Intensive Dialogue Generation [37.29386687125705]
We propose a framework to effectively incorporate Evidence in knowledge-Intensive Dialogue Generation (u-EIDG).
Specifically, we introduce an automatic evidence generation framework that harnesses the power of Large Language Models (LLMs) to mine reliable evidence labels from unlabeled data.
By utilizing these evidence labels, we train a reliable evidence indicator to effectively identify relevant evidence from retrieved passages.
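Read operationally, this is a two-step recipe: an LLM judge first produces silver evidence labels on unlabeled data, then a lightweight indicator trained on those labels filters retrieved passages at inference time. All names below are illustrative assumptions.

```python
def mine_silver_labels(llm_judge, passages, gold_response):
    # llm_judge(passage, response) -> True if the passage supports the response.
    return [(p, int(llm_judge(p, gold_response))) for p in passages]

def select_evidence(indicator, passages, threshold=0.5):
    # indicator(passage) -> probability the passage is usable evidence.
    return [p for p in passages if indicator(p) >= threshold]

# Example with a trivial stand-in judge:
labels = mine_silver_labels(lambda p, r: r in p,
                            ["... the capital is Oslo ...", "... fjords ..."],
                            "the capital is Oslo")
print(labels)  # -> [('... the capital is Oslo ...', 1), ('... fjords ...', 0)]
```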
arXiv Detail & Related papers (2023-09-15T13:13:30Z)
- Elastic Weight Removal for Faithful and Abstractive Dialogue Generation [61.40951756070646]
A dialogue system should generate responses that are faithful to the knowledge contained in relevant documents.
Many models instead generate hallucinated responses that contradict that knowledge or contain unverifiable information.
We show that our method can be extended to simultaneously discourage hallucinations and extractive responses.
arXiv Detail & Related papers (2023-03-30T17:40:30Z)
- Neural Path Hunter: Reducing Hallucination in Dialogue Systems via Path Grounding [15.62141731259161]
We focus on the task of improving the faithfulness of Neural Dialogue Systems to known facts supplied by a Knowledge Graph (KG).
We propose Neural Path Hunter, which follows a generate-then-refine strategy whereby a generated response is amended using the k-hop subgraph of a KG.
Our proposed model can easily be applied to any generated dialogue response without retraining the model.
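A generate-then-refine step over a KG can be pictured as: fetch the k-hop neighbourhood of the dialogue's grounding entity, then swap any mentioned entity that falls outside it for a nearby in-graph one. The sketch below is a hypothetical illustration of that idea, not the authors' model.

```python
import networkx as nx  # pip install networkx

def k_hop_entities(kg, anchor, k=2):
    # All entities within k hops of the dialogue's grounding entity.
    return set(nx.single_source_shortest_path_length(kg, anchor, cutoff=k))

def refine(draft_entities, kg, anchor, nearest_valid, k=2):
    allowed = k_hop_entities(kg, anchor, k)
    # Swap any mention that is not reachable for a nearby in-graph entity.
    return [e if e in allowed else nearest_valid(e, allowed)
            for e in draft_entities]

kg = nx.Graph([("Inception", "Christopher Nolan"), ("Christopher Nolan", "Dunkirk")])
print(refine(["Steven Spielberg"], kg, "Inception",
             nearest_valid=lambda e, allowed: sorted(allowed)[0]))
```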
arXiv Detail & Related papers (2021-04-17T05:23:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.