Enhancing Factual Accuracy and Citation Generation in LLMs via Multi-Stage Self-Verification
- URL: http://arxiv.org/abs/2509.05741v1
- Date: Sat, 06 Sep 2025 15:07:59 GMT
- Title: Enhancing Factual Accuracy and Citation Generation in LLMs via Multi-Stage Self-Verification
- Authors: Fernando Gabriela García, Qiyang Shi, Zilin Feng
- Abstract summary: This research introduces VeriFact-CoT, a novel method designed to address the pervasive issues of hallucination and the absence of credible citation sources in Large Language Models (LLMs). By incorporating a multi-stage mechanism of 'fact verification-reflection-citation integration,' VeriFact-CoT empowers LLMs to critically self-examine and revise their intermediate reasoning steps and final answers.
- Score: 41.99844472131922
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This research introduces VeriFact-CoT (Verified Factual Chain-of-Thought), a novel method designed to address the pervasive issues of hallucination and the absence of credible citation sources in Large Language Models (LLMs) when generating complex, fact-sensitive content. By incorporating a multi-stage mechanism of 'fact verification-reflection-citation integration,' VeriFact-CoT empowers LLMs to critically self-examine and revise their intermediate reasoning steps and final answers. This process significantly enhances the objective accuracy, trustworthiness, and traceability of the generated outputs, making LLMs more reliable for applications demanding high fidelity such as scientific research, news reporting, and legal consultation.
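The abstract names a three-stage 'fact verification-reflection-citation integration' loop but gives no implementation details. The following is a minimal, hypothetical Python sketch of what such a loop could look like; the `EVIDENCE` store, the stub `verify_claim`, and the claim strings are all illustrative assumptions, not the paper's actual components.

```python
from dataclasses import dataclass, field

# Hypothetical evidence store mapping claims to citation sources.
# In the paper's setting this would be a retrieval corpus, not a dict.
EVIDENCE = {
    "Water boils at 100 C at sea level.": "doi:10.1000/example-1",
    "The Eiffel Tower is in Paris.": "doi:10.1000/example-2",
}

@dataclass
class Answer:
    claims: list
    citations: dict = field(default_factory=dict)

def verify_claim(claim):
    """Stage 1 (fact verification): look the claim up in the evidence store."""
    return EVIDENCE.get(claim)

def reflect(answer):
    """Stage 2 (reflection): revise the draft by dropping unverified claims."""
    answer.claims = [c for c in answer.claims if c in answer.citations]
    return answer

def verifact_loop(draft_claims):
    """Run verification, then reflection, then citation integration."""
    answer = Answer(claims=list(draft_claims))
    for claim in answer.claims:            # Stage 1: verify each claim
        src = verify_claim(claim)
        if src is not None:
            answer.citations[claim] = src
    answer = reflect(answer)               # Stage 2: self-revise
    # Stage 3 (citation integration): attach sources inline.
    answer.claims = [f"{c} [{answer.citations[c]}]" for c in answer.claims]
    return answer

result = verifact_loop([
    "Water boils at 100 C at sea level.",
    "The Moon is made of cheese.",  # unverifiable; removed on reflection
])
```

Here the unverifiable claim is silently dropped; a fuller implementation would instead prompt the LLM to regenerate or correct it during the reflection stage.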
Related papers
- Towards Comprehensive Stage-wise Benchmarking of Large Language Models in Fact-Checking [64.97768177044355]
Large Language Models (LLMs) are increasingly deployed in real-world fact-checking systems. We present FactArena, a fully automated arena-style evaluation framework. Our analyses reveal significant discrepancies between static claim-verification accuracy and end-to-end fact-checking competence.
arXiv Detail & Related papers (2026-01-06T02:51:56Z) - Agentic Multi-Persona Framework for Evidence-Aware Fake News Detection [0.7534418099163723]
AMPEND-LS is an agentic multi-persona evidence-grounded framework for multimodal fake news detection. It integrates textual, visual, and contextual signals through a structured reasoning pipeline powered by LLMs. Experiments show that AMPEND-LS consistently outperformed state-of-the-art baselines in accuracy, F1 score, and robustness. This work advances the development of adaptive, explainable, and evidence-aware systems for safeguarding online information integrity.
arXiv Detail & Related papers (2025-12-24T08:06:52Z) - Veri-R1: Toward Precise and Faithful Claim Verification via Online Reinforcement Learning [53.05161493434908]
Claim verification with large language models (LLMs) has recently attracted growing attention, due to their strong reasoning capabilities and transparent verification processes. We introduce Veri-R1, an online reinforcement learning framework that enables an LLM to interact with a search engine and to receive reward signals that explicitly shape its planning, retrieval, and reasoning behaviors. Empirical results show that Veri-R1 improves joint accuracy by up to 30% and doubles the evidence score, often surpassing its larger-scale model counterparts.
arXiv Detail & Related papers (2025-10-02T11:49:48Z) - Improving the Reliability of LLMs: Combining CoT, RAG, Self-Consistency, and Self-Verification [1.5095869543963976]
Large language models (LLMs) generate confident but incorrect or irrelevant information. Hallucination is a key limitation in their application to complex, open-ended tasks. We investigate how combining Chain-of-Thought (CoT) with retrieval-augmented generation (RAG) can reduce hallucinations.
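Of the techniques this entry combines, self-consistency has the simplest core: sample several independent chain-of-thought completions and take a majority vote over their final answers. A minimal sketch, assuming the final answers have already been extracted from each sampled completion:

```python
from collections import Counter

def self_consistency(final_answers):
    """Majority vote over final answers extracted from independently
    sampled chain-of-thought completions (the self-consistency idea)."""
    votes = Counter(final_answers)
    answer, _count = votes.most_common(1)[0]
    return answer

# Hypothetical final answers from five sampled CoT completions.
sampled_answers = ["42", "42", "41", "42", "40"]
consensus = self_consistency(sampled_answers)  # majority answer: "42"
```

In practice the vote is taken over normalized answer strings (or parsed values), since superficially different surface forms of the same answer would otherwise split the vote.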
arXiv Detail & Related papers (2025-05-13T23:57:02Z) - FACT-AUDIT: An Adaptive Multi-Agent Framework for Dynamic Fact-Checking Evaluation of Large Language Models [79.41859481668618]
Large Language Models (LLMs) have significantly advanced fact-checking research. Existing automated fact-checking evaluation methods rely on static datasets and classification metrics. We introduce FACT-AUDIT, an agent-driven framework that adaptively and dynamically assesses LLMs' fact-checking capabilities.
arXiv Detail & Related papers (2025-02-25T07:44:22Z) - A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions [9.045698110081686]
Large language models (LLMs) generate plausible yet factually incorrect responses, which are expressed with striking confidence. Previous work has shown that hallucinations and other non-factual responses generated by LLMs can be detected by examining the uncertainty of the LLM in its response to the pertinent prompt. This survey seeks to provide an extensive review of existing uncertainty quantification methods for LLMs, identifying their salient features, along with their strengths and weaknesses.
arXiv Detail & Related papers (2024-12-07T06:56:01Z) - TRACE: TRansformer-based Attribution using Contrastive Embeddings in LLMs [50.259001311894295]
We propose a novel TRansformer-based Attribution framework using Contrastive Embeddings called TRACE.
We show that TRACE significantly improves the ability to attribute sources accurately, making it a valuable tool for enhancing the reliability and trustworthiness of large language models.
arXiv Detail & Related papers (2024-07-06T07:19:30Z) - Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation [51.8188846284153]
Attributed Text Generation (ATG) is proposed to enhance credibility and verifiability in RAG systems. This paper proposes ReClaim, a fine-grained ATG method that alternates the generation of references and answers step by step. With extensive experiments, we verify the effectiveness of ReClaim across a wide range of settings, achieving a citation accuracy rate of 90%.
arXiv Detail & Related papers (2024-07-01T20:47:47Z) - Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering [14.389264346634507]
We propose EFSum, an Evidence-focused Fact Summarization framework for enhanced Question Answering (QA) performance.
Our experiments show that EFSum improves LLM's zero-shot QA performance.
arXiv Detail & Related papers (2024-03-05T13:43:58Z) - Can LLMs Produce Faithful Explanations For Fact-checking? Towards Faithful Explainable Fact-Checking via Multi-Agent Debate [75.10515686215177]
Large Language Models (LLMs) excel in text generation, but their capability for producing faithful explanations in fact-checking remains underexamined.
We propose the Multi-Agent Debate Refinement (MADR) framework, leveraging multiple LLMs as agents with diverse roles.
MADR ensures that the final explanation undergoes rigorous validation, significantly reducing the likelihood of unfaithful elements and aligning closely with the provided evidence.
arXiv Detail & Related papers (2024-02-12T04:32:33Z) - Know Where to Go: Make LLM a Relevant, Responsible, and Trustworthy Searcher [10.053004550486214]
Large Language Models (LLMs) have shown the potential to improve relevance and provide direct answers in web searches.
However, challenges arise in the reliability of generated results and the credibility of contributing sources.
We propose a novel generative retrieval framework leveraging the knowledge of LLMs to foster a direct link between queries and online sources.
arXiv Detail & Related papers (2023-10-19T03:49:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.