MERMAID: Memory-Enhanced Retrieval and Reasoning with Multi-Agent Iterative Knowledge Grounding for Veracity Assessment
- URL: http://arxiv.org/abs/2601.22361v1
- Date: Thu, 29 Jan 2026 22:12:33 GMT
- Title: MERMAID: Memory-Enhanced Retrieval and Reasoning with Multi-Agent Iterative Knowledge Grounding for Veracity Assessment
- Authors: Yupeng Cao, Chengyang He, Yangyang Yu, Ping Wang, K. P. Subbalakshmi
- Abstract summary: We propose a memory-enhanced veracity assessment framework that tightly couples the retrieval and reasoning processes. MERMAID integrates agent-driven search, structured knowledge representations, and a persistent memory module within a Reason-Action style iterative process. We evaluate MERMAID on three fact-checking benchmarks and two claim-verification datasets using multiple LLMs.
- Score: 8.649665560258702
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Assessing the veracity of online content has become increasingly critical. Large language models (LLMs) have recently enabled substantial progress in automated veracity assessment, including automated fact-checking and claim verification systems. Typical veracity assessment pipelines break down complex claims into sub-claims, retrieve external evidence, and then apply LLM reasoning to assess veracity. However, existing methods often treat evidence retrieval as a static, isolated step and do not effectively manage or reuse retrieved evidence across claims. In this work, we propose MERMAID, a memory-enhanced multi-agent veracity assessment framework that tightly couples the retrieval and reasoning processes. MERMAID integrates agent-driven search, structured knowledge representations, and a persistent memory module within a Reason-Action style iterative process, enabling dynamic evidence acquisition and cross-claim evidence reuse. By retaining retrieved evidence in an evidence memory, the framework reduces redundant searches and improves verification efficiency and consistency. We evaluate MERMAID on three fact-checking benchmarks and two claim-verification datasets using multiple LLMs, including GPT, LLaMA, and Qwen families. Experimental results show that MERMAID achieves state-of-the-art performance while improving the search efficiency, demonstrating the effectiveness of synergizing retrieval, reasoning, and memory for reliable veracity assessment.
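The cross-claim evidence reuse described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `EvidenceMemory` class, the `normalize` keying scheme, and the `toy_search` stand-in are all hypothetical names introduced here, and the sketch covers only the "look in memory before searching" idea, not the agents, the structured knowledge representation, or the Reason-Action loop.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so near-identical queries share a key."""
    return " ".join(query.lower().split())


@dataclass
class EvidenceMemory:
    """Hypothetical store mapping a normalized query to retrieved evidence."""
    store: Dict[str, List[str]] = field(default_factory=dict)
    hits: int = 0    # lookups answered from memory
    misses: int = 0  # lookups that required a new search

    def lookup_or_search(
        self, query: str, search: Callable[[str], List[str]]
    ) -> List[str]:
        key = normalize(query)
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        evidence = search(query)
        self.store[key] = evidence  # retain evidence for later claims
        return evidence


def gather_evidence(
    sub_claims: List[str],
    memory: EvidenceMemory,
    search: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Collect evidence for each sub-claim, consulting memory first."""
    return {c: memory.lookup_or_search(c, search) for c in sub_claims}


def toy_search(query: str) -> List[str]:
    # Assumption: a stand-in for a real search tool, returning a fake snippet.
    return [f"snippet about {normalize(query)}"]


memory = EvidenceMemory()
# First claim: both sub-claims are new, so both trigger a search.
gather_evidence(
    ["Paris is the capital of France", "France is in Europe"],
    memory, toy_search,
)
# Second claim: the first sub-claim repeats (modulo case and spacing).
gather_evidence(
    ["paris is the capital of  France", "The Seine flows through Paris"],
    memory, toy_search,
)
print(memory.hits, memory.misses)  # prints "1 3"
```

In this toy run, four sub-claims lead to only three searches because one lookup is served from memory. A real system would likely need a more robust matching scheme than exact normalized strings, for example semantic similarity between queries.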
Related papers
- Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval [60.25608870901428]
Trustworthiness is a core research challenge for agentic AI systems built on Large Language Models (LLMs). We propose the task of fact-checking without retrieval, focusing on the verification of arbitrary natural language claims, independent of their source robustness.
arXiv Detail & Related papers (2026-03-05T18:42:51Z)
- Multi-Sourced, Multi-Agent Evidence Retrieval for Fact-Checking [47.47518672198846]
Misinformation spreading over the Internet poses a significant threat to both societies and individuals. Previous methods rely on semantic and social-contextual patterns learned from training data. We propose WKGFC, which exploits an authorized open knowledge graph as a core resource of evidence.
arXiv Detail & Related papers (2026-02-27T19:29:01Z)
- ExDR: Explanation-driven Dynamic Retrieval Enhancement for Multimodal Fake News Detection [23.87220484843729]
Multimodal fake news poses a serious societal threat. Dynamic Retrieval-Augmented Generation provides a promising solution by triggering keyword-based retrieval. We propose ExDR, an Explanation-driven Dynamic Retrieval-Augmented Generation framework for Multimodal Fake News Detection.
arXiv Detail & Related papers (2026-01-22T10:10:06Z)
- FaStfact: Faster, Stronger Long-Form Factuality Evaluations in LLMs [34.87719459551127]
FaStfact is an evaluation framework that achieves the highest alignment with human evaluation and time/token efficiency. FaStfact first employs chunk-level claim extraction integrated with confidence-based pre-verification. For searching and verification, it collects document-level evidence from crawled web-pages and selectively retrieves it during verification.
arXiv Detail & Related papers (2025-10-13T19:00:15Z)
- Veri-R1: Toward Precise and Faithful Claim Verification via Online Reinforcement Learning [53.05161493434908]
Claim verification with large language models (LLMs) has recently attracted growing attention, due to their strong reasoning capabilities and transparent verification processes. We introduce Veri-R1, an online reinforcement learning framework that enables an LLM to interact with a search engine and to receive reward signals that explicitly shape its planning, retrieval, and reasoning behaviors. Empirical results show that Veri-R1 improves joint accuracy by up to 30% and doubles the evidence score, often surpassing its larger-scale model counterparts.
arXiv Detail & Related papers (2025-10-02T11:49:48Z)
- Demystifying deep search: a holistic evaluation with hint-free multi-hop questions and factorised metrics [89.1999907891494]
We present WebDetective, a benchmark of hint-free multi-hop questions paired with a controlled Wikipedia sandbox. Our evaluation of 25 state-of-the-art models reveals systematic weaknesses across all architectures. We develop an agentic workflow, EvidenceLoop, that explicitly targets the challenges our benchmark identifies.
arXiv Detail & Related papers (2025-10-01T07:59:03Z)
- SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction [20.6787276745193]
We introduce an automatic evaluation method that measures retrieval quality through the lens of information gain within the RAG framework. We quantify the utility of retrieval by the extent to which it reduces semantic perplexity post-retrieval.
arXiv Detail & Related papers (2025-03-03T12:37:34Z)
- Retrieval-Augmented Generation by Evidence Retroactivity in LLMs [19.122314663040726]
Retroactive Retrieval-Augmented Generation (RetroRAG) is a novel framework to build a retroactive reasoning paradigm. RetroRAG revises and updates the evidence, redirecting the reasoning chain to the correct direction. Empirical evaluations show that RetroRAG significantly outperforms existing methods.
arXiv Detail & Related papers (2025-01-07T08:57:42Z)
- From Relevance to Utility: Evidence Retrieval with Feedback for Fact Verification [118.03466985807331]
We argue that fact verification (FV) should focus not on relevance but on the utility that a claim verifier derives from the retrieved evidence. We introduce the feedback-based evidence retriever (FER), which optimizes the evidence retrieval process by incorporating feedback from the claim verifier.
arXiv Detail & Related papers (2023-10-18T02:59:38Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)