EMULATE: A Multi-Agent Framework for Determining the Veracity of Atomic Claims by Emulating Human Actions
- URL: http://arxiv.org/abs/2505.16576v2
- Date: Mon, 23 Jun 2025 16:58:51 GMT
- Title: EMULATE: A Multi-Agent Framework for Determining the Veracity of Atomic Claims by Emulating Human Actions
- Authors: Spencer Hong, Meng Luo, Xinyi Wan
- Abstract summary: EMULATE is designed to better emulate human actions through the use of a multi-agent framework. Experiments on several benchmarks show clear improvements over prior work, demonstrating the efficacy of the new multi-agent framework.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Determining the veracity of atomic claims is an imperative component of many recently proposed fact-checking systems. Many approaches tackle this problem by first retrieving evidence by querying a search engine and then performing classification by providing the evidence set and atomic claim to a large language model, but this process deviates from what a human would do in order to perform the task. Recent work attempted to address this issue by proposing iterative evidence retrieval, allowing for evidence to be collected several times and only when necessary. Continuing along this line of research, we propose a novel claim verification system, called EMULATE, which is designed to better emulate human actions through the use of a multi-agent framework where each agent performs a small part of the larger task, such as ranking search results according to predefined criteria or evaluating webpage content. Extensive experiments on several benchmarks show clear improvements over prior work, demonstrating the efficacy of our new multi-agent framework.
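The abstract specifies the division of labor (ranking search results, evaluating webpage content, retrieving iteratively and only when necessary) but not the implementation. Below is a minimal sketch of how such a pipeline could be wired together, assuming generic `search` and `llm` callables; all class names, prompts, and stopping rules are illustrative assumptions, not the paper's actual code.

```python
# Hypothetical sketch of an EMULATE-style multi-agent claim verifier.
# The roles follow the abstract; every name, prompt, and interface
# below is an illustrative assumption, not the paper's implementation.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Result:
    url: str
    snippet: str

@dataclass
class Verifier:
    search: Callable[[str], List[Result]]  # search-engine wrapper (assumed)
    llm: Callable[[str], str]              # text-in/text-out LLM client (assumed)
    max_rounds: int = 3                    # retrieve iteratively, only while needed

    def rank(self, claim: str, results: List[Result]) -> List[Result]:
        """Ranking agent: order search results by usefulness for the claim."""
        prompt = ("Rank these search results for verifying the claim, "
                  f"best first, as space-separated indices.\nClaim: {claim}\n"
                  + "\n".join(f"{i}: {r.snippet}" for i, r in enumerate(results)))
        order = [int(t) for t in self.llm(prompt).split() if t.isdigit()]
        ranked = [results[i] for i in order if i < len(results)]
        return ranked or results  # fall back to original order on a bad reply

    def relevant(self, claim: str, r: Result) -> bool:
        """Page-evaluation agent: keep only content that bears on the claim."""
        reply = self.llm(f"Claim: {claim}\nContent: {r.snippet}\n"
                         "Does this content help verify the claim? yes/no.")
        return reply.strip().lower().startswith("yes")

    def verify(self, claim: str) -> str:
        evidence: List[str] = []
        query = claim
        for _ in range(self.max_rounds):
            for r in self.rank(claim, self.search(query)):
                if self.relevant(claim, r):
                    evidence.append(r.snippet)
            verdict = self.llm(
                f"Claim: {claim}\nEvidence:\n" + "\n".join(evidence) +
                "\nAnswer SUPPORTED, REFUTED, or NOT ENOUGH INFO.")
            if "NOT ENOUGH INFO" not in verdict:  # stop once evidence suffices
                return verdict.strip()
            # Query agent: refine the search query for the next round.
            query = self.llm(f"Claim: {claim}\nKnown evidence: {evidence}\n"
                             "Suggest one better search query.")
        return "NOT ENOUGH INFO"
```

In practice each role could be served by a different model or prompt; the point is only that a verdict agent decides whether another retrieval round is needed, mirroring how a person searches, skims, and searches again.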
Related papers
- Reinforcing Compositional Retrieval: Retrieving Step-by-Step for Composing Informative Contexts [67.67746334493302]
Large Language Models (LLMs) have demonstrated remarkable capabilities across numerous tasks, yet they often rely on external context to handle complex tasks. We propose a tri-encoder sequential retriever that models this process as a Markov Decision Process (MDP). We show that our method consistently and significantly outperforms baselines, underscoring the importance of explicitly modeling inter-example dependencies.
arXiv Detail & Related papers (2025-04-15T17:35:56Z)
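For intuition on the MDP framing in the entry above: the state is the query plus the examples retrieved so far, and each action appends one more example. A toy greedy rollout under that framing might look like the following; the `score` function stands in for the paper's tri-encoder policy and is purely an assumption of this sketch.

```python
# Toy illustration of retrieval as a Markov Decision Process: the state
# is (query, examples retrieved so far) and each action appends one more
# example. The scorer below is an assumption, not the paper's policy.
from typing import Callable, List, Tuple

State = Tuple[str, Tuple[str, ...]]  # (query, retrieved-so-far)

def rollout(query: str,
            corpus: List[str],
            score: Callable[[State, str], float],  # policy score for a candidate
            steps: int = 3) -> List[str]:
    retrieved: Tuple[str, ...] = ()
    for _ in range(steps):
        state: State = (query, retrieved)
        remaining = [c for c in corpus if c not in retrieved]
        if not remaining:
            break
        # Greedy action: pick the candidate the policy scores highest,
        # conditioned on what is already in the context.
        best = max(remaining, key=lambda c: score(state, c))
        retrieved += (best,)
    return list(retrieved)

# Example with a trivial lexical-overlap scorer (purely illustrative):
def overlap(state: State, cand: str) -> float:
    query, seen = state
    new_terms = set(cand.split()) - set(" ".join(seen).split())
    return len(set(query.split()) & new_terms)  # reward novel, on-topic terms

docs = ["retrieval with rankers", "rankers for claims", "cooking pasta"]
print(rollout("claims retrieval rankers", docs, overlap, steps=2))
```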
- MAMM-Refine: A Recipe for Improving Faithfulness in Generation with Multi-Agent Collaboration [63.31211701741323]
We extend multi-agent multi-model reasoning to generation, specifically to improving faithfulness through refinement. We design intrinsic evaluations for each subtask, with our findings indicating that both multi-agent (multiple instances) and multi-model (diverse LLM types) approaches benefit error detection and critiquing. We consolidate these insights into a final "recipe" called Multi-Agent Multi-Model Refinement (MAMM-Refine), where multi-agent and multi-model collaboration significantly boosts performance.
arXiv Detail & Related papers (2025-03-19T14:46:53Z)
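The detect-critique-refine loop summarized in the entry above could be organized roughly as follows; the majority-vote stopping rule, prompts, and callable interfaces are assumptions of this sketch, not the MAMM-Refine recipe itself.

```python
# Hypothetical multi-agent refinement loop: several critic instances
# flag faithfulness errors, and a refiner rewrites only while a
# majority of critics still object. All details here are assumed.
from typing import Callable, List

def refine(summary: str,
           source: str,
           critics: List[Callable[[str], str]],  # e.g. several LLM instances/types
           refiner: Callable[[str], str],
           max_iters: int = 2) -> str:
    for _ in range(max_iters):
        critiques = [c(f"Source: {source}\nSummary: {summary}\n"
                       "List any unfaithful spans, or say OK.")
                     for c in critics]
        problems = [x for x in critiques if "OK" not in x]
        if len(problems) <= len(critics) // 2:  # majority says faithful: stop
            return summary
        summary = refiner(f"Source: {source}\nSummary: {summary}\n"
                          "Fix these issues:\n" + "\n".join(problems))
    return summary
```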
- MultiConIR: Towards multi-condition Information Retrieval [57.6405602406446]
We introduce MultiConIR, the first benchmark designed to evaluate retrieval models in multi-condition scenarios. We propose three tasks to assess retrieval and reranking models on multi-condition robustness, monotonic relevance ranking, and query format sensitivity.
arXiv Detail & Related papers (2025-03-11T05:02:03Z)
- A Multi-Agent Perspective on Modern Information Retrieval [12.228832858396368]
The rise of large language models (LLMs) has introduced a new era in information retrieval (IR). This shift challenges some long-standing IR paradigms and calls for a reassessment of both theoretical frameworks and practical methodologies. We advocate for a multi-agent perspective to better capture the complex interactions between query agents, document agents, and ranker agents.
arXiv Detail & Related papers (2025-02-20T18:17:26Z)
- Options-Aware Dense Retrieval for Multiple-Choice Question Answering [5.098112872671412]
Long-context multiple-choice question answering tasks require robust reasoning over extensive text sources. Prior research in this domain has predominantly utilized pre-trained dense retrieval models. This paper proposes a novel method called Options Aware Dense Retrieval (OADR) to address these challenges.
arXiv Detail & Related papers (2025-01-27T15:03:26Z)
- How Does Generative Retrieval Scale to Millions of Passages? [68.98628807288972]
We conduct the first empirical study of generative retrieval techniques across various corpus scales.
We scale generative retrieval to millions of passages, using a corpus of 8.8M passages and evaluating model sizes up to 11B parameters.
While generative retrieval is competitive with state-of-the-art dual encoders on small corpora, scaling to millions of passages remains an important and unsolved challenge.
arXiv Detail & Related papers (2023-05-19T17:33:38Z)
- Recommender Systems with Generative Retrieval [58.454606442670034]
We propose a novel generative retrieval approach, where the retrieval model autoregressively decodes the identifiers of the target candidates.
To that end, we create semantically meaningful tuples of codewords to serve as a Semantic ID for each item.
We show that recommender systems trained with the proposed paradigm significantly outperform the current SOTA models on various datasets.
arXiv Detail & Related papers (2023-05-08T21:48:17Z)
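To make "Semantic IDs as tuples of codewords" from the entry above concrete: one common construction quantizes an item embedding level by level, each level encoding the residual left by the previous one, and a generative model then decodes such tuples token by token. The two-level scheme and random codebooks below are illustrative assumptions, not the paper's setup.

```python
# Toy illustration of Semantic IDs via residual quantization: each item
# embedding is mapped to a short tuple of codewords. Codebooks here are
# random stand-ins; the paper's actual construction is not shown.
import numpy as np

def semantic_id(vec: np.ndarray, codebooks: list) -> tuple:
    """Assign one codeword per level; each level quantizes the residual."""
    codes, residual = [], vec.astype(float)
    for book in codebooks:                      # book: (num_codes, dim)
        idx = int(np.argmin(np.linalg.norm(book - residual, axis=1)))
        codes.append(idx)
        residual = residual - book[idx]         # next level sees the residual
    return tuple(codes)

rng = np.random.default_rng(0)
books = [rng.normal(size=(4, 8)) for _ in range(2)]   # 2 levels, 4 codes each
item = rng.normal(size=8)
print(semantic_id(item, books))   # e.g. (2, 1): the item's Semantic ID
```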
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Deep Reinforcement Agent for Efficient Instant Search [14.086339486783018]
We propose to address the backend load issue in instant search by identifying tokens that are semantically more salient for retrieving relevant documents.
We train a reinforcement agent that interacts directly with the search engine and learns to predict each token's importance.
A novel evaluation framework is presented to study the trade-off between the number of triggered searches and the system's performance.
arXiv Detail & Related papers (2022-03-17T22:47:15Z)
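The trade-off studied in the last entry, between the number of triggered searches and system performance, can be illustrated with a simple gate that queries the backend only on tokens a policy deems salient. The `importance` stand-in below replaces the trained reinforcement agent and is an assumption of this sketch.

```python
# Hypothetical instant-search gate: as a user types, trigger a backend
# search only when the newest token looks important enough. The policy
# and engine below are stand-ins, not the paper's trained agent.
from typing import Callable, List, Tuple

def instant_search(tokens: List[str],
                   importance: Callable[[List[str], str], float],  # learned policy
                   search: Callable[[str], List[str]],
                   threshold: float = 0.5) -> Tuple[List[str], int]:
    results: List[str] = []
    triggered = 0
    prefix: List[str] = []
    for tok in tokens:
        prefix.append(tok)
        if importance(prefix[:-1], tok) >= threshold:  # salient token: search now
            results = search(" ".join(prefix))
            triggered += 1                             # count backend load
    return results, triggered

# Illustrative use with stand-ins for the policy and the engine:
stopwords = {"the", "a", "of", "for"}
policy = lambda ctx, tok: 0.0 if tok in stopwords else 1.0
engine = lambda q: [f"doc matching '{q}'"]
print(instant_search("papers for fact verification".split(), policy, engine))
```

Raising the threshold reduces the number of triggered searches at the cost of staler results, which is exactly the trade-off the entry's evaluation framework is designed to measure.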