Web Retrieval Agents for Evidence-Based Misinformation Detection
- URL: http://arxiv.org/abs/2409.00009v2
- Date: Wed, 9 Oct 2024 19:13:41 GMT
- Title: Web Retrieval Agents for Evidence-Based Misinformation Detection
- Authors: Jacob-Junqi Tian, Hao Yu, Yury Orlovskiy, Tyler Vergho, Mauricio Rivera, Mayank Goel, Zachary Yang, Jean-Francois Godbout, Reihaneh Rabbany, Kellin Pelrine
- Abstract summary: This paper develops an agent-based automated fact-checking approach for detecting misinformation.
We demonstrate that combining a powerful LLM agent, which does not have access to the internet for searches, with an online web search agent yields better results than when each tool is used independently.
- Score: 12.807650005708911
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper develops an agent-based automated fact-checking approach for detecting misinformation. We demonstrate that combining a powerful LLM agent, which does not have access to the internet for searches, with an online web search agent yields better results than when each tool is used independently. Our approach is robust across multiple models, outperforming alternatives and increasing the macro F1 of misinformation detection by as much as 20 percent compared to LLMs without search. We also conduct extensive analyses on the sources our system leverages and their biases, decisions in the construction of the system like the search tool and the knowledge base, the type of evidence needed and its impact on the results, and other parts of the overall process. By combining strong performance with in-depth understanding, we hope to provide building blocks for future search-enabled misinformation mitigation systems.
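For readers unfamiliar with the metric cited in the abstract, macro F1 is the unweighted mean of the per-class F1 scores, so each class counts equally regardless of how many examples it has. The sketch below is a minimal illustration of that standard definition, not the authors' evaluation code; the function name `macro_f1` and the example labels are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (standard macro-averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only (not data from the paper).
y_true = ["true", "false", "false", "true"]
y_pred = ["true", "false", "true", "true"]
print(round(macro_f1(y_true, y_pred), 4))
```

Because the average is unweighted, a system that performs poorly on a minority class is penalized more than it would be under plain accuracy.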
Related papers
- Toward Verifiable Misinformation Detection: A Multi-Tool LLM Agent Framework [0.5999777817331317]
This research proposes an innovative verifiable misinformation detection LLM agent.
The agent actively verifies claims through dynamic interaction with diverse web sources.
It assesses information source credibility, synthesizes evidence, and provides a complete verifiable reasoning process.
arXiv Detail & Related papers (2025-08-05T05:15:03Z) - DynaSearcher: Dynamic Knowledge Graph Augmented Search Agent via Multi-Reward Reinforcement Learning [4.817888539036794]
DynaSearcher is an innovative search agent enhanced by dynamic knowledge graphs and multi-reward reinforcement learning (RL).
We employ a multi-reward RL framework for fine-grained control over training objectives such as retrieval accuracy, efficiency, and response quality.
Experimental results demonstrate that our approach achieves state-of-the-art answer accuracy on six multi-hop question answering datasets.
arXiv Detail & Related papers (2025-07-23T09:58:31Z) - MMSearch-R1: Incentivizing LMMs to Search [49.889749277236376]
We present MMSearch-R1, the first end-to-end reinforcement learning framework that enables on-demand, multi-turn search in real-world Internet environments.
Our framework integrates both image and text search tools, allowing the model to reason about when and how to invoke them guided by an outcome-based reward with a search penalty.
arXiv Detail & Related papers (2025-06-25T17:59:42Z) - From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents [96.65646344634524]
Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research.
We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn.
We demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking.
arXiv Detail & Related papers (2025-06-23T17:27:19Z) - Deep Research Agents: A Systematic Examination And Roadmap [79.04813794804377]
Deep Research (DR) agents are designed to tackle complex, multi-turn informational research tasks.
In this paper, we conduct a detailed analysis of the foundational technologies and architectural components that constitute DR agents.
arXiv Detail & Related papers (2025-06-22T16:52:48Z) - T^2Agent A Tool-augmented Multimodal Misinformation Detection Agent with Monte Carlo Tree Search [51.91311158085973]
Multimodal misinformation often arises from mixed forgery sources, requiring dynamic reasoning and adaptive verification.
We propose T2Agent, a novel misinformation detection agent that incorporates a toolkit with Monte Carlo Tree Search.
Extensive experiments show that T2Agent consistently outperforms existing baselines on challenging mixed-source multimodal misinformation benchmarks.
arXiv Detail & Related papers (2025-05-26T09:50:55Z) - InfoDeepSeek: Benchmarking Agentic Information Seeking for Retrieval-Augmented Generation [63.55258191625131]
InfoDeepSeek is a new benchmark for assessing agentic information seeking in real-world, dynamic web environments.
We propose a systematic methodology for constructing challenging queries satisfying the criteria of determinacy, difficulty, and diversity.
We develop the first evaluation framework tailored to dynamic agentic information seeking, including fine-grained metrics about the accuracy, utility, and compactness of information seeking outcomes.
arXiv Detail & Related papers (2025-05-21T14:44:40Z) - The Extractive-Abstractive Spectrum: Uncovering Verifiability Trade-offs in LLM Generations [40.498553309980764]
We study the interplay between verifiability and utility of information-sharing tools.
We find that users prefer search engines over large language models for high-stakes queries.
arXiv Detail & Related papers (2024-11-26T12:34:52Z) - WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs [10.380692079063467]
We propose WeKnow-RAG, which integrates Web search and Knowledge Graphs into a "Retrieval-Augmented Generation (RAG)" system.
First, the accuracy and reliability of LLM responses are improved by combining the structured representation of Knowledge Graphs with the flexibility of dense vector retrieval.
Our approach effectively balances the efficiency and accuracy of information retrieval, thus improving the overall retrieval process.
arXiv Detail & Related papers (2024-08-14T15:19:16Z) - Multimodal Misinformation Detection using Large Vision-Language Models [7.505532091249881]
Large language models (LLMs) have shown remarkable performance in various tasks.
Few approaches consider evidence retrieval as part of misinformation detection.
We propose a novel re-ranking approach for multimodal evidence retrieval.
arXiv Detail & Related papers (2024-07-19T13:57:11Z) - DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z) - Tree Search for Language Model Agents [69.43007235771383]
We propose an inference-time search algorithm for LM agents to perform exploration and multi-step planning in interactive web environments.
Our approach is a form of best-first tree search that operates within the actual environment space.
It is the first tree search algorithm for LM agents that shows effectiveness on realistic web tasks.
arXiv Detail & Related papers (2024-07-01T17:07:55Z) - AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning [93.96463520716759]
Large language model (LLM) agents have demonstrated impressive capabilities in utilizing external tools and knowledge to boost accuracy and reduce hallucinations.
Here, we introduce AvaTaR, a novel and automated framework that optimizes an LLM agent to effectively leverage provided tools, improving performance on a given task.
arXiv Detail & Related papers (2024-06-17T04:20:02Z) - Evaluating Robustness of Generative Search Engine on Adversarial Factual Questions [89.35345649303451]
Generative search engines have the potential to transform how people seek information online.
However, responses generated by existing generative search engines backed by large language models (LLMs) may not always be accurate.
Retrieval-augmented generation exacerbates safety concerns, since adversaries may successfully evade the entire system.
arXiv Detail & Related papers (2024-02-25T11:22:19Z) - KwaiAgents: Generalized Information-seeking Agent System with Large Language Models [33.59597020276034]
Humans excel in critical thinking, planning, reflection, and harnessing available tools to interact with and interpret the world.
Recent advancements in large language models (LLMs) suggest that machines might also possess the aforementioned human-like capabilities.
We introduce KwaiAgents, a generalized information-seeking agent system based on LLMs.
arXiv Detail & Related papers (2023-12-08T08:11:11Z) - Boosting Search Engines with Interactive Agents [25.89284695491093]
This paper presents first steps in designing agents that learn meta-strategies for contextual query refinements.
Agents are empowered with simple but effective search operators to exert fine-grained and transparent control over queries and search results.
arXiv Detail & Related papers (2021-09-01T13:11:57Z) - AutoOD: Automated Outlier Detection via Curiosity-guided Search and Self-imitation Learning [72.99415402575886]
Outlier detection is an important data mining task with numerous practical applications.
We propose AutoOD, an automated outlier detection framework, which aims to search for an optimal neural network model.
Experimental results on various real-world benchmark datasets demonstrate that the deep model identified by AutoOD achieves the best performance.
arXiv Detail & Related papers (2020-06-19T18:57:51Z) - Bias in Multimodal AI: Testbed for Fair Automatic Recruitment [73.85525896663371]
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
We train automatic recruitment algorithms using a set of multimodal synthetic profiles consciously scored with gender and racial biases.
Our methodology and results show how to generate fairer AI-based tools in general, and in particular fairer automated recruitment systems.
arXiv Detail & Related papers (2020-04-15T15:58:05Z)