O$^2$-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering
- URL: http://arxiv.org/abs/2505.16582v2
- Date: Mon, 26 May 2025 10:07:05 GMT
- Title: O$^2$-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering
- Authors: Jianbiao Mei, Tao Hu, Daocheng Fu, Licheng Wen, Xuemeng Yang, Rong Wu, Pinlong Cai, Xinyu Cai, Xing Gao, Yu Yang, Chengjun Xie, Botian Shi, Yong Liu, Yu Qiao,
- Abstract summary: O$2$-Searcher is a novel search agent leveraging reinforcement learning to tackle both open-ended and closed-ended questions in the open domain.<n>It employs a unified training mechanism with meticulously designed reward functions, enabling the agent to identify problem types and adapt different answer generation strategies.<n>Extensive experiments show that O$2$-Searcher, using only a 3B model, significantly surpasses leading LLM agents on O$2$-QA.
- Score: 31.38063794496179
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs), despite their advancements, are fundamentally limited by their static parametric knowledge, hindering performance on tasks requiring open-domain up-to-date information. While enabling LLMs to interact with external knowledge environments is a promising solution, current efforts primarily address closed-end problems. Open-ended questions, which characterized by lacking a standard answer or providing non-unique and diverse answers, remain underexplored. To bridge this gap, we present O$^2$-Searcher, a novel search agent leveraging reinforcement learning to effectively tackle both open-ended and closed-ended questions in the open domain. O$^2$-Searcher leverages an efficient, locally simulated search environment for dynamic knowledge acquisition, effectively decoupling the external world knowledge from model's sophisticated reasoning processes. It employs a unified training mechanism with meticulously designed reward functions, enabling the agent to identify problem types and adapt different answer generation strategies. Furthermore, to evaluate performance on complex open-ended tasks, we construct O$^2$-QA, a high-quality benchmark featuring 300 manually curated, multi-domain open-ended questions with associated web page caches. Extensive experiments show that O$^2$-Searcher, using only a 3B model, significantly surpasses leading LLM agents on O$^2$-QA. It also achieves SOTA results on various closed-ended QA benchmarks against similarly-sized models, while performing on par with much larger ones.
Related papers
- MMSearch-R1: Incentivizing LMMs to Search [49.889749277236376]
We present MMSearch-R1, the first end-to-end reinforcement learning framework that enables on-demand, multi-turn search in real-world Internet environments.<n>Our framework integrates both image and text search tools, allowing the model to reason about when and how to invoke them guided by an outcome-based reward with a search penalty.
arXiv Detail & Related papers (2025-06-25T17:59:42Z) - Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration [38.63721941742435]
Large language models (LLMs) can incorporate a large quantity of retrieved knowledge to solve complex tasks.<n>Existing context window extension methods inevitably cause information loss.<n>We develop a multi-agent framework, $textbfExtAgents$, to overcome the bottlenecks.
arXiv Detail & Related papers (2025-05-27T17:45:04Z) - $C^3$-Bench: The Things Real Disturbing LLM based Agent in Multi-Tasking [12.218102495632937]
We present an open-source benchmark $C3$-Bench to assess agent robustness.<n>In concrete, we design three challenges: navigate complex tool relationships, handle critical hidden information and manage dynamic decision paths.<n>In essence, $C3$-Bench aims to expose model vulnerabilities through these challenges and drive research into the interpretability of agent performance.
arXiv Detail & Related papers (2025-05-24T15:25:44Z) - Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent [92.57125498367907]
Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs)<n>We propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch.
arXiv Detail & Related papers (2024-11-05T09:27:21Z) - Unleashing LLM Reasoning Capability via Scalable Question Synthesis from Scratch [54.12139707822201]
We propose ScaleQuest, a novel, scalable, and cost-effective data synthesis method.<n>By generating diverse questions from scratch, we produce a dataset of 1 million problem-solution pairs.<n>Our experiments demonstrate that models trained on our data outperform existing open-source datasets.
arXiv Detail & Related papers (2024-10-24T12:42:04Z) - DEXTER: A Benchmark for open-domain Complex Question Answering using LLMs [3.24692739098077]
Open-domain complex Question Answering (QA) is a difficult task with challenges in evidence retrieval and reasoning.
We evaluate state-of-the-art pre-trained dense and sparse retrieval models in an open-domain setting.
We observe that late interaction models and surprisingly lexical models like BM25 perform well compared to other pre-trained dense retrieval models.
arXiv Detail & Related papers (2024-06-24T22:09:50Z) - Multi-LLM QA with Embodied Exploration [55.581423861790945]
We investigate the use of Multi-Embodied LLM Explorers (MELE) for question-answering in an unknown environment.
Multiple LLM-based agents independently explore and then answer queries about a household environment.
We analyze different aggregation methods to generate a single, final answer for each query.
arXiv Detail & Related papers (2024-06-16T12:46:40Z) - Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub [79.31134731122462]
We introduce OpenAct benchmark to evaluate the open-domain task-solving capability, built on human expert consultation and repositories in GitHub.<n>We present OpenAgent, a novel LLM-based agent system that can tackle evolving queries in open domains through autonomously integrating specialized tools from GitHub.
arXiv Detail & Related papers (2023-12-28T15:47:30Z) - OpenAGI: When LLM Meets Domain Experts [51.86179657467822]
Human Intelligence (HI) excels at combining basic skills to solve complex tasks.
This capability is vital for Artificial Intelligence (AI) and should be embedded in comprehensive AI Agents.
We introduce OpenAGI, an open-source platform designed for solving multi-step, real-world tasks.
arXiv Detail & Related papers (2023-04-10T03:55:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.