MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability
- URL: http://arxiv.org/abs/2505.20285v2
- Date: Tue, 27 May 2025 06:46:24 GMT
- Title: MaskSearch: A Universal Pre-Training Framework to Enhance Agentic Search Capability
- Authors: Weiqi Wu, Xin Guan, Shen Huang, Yong Jiang, Pengjun Xie, Fei Huang, Jiuxin Cao, Hai Zhao, Jingren Zhou
- Abstract summary: Recent advancements in Agent techniques enable Large Language Models (LLMs) to autonomously utilize tools for retrieval, planning, and reasoning. To further enhance the universal search capability of agents, we propose a novel pre-training framework, MaskSearch. In the pre-training stage, we introduce the Retrieval Augmented Mask Prediction (RAMP) task, where the model learns to leverage search tools to fill masked spans. After that, the model is trained on downstream tasks to achieve further improvement.
- Score: 106.35604230971396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Retrieval-Augmented Language Models (RALMs) represent a classic paradigm where models enhance generative capabilities using external knowledge retrieved via a specialized module. Recent advancements in Agent techniques enable Large Language Models (LLMs) to autonomously utilize tools for retrieval, planning, and reasoning. While existing training-based methods show promise, their agentic abilities are limited by inherent characteristics of the task-specific data used during training. To further enhance the universal search capability of agents, we propose a novel pre-training framework, MaskSearch. In the pre-training stage, we introduce the Retrieval Augmented Mask Prediction (RAMP) task, where the model learns to leverage search tools to fill masked spans on a large amount of pre-training data, thus acquiring universal retrieval and reasoning capabilities for LLMs. After that, the model is trained on downstream tasks to achieve further improvement. We apply both Supervised Fine-tuning (SFT) and Reinforcement Learning (RL) for training. For SFT, we combine agent-based and distillation-based methods to generate training data, starting with a multi-agent system consisting of a planner, a rewriter, and an observer, followed by a self-evolving teacher model. For RL, we employ DAPO as the training framework and adopt a hybrid reward system consisting of answer rewards and format rewards. Additionally, we introduce a curriculum learning approach that allows the model to learn progressively from easier to more challenging instances based on the number of masked spans. We evaluate the effectiveness of our framework in the scenario of open-domain multi-hop question answering. Through extensive experiments, we demonstrate that MaskSearch significantly enhances the performance of LLM-based search agents on both in-domain and out-of-domain downstream tasks.
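The abstract names three mechanisms: masking spans for the RAMP task, ordering instances by the number of masked spans, and a hybrid answer-plus-format reward. The Python sketch below illustrates how these pieces could fit together. It is not the authors' implementation: the `[mask]` token, the `<answer>` tag, the semicolon-separated answer format, the exact-match scoring, and the `format_weight` value are all assumptions made for illustration, and every function name is hypothetical.

```python
import re
from dataclasses import dataclass

MASK = "[mask]"  # placeholder token; the paper's actual token is not given here


@dataclass
class RampExample:
    masked_text: str
    answers: list  # hidden spans, in order of appearance


def make_ramp_example(text, spans):
    """Replace the first occurrence of each given span with MASK."""
    # Keep only spans that occur in the text, sorted by position.
    found = sorted((text.index(s), s) for s in spans if s in text)
    pieces, answers, cursor = [], [], 0
    for start, span in found:
        if start < cursor:  # skip spans overlapping an earlier mask
            continue
        pieces.append(text[cursor:start])
        pieces.append(MASK)
        answers.append(span)
        cursor = start + len(span)
    pieces.append(text[cursor:])
    return RampExample("".join(pieces), answers)


def curriculum_order(examples):
    """Order examples from fewest to most masked spans (easier to harder)."""
    return sorted(examples, key=lambda ex: len(ex.answers))


def hybrid_reward(response, answers, format_weight=0.2):
    """Combine a format reward with an answer reward (assumed weighting)."""
    match = re.search(r"<answer>(.*?)</answer>", response, flags=re.S)
    format_ok = match is not None
    predicted = [p.strip() for p in match.group(1).split(";")] if match else []
    hits = sum(1 for a in answers if a in predicted)
    answer_score = hits / len(answers) if answers else 0.0
    return format_weight * float(format_ok) + (1 - format_weight) * answer_score


hard = make_ramp_example("Alice was born in Paris in 1990.", ["Paris", "1990"])
easy = make_ramp_example("Bob lives in Rome.", ["Rome"])
ordered = curriculum_order([hard, easy])
reward = hybrid_reward("<answer>Paris; 1990</answer>", hard.answers)
```

Here `ordered` places the single-mask example before the two-mask one, mirroring the easy-to-hard progression the abstract describes. A real setup would replace exact matching with whatever answer metric the training recipe specifies.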
Related papers
- LAM SIMULATOR: Advancing Data Generation for Large Action Model Training via Online Exploration and Trajectory Feedback [121.78866929908871]
Large Action Models (LAMs) for AI Agents offer incredible potential but face challenges due to the need for high-quality training data. We present LAM SIMULATOR, a comprehensive framework designed for online exploration of agentic tasks with high-quality feedback. Our framework features a dynamic task query generator, an extensive collection of tools, and an interactive environment where Large Language Model (LLM) Agents can call tools and receive real-time feedback.
arXiv Detail & Related papers (2025-06-02T22:36:02Z)
- LaMDAgent: An Autonomous Framework for Post-Training Pipeline Optimization via LLM Agents [3.6117068575553595]
We introduce LaMDAgent, a framework that autonomously constructs and optimizes full post-training pipelines. LaMDAgent improves tool-use accuracy by 9.0 points while preserving instruction-following capabilities. It uncovers effective post-training strategies that are often overlooked by conventional human-driven exploration.
arXiv Detail & Related papers (2025-05-28T04:30:51Z)
- SEM: Reinforcement Learning for Search-Efficient Large Language Models [26.075903427834838]
Large Language Models (LLMs) have demonstrated their capabilities not only in reasoning but also in invoking external tools. Existing reinforcement learning approaches often lead to redundant search behaviors, resulting in inefficiencies and over-cost. We propose SEM, a novel post-training reinforcement learning framework that explicitly trains LLMs to optimize search usage.
arXiv Detail & Related papers (2025-05-12T09:45:40Z)
- From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons [85.99268361356832]
We introduce a process of adapting an MLLM to a Generalist Embodied Agent (GEA). GEA is a single unified model capable of grounding itself across varied domains through a multi-embodiment action tokenizer. Our findings reveal the importance of training with cross-domain data and online RL for building generalist agents.
arXiv Detail & Related papers (2024-12-11T15:06:25Z)
- MALT: Improving Reasoning with Multi-Agent LLM Training [66.9481561915524]
MALT (Multi-Agent LLM Training) is a novel post-training strategy that divides the reasoning process into generation, verification, and refinement steps. On MATH, GSM8K, and CSQA, MALT surpasses the same baseline LLM with a relative improvement of 15.66%, 7.42%, and 9.40% respectively.
arXiv Detail & Related papers (2024-12-02T19:30:36Z)
- LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant [63.28378110792787]
We introduce LamRA, a versatile framework designed to empower Large Multimodal Models with sophisticated retrieval and reranking capabilities. For retrieval, we adopt a two-stage training strategy comprising language-only pre-training and multimodal instruction tuning. For reranking, we employ joint training for both pointwise and listwise reranking, offering two distinct ways to further boost the retrieval performance.
arXiv Detail & Related papers (2024-12-02T17:10:16Z)
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
- Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment [16.39696580487218]
Pretrained language models like BERT and T5 serve as crucial backbone encoders for dense retrieval.
Recent research has explored using large language models (LLMs) as retrievers, achieving SOTA performance across various tasks.
arXiv Detail & Related papers (2024-08-22T08:16:07Z)
- ZhichunRoad at Amazon KDD Cup 2022: MultiTask Pre-Training for E-Commerce Product Search [4.220439000486713]
We propose a robust multilingual model to improve the quality of search results.
In the pre-training stage, we adopt an MLM task, a classification task, and a contrastive learning task.
In the fine-tuning stage, we use confident learning, the exponential moving average method (EMA), adversarial training (FGM), and the regularized dropout strategy (R-Drop).
arXiv Detail & Related papers (2023-01-31T07:31:34Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides training samples for the MBRL agent taken from task-agnostic storage.
The model is trained to maximize the expected agent's performance by selecting promising trajectories solving prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)