PaperScout: An Autonomous Agent for Academic Paper Search with Process-Aware Sequence-Level Policy Optimization
- URL: http://arxiv.org/abs/2601.10029v1
- Date: Thu, 15 Jan 2026 03:21:21 GMT
- Title: PaperScout: An Autonomous Agent for Academic Paper Search with Process-Aware Sequence-Level Policy Optimization
- Authors: Tingyue Pan, Jie Ouyang, Mingyue Cheng, Qingchuan Li, Zirui Liu, Mingfan Pan, Shuo Yu, Qi Liu,
- Abstract summary: PaperScout is an autonomous agent that reformulates paper search as a sequential decision-making process.<n>We introduce Proximal Sequence Policy Optimization (PSPO), a process-aware, sequence-level policy optimization method.<n>Experiments on both synthetic and real-world benchmarks demonstrate that PaperScout significantly outperforms strong workflow-driven and RL baselines in both recall and relevance.
- Score: 11.080060663295072
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Academic paper search is a fundamental task in scientific research, yet most existing approaches rely on rigid, predefined workflows that struggle with complex, conditional queries. To address this limitation, we propose PaperScout, an autonomous agent that reformulates paper search as a sequential decision-making process. Unlike static workflows, PaperScout dynamically decides whether, when, and how to invoke search and expand tools based on accumulated retrieval context. However, training such agents presents a fundamental challenge: standard reinforcement learning methods, typically designed for single-turn tasks, suffer from a granularity mismatch when applied to multi-turn agentic tasks, where token-level optimization diverges from the granularity of sequence-level interactions, leading to noisy credit assignment. We introduce Proximal Sequence Policy Optimization (PSPO), a process-aware, sequence-level policy optimization method that aligns optimization with agent-environment interaction. Comprehensive experiments on both synthetic and real-world benchmarks demonstrate that PaperScout significantly outperforms strong workflow-driven and RL baselines in both recall and relevance, validating the effectiveness of our adaptive agentic framework and optimization strategy.
Related papers
- PaperGuide: Making Small Language-Model Paper-Reading Agents More Efficient [20.72001543887772]
Recent progress in large language models (LLMs) has spurred interest in autonomous agents that can read scientific papers and extract task-relevant information.<n>Most existing approaches rely either on heavily engineered prompting or on a conventional SFT-RL training pipeline.<n>We propose Paper RL, a framework that mitigates these issues by separating high-level planning from fine-grained execution.
arXiv Detail & Related papers (2026-01-19T12:07:51Z) - Policy-Conditioned Policies for Multi-Agent Task Solving [53.67744322553693]
In this work, we propose a paradigm shift that bridges the gap by representing policies as human-interpretable source code.<n>We reformulate the learning problem by utilizing Large Language Models (LLMs) as approximate interpreters.<n>We formalize this process as textitProgrammatic Iterated Best Response (PIBR), an algorithm where the policy code is optimized by textual gradients.
arXiv Detail & Related papers (2025-12-24T07:42:10Z) - WebLeaper: Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking [60.35109192765302]
Information seeking is a core capability that enables autonomous reasoning and decision-making.<n>We propose WebLeaper, a framework for constructing high-coverage IS tasks and generating efficient solution trajectories.<n>Our method consistently achieves improvements in both effectiveness and efficiency over strong baselines.
arXiv Detail & Related papers (2025-10-28T17:51:42Z) - Rethinking Testing for LLM Applications: Characteristics, Challenges, and a Lightweight Interaction Protocol [83.83217247686402]
Large Language Models (LLMs) have evolved from simple text generators into complex software systems that integrate retrieval augmentation, tool invocation, and multi-turn interactions.<n>Their inherent non-determinism, dynamism, and context dependence pose fundamental challenges for quality assurance.<n>This paper decomposes LLM applications into a three-layer architecture: textbftextitSystem Shell Layer, textbftextitPrompt Orchestration Layer, and textbftextitLLM Inference Core.
arXiv Detail & Related papers (2025-08-28T13:00:28Z) - Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective [65.12150411762273]
We show that pruning random demonstrations into seemingly incoherent "gibberish" can remarkably improve performance across diverse tasks.<n>We propose a self-discover prompt optimization framework, PromptQuine, that automatically searches for the pruning strategy by itself using only low-data regimes.
arXiv Detail & Related papers (2025-06-22T07:53:07Z) - Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments [40.869524679544824]
Posterior and Diversity Synergized Task Sampling (PDTS) is an easy-to-implement method to accommodate fast and robust sequential decision-making.<n>PDTS unlocks the potential of robust active task sampling, significantly improves the zero-shot and few-shot adaptation robustness in challenging tasks, and even accelerates the learning process under certain scenarios.
arXiv Detail & Related papers (2025-04-27T07:27:17Z) - Towards more Contextual Agents: An extractor-Generator Optimization Framework [0.0]
Large Language Model (LLM)-based agents have demonstrated remarkable success in solving complex tasks across a wide range of general-purpose applications.<n>However, their performance often degrades in context-specific scenarios, such as specialized industries or research domains.<n>To address this challenge, our work introduces a systematic approach to enhance the contextual adaptability of LLM-based agents.
arXiv Detail & Related papers (2025-02-18T15:07:06Z) - Hierarchical Decision Making Based on Structural Information Principles [19.82391136775341]
We propose a novel Structural Information principles-based framework, namely SIDM, for hierarchical Decision Making.<n>We present an abstraction mechanism that processes historical state-action trajectories to construct abstract representations of states and actions.<n>We develop a skill-based learning method for single-agent scenarios and a role-based collaboration method for multi-agent scenarios, both of which can flexibly integrate various underlying algorithms for enhanced performance.
arXiv Detail & Related papers (2024-04-15T13:02:00Z) - Multi-Task Off-Policy Learning from Bandit Feedback [54.96011624223482]
We propose a hierarchical off-policy optimization algorithm (HierOPO), which estimates the parameters of the hierarchical model and then acts pessimistically with respect to them.
We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model.
Our theoretical and empirical results show a clear advantage of using the hierarchy over solving each task independently.
arXiv Detail & Related papers (2022-12-09T08:26:27Z) - Scalable PAC-Bayesian Meta-Learning via the PAC-Optimal Hyper-Posterior:
From Theory to Practice [54.03076395748459]
A central question in the meta-learning literature is how to regularize to ensure generalization to unseen tasks.
We present a generalization bound for meta-learning, which was first derived by Rothfuss et al.
We provide a theoretical analysis and empirical case study under which conditions and to what extent these guarantees for meta-learning improve upon PAC-Bayesian per-task learning bounds.
arXiv Detail & Related papers (2022-11-14T08:51:04Z) - An Actor-Critic Method for Simulation-Based Optimization [6.261751912603047]
We focus on a simulation-based optimization problem of choosing the best design from the feasible space.
We formulate the sampling process as a policy searching problem and give a solution from the perspective of Reinforcement Learning (RL)
Some experiments are designed to validate the effectiveness of proposed algorithms.
arXiv Detail & Related papers (2021-10-31T09:04:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.