Related papers: COMPASS: Enhancing Agent Long-Horizon Reasoning with Evolving Context

COMPASS: Enhancing Agent Long-Horizon Reasoning with Evolving Context

URL: http://arxiv.org/abs/2510.08790v1
Date: Thu, 09 Oct 2025 20:14:26 GMT
Title: COMPASS: Enhancing Agent Long-Horizon Reasoning with Evolving Context
Authors: Guangya Wan, Mingyang Ling, Xiaoqi Ren, Rujun Han, Sheng Li, Zizhao Zhang,
Abstract summary: Small errors compound across steps, and even state-of-the-art models often hallucinate or lose coherence.<n>We propose a lightweight hierarchical framework that separates tactical execution, strategic oversight, and context organization into three specialized components.
Score: 17.575806280348797
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Long-horizon tasks that require sustained reasoning and multiple tool interactions remain challenging for LLM agents: small errors compound across steps, and even state-of-the-art models often hallucinate or lose coherence. We identify context management as the central bottleneck -- extended histories cause agents to overlook critical evidence or become distracted by irrelevant information, thus failing to replan or reflect from previous mistakes. To address this, we propose COMPASS (Context-Organized Multi-Agent Planning and Strategy System), a lightweight hierarchical framework that separates tactical execution, strategic oversight, and context organization into three specialized components: (1) a Main Agent that performs reasoning and tool use, (2) a Meta-Thinker that monitors progress and issues strategic interventions, and (3) a Context Manager that maintains concise, relevant progress briefs for different reasoning stages. Across three challenging benchmarks -- GAIA, BrowseComp, and Humanity's Last Exam -- COMPASS improves accuracy by up to 20% relative to both single- and multi-agent baselines. We further introduce a test-time scaling extension that elevates performance to match established DeepResearch agents, and a post-training pipeline that delegates context management to smaller models for enhanced efficiency.

Related papers

Laser: Governing Long-Horizon Agentic Search via Structured Protocol and Context Register [38.329346729947304]
We introduce Laser, a framework for stabilizing and scaling agentic search.<n>Laser organizes agent behaviors into three spaces: planning, task-solving, and retrospection.<n>Laser consistently outperforms existing agentic search baselines under both prompting-only and fine-tuning settings.
arXiv Detail & Related papers (2025-12-23T15:53:33Z)
Adaptive Multi-Agent Reasoning for Text-to-Video Retrieval [12.701443847087164]
We propose an adaptive multi-agent retrieval framework that orchestrates specialized agents over multiple reasoning iterations.<n>Our framework achieves a twofold improvement over CLIP4Clip and significantly outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2025-12-02T09:52:51Z)
PRInTS: Reward Modeling for Long-Horizon Information Seeking [74.14496236655911]
We introduce PRInTS, a generative PRM trained with dual capabilities.<n>We show that PRInTS enhances information-seeking abilities of open-source models as well as specialized agents.
arXiv Detail & Related papers (2025-11-24T17:09:43Z)
AgentFold: Long-Horizon Web Agents with Proactive Context Management [98.54523771369018]
LLM-based web agents show immense promise for information seeking, yet their effectiveness is hindered by a fundamental trade-off in context management.<n>We introduce AgentFold, a novel agent paradigm centered on proactive context management.<n>With simple supervised fine-tuning, our AgentFold-30B-A3B agent achieves 36.2% on BrowseComp and 47.3% on BrowseComp-ZH.
arXiv Detail & Related papers (2025-10-28T17:51:50Z)
Benefits and Limitations of Communication in Multi-Agent Reasoning [11.788489289062312]
We propose a theoretical framework to analyze the expressivity of multi-agent systems.<n>We derive bounds on (i) the number of agents required to solve the task exactly, (ii) the quantity and structure of inter-agent communication, and (iii) the achievable speedups as problem size and context scale.<n>Our results identify regimes where communication is provably beneficial, delineate tradeoffs between agent count and bandwidth, and expose intrinsic limitations when either resource is constrained.
arXiv Detail & Related papers (2025-10-14T20:04:27Z)
Reasoning-Aware Prompt Orchestration: A Foundation Model for Multi-Agent Language Model Coordination [0.0]
We present a theoretically-grounded framework for dynamic prompt orchestration that enhances reasoning across multiple specialized agents.<n>This framework addresses three core challenges: logical consistency preservation during agent transitions, reasoning-aware prompt adaptation, and scalable coordination of distributed inference.<n> Experimental results on 1,000 synthetic multi-agent conversations demonstrate a 42% reduction in reasoning latency, a 23% improvement in logical consistency measured by ROUGE-L score, and an 89% success rate for task completion without context loss.
arXiv Detail & Related papers (2025-09-30T22:33:01Z)
UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios [63.67884284105684]
We introduce textbfUltraHorizon, a novel benchmark that measures the foundational capabilities essential for complex real-world challenges.<n>Agents are designed in long-horizon discovery tasks where they must iteratively uncover hidden rules.<n>Our experiments reveal that LLM-agents consistently underperform in these settings, whereas human participants achieve higher scores.
arXiv Detail & Related papers (2025-09-26T02:04:00Z)
Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling [83.78874399606379]
We propose MACT, a Multi-Agent Collaboration framework with Test-Time scaling.<n>It comprises four distinct small-scale agents, with clearly defined roles and effective collaboration.<n>It shows superior performance with a smaller parameter scale without sacrificing the ability of general and mathematical tasks.
arXiv Detail & Related papers (2025-08-05T12:52:09Z)
Unifying Language Agent Algorithms with Graph-based Orchestration Engine for Reproducible Agent Research [32.92036657863354]
Language agents powered by large language models (LLMs) have demonstrated remarkable capabilities in understanding, reasoning, and executing complex tasks.<n>However, developing robust agents presents significant challenges: substantial engineering overhead, lack of standardized components, and insufficient evaluation frameworks for fair comparison.<n>We introduce Agent Graph-based Orchestration for Reasoning and Assessment (AGORA), a flexible and abstraction framework that addresses these challenges.
arXiv Detail & Related papers (2025-05-30T08:46:23Z)
ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [53.817538122688944]
We introduce Reinforced Meta-thinking Agents (ReMA) to elicit meta-thinking behaviors from Reasoning of Large Language Models (LLMs)<n>ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed executions.<n> Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z)
Self-Taught Agentic Long Context Understanding [47.186303525057475]
AgenticLU is a framework designed to integrate targeted self-clarification with contextual grounding.<n>AgenticLU achieves 97.8% answer recall on NarrativeQA with a search depth of up to three and a branching factor of eight.
arXiv Detail & Related papers (2025-02-21T20:29:36Z)
Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation [49.27250832754313]
We present AgentCOT, a llm-based autonomous agent framework. At each step, AgentCOT selects an action and executes it to yield an intermediate result with supporting evidence. We introduce two new strategies to enhance the performance of AgentCOT.
arXiv Detail & Related papers (2024-09-19T02:20:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.