CoDA: A Context-Decoupled Hierarchical Agent with Reinforcement Learning
- URL: http://arxiv.org/abs/2512.12716v1
- Date: Sun, 14 Dec 2025 14:41:29 GMT
- Title: CoDA: A Context-Decoupled Hierarchical Agent with Reinforcement Learning
- Authors: Xuanzhang Liu, Jianglun Feng, Zhuoran Zhuang, Junzhe Zhao, Maofei Que, Jieting Li, Dianlei Wang, Hao Tong, Ye Chen, Pan Li,
- Abstract summary: We introduce CoDA, a reinforcement learning framework that decouples high-level planning from low-level execution. CoDA achieves significant performance improvements over state-of-the-art baselines on complex multi-hop question-answering benchmarks.
- Score: 12.710191300398924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Model (LLM) agents trained with reinforcement learning (RL) show great promise for solving complex, multi-step tasks. However, their performance is often crippled by "Context Explosion", where the accumulation of long text outputs overwhelms the model's context window and leads to reasoning failures. To address this, we introduce CoDA, a Context-Decoupled hierarchical Agent, a simple but effective reinforcement learning framework that decouples high-level planning from low-level execution. It employs a single, shared LLM backbone that learns to operate in two distinct, contextually isolated roles: a high-level Planner that decomposes tasks within a concise strategic context, and a low-level Executor that handles tool interactions in an ephemeral, isolated workspace. We train this unified agent end-to-end using PECO (Planner-Executor Co-Optimization), a reinforcement learning methodology that applies a trajectory-level reward to jointly optimize both roles, fostering seamless collaboration through context-dependent policy updates. Extensive experiments demonstrate that CoDA achieves significant performance improvements over state-of-the-art baselines on complex multi-hop question-answering benchmarks, and it exhibits strong robustness in long-context scenarios, maintaining stable performance while all other baselines suffer severe degradation, thus further validating the effectiveness of our hierarchical design in mitigating context overload.
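The abstract describes the architecture without code, so the sketch below is one illustrative reading of the context-decoupling idea, not the authors' implementation: `llm`, `execute_with_tools`, and every other name are assumed placeholders for a shared chat-model backbone and an executor tool loop.

```python
# Minimal sketch of the context-decoupled loop described in the abstract.
# Illustrative only: `llm` stands in for any chat-completion call to the
# shared backbone, and `execute_with_tools` for an executor tool loop
# that returns only a short summary of its ephemeral workspace.

def llm(messages):
    """Placeholder for the shared LLM backbone (e.g., one API call)."""
    raise NotImplementedError

def execute_with_tools(messages):
    """Placeholder executor loop: run tool calls in an isolated context
    and return a concise summary; the raw transcript is then discarded."""
    raise NotImplementedError

def run_agent(task, max_steps=8):
    # Planner context stays concise: the task, chosen subtasks, and short
    # executor summaries -- never raw tool output.
    planner_ctx = [{"role": "system", "content": "You are the Planner."},
                   {"role": "user", "content": task}]
    for _ in range(max_steps):
        subtask = llm(planner_ctx)  # high-level decomposition step
        if subtask.strip() == "DONE":
            break
        # Executor context is rebuilt per subtask, so long tool outputs
        # never accumulate in the planner's window ("context explosion").
        executor_ctx = [{"role": "system", "content": "You are the Executor."},
                        {"role": "user", "content": subtask}]
        summary = execute_with_tools(executor_ctx)
        planner_ctx += [{"role": "assistant", "content": subtask},
                        {"role": "user", "content": f"Result: {summary}"}]
    return llm(planner_ctx + [{"role": "user", "content": "Give the final answer."}])
```

Under PECO as described, a single trajectory-level reward would then drive token-level policy updates for both roles, each conditioned on its own (planner or executor) context.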
Related papers
- Beyond Monolithic Architectures: A Multi-Agent Search and Knowledge Optimization Framework for Agentic Search [56.78490647843876]
Agentic search has emerged as a promising paradigm for complex information seeking by enabling Large Language Models (LLMs) to interleave reasoning with tool use. We propose M-ASK, a framework that explicitly decouples agentic search into two complementary roles: Search Behavior Agents, which plan and execute search actions, and Knowledge Management Agents, which aggregate, filter, and maintain a compact internal context.
arXiv Detail & Related papers (2026-01-08T08:13:27Z) - Context as a Tool: Context Management for Long-Horizon SWE-Agents [38.950807465620365]
We propose CAT, a new context management paradigm that elevates context maintenance to a callable tool integrated into the decision-making process of agents. CAT formalizes a structured context workspace consisting of stable task semantics, condensed long-term memory, and high-fidelity short-term interactions. We show that SWE-Compressor reaches a 57.6% solved rate and significantly outperforms ReAct-based agents and static compression baselines.
arXiv Detail & Related papers (2025-12-26T17:15:47Z) - Reason-Plan-ReAct: A Reasoner-Planner Supervising a ReAct Executor for Complex Enterprise Tasks [0.0]
We introduce RP-ReAct, a novel multi-agent approach that decouples strategic planning from low-level execution to achieve superior reliability and efficiency. RP-ReAct consists of a Reasoner Planner Agent (RPA), responsible for planning each sub-step, and one or more Proxy-Execution Agents (PEAs) that translate sub-steps into concrete tool interactions. We evaluate RP-ReAct on the challenging, multi-domain ToolQA benchmark using a diverse set of six open-weight reasoning models.
arXiv Detail & Related papers (2025-12-03T08:28:40Z) - Scaling Long-Horizon LLM Agent via Context-Folding [46.685552398338295]
We introduce Context-Folding, a framework that empowers agents to actively manage their working context. An agent can procedurally branch into a sub-trajectory to handle a subtask and then fold it upon completion, collapsing the intermediate steps while retaining a concise summary of the outcome (a schematic sketch of this branch-and-fold step appears after this list).
arXiv Detail & Related papers (2025-10-13T22:00:58Z) - Scalable In-Context Q-Learning [68.9917436397079]
We propose Scalable In-Context Q-Learning (SICQL) to steer in-context reinforcement learning. SICQL harnesses dynamic programming and world modeling to steer ICRL toward efficient reward and task generalization.
arXiv Detail & Related papers (2025-06-02T04:21:56Z) - Multi-Agent Collaboration via Evolving Orchestration [55.574417128944226]
Large language models (LLMs) have achieved remarkable results across diverse downstream tasks, but their monolithic nature restricts scalability and efficiency in complex problem-solving. We propose a puppeteer-style paradigm for LLM-based multi-agent collaboration, where a centralized orchestrator ("puppeteer") dynamically directs agents ("puppets") in response to evolving task states. Experiments on closed- and open-domain scenarios show that this method achieves superior performance with reduced computational costs.
arXiv Detail & Related papers (2025-05-26T07:02:17Z) - Structured Agent Distillation for Large Language Model [56.38279355868093]
We propose Structured Agent Distillation, a framework that compresses large LLM-based agents into smaller student models. Our method segments trajectories into [REASON] and [ACT] spans, applying segment-specific losses to align each component with the teacher's behavior. Experiments on ALFWorld, HotPotQA-ReAct, and WebShop show that our approach consistently outperforms token-level and imitation learning baselines.
arXiv Detail & Related papers (2025-05-20T02:01:55Z) - Guiding Multi-agent Multi-task Reinforcement Learning by a Hierarchical Framework with Logical Reward Shaping [16.5526277899717]
This study aims to design a multi-agent cooperative algorithm with logical reward shaping.
Experiments have been conducted on various types of tasks in a Minecraft-like environment.
arXiv Detail & Related papers (2024-11-02T09:03:23Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level (a generic sketch of such a token-level objective also appears after this list).
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - Hierarchical Reinforcement Learning with Opponent Modeling for
Distributed Multi-agent Cooperation [13.670618752160594]
Deep reinforcement learning (DRL) provides a promising approach for multi-agent cooperation through the interaction of the agents and environments.
Traditional DRL solutions suffer from the high dimensionality induced by many agents with continuous action spaces during policy search.
We propose a hierarchical reinforcement learning approach with high-level decision-making and low-level individual control for efficient policy search.
arXiv Detail & Related papers (2022-06-25T19:09:29Z) - HAVEN: Hierarchical Cooperative Multi-Agent Reinforcement Learning with
Dual Coordination Mechanism [17.993973801986677]
Multi-agent reinforcement learning often suffers from the exponentially larger action space caused by a large number of agents.
We propose HAVEN, a novel value-decomposition framework based on hierarchical reinforcement learning for fully cooperative multi-agent problems.
arXiv Detail & Related papers (2021-10-14T10:43:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.