CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
- URL: http://arxiv.org/abs/2508.20096v1
- Date: Wed, 27 Aug 2025 17:59:50 GMT
- Title: CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
- Authors: Zeyi Sun, Yuhang Cao, Jianze Liang, Qiushi Sun, Ziyu Liu, Zhixiong Zhang, Yuhang Zang, Xiaoyi Dong, Kai Chen, Dahua Lin, Jiaqi Wang
- Abstract summary: Existing approaches suffer from a trade-off: generalist agents excel at planning but perform poorly in execution, while specialized agents demonstrate the opposite weakness. Recent compositional frameworks attempt to bridge this gap by combining a planner and an actor, but they are typically static and non-trainable. We introduce CODA, a novel and trainable compositional framework that integrates a generalist planner with a specialist executor.
- Score: 81.08755597239262
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous agents for Graphical User Interfaces (GUIs) face significant challenges in specialized domains such as scientific computing, where both long-horizon planning and precise execution are required. Existing approaches suffer from a trade-off: generalist agents excel at planning but perform poorly in execution, while specialized agents demonstrate the opposite weakness. Recent compositional frameworks attempt to bridge this gap by combining a planner and an actor, but they are typically static and non-trainable, which prevents adaptation from experience. This is a critical limitation given the scarcity of high-quality data in scientific domains. To address these limitations, we introduce CODA, a novel and trainable compositional framework that integrates a generalist planner (Cerebrum) with a specialist executor (Cerebellum), trained via a dedicated two-stage pipeline. In the first stage, Specialization, we apply a decoupled GRPO approach to train an expert planner for each scientific application individually, bootstrapping from a small set of task trajectories. In the second stage, Generalization, we aggregate all successful trajectories from the specialized experts to build a consolidated dataset, which is then used for supervised fine-tuning of the final planner. This equips CODA with both robust execution and cross-domain generalization. Evaluated on four challenging applications from the ScienceBoard benchmark, CODA significantly outperforms baselines and establishes a new state of the art among open-source models.
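The two-stage pipeline is concrete enough to sketch. Below is a minimal, hypothetical Python sketch of the Specialization and Generalization stages as the abstract describes them; all functions (`grpo_update`, `collect_rollouts`, `supervised_finetune`) are placeholder stubs, not the authors' implementation.

```python
"""Minimal sketch of CODA's two-stage training pipeline, as described in the
abstract. All function bodies are hypothetical placeholders; the real work
(decoupled GRPO updates, VLM fine-tuning) is abstracted away."""

from dataclasses import dataclass, field

@dataclass
class Trajectory:
    app: str                      # which scientific application produced it
    steps: list[tuple[str, str]]  # (planner_instruction, executor_action) pairs
    success: bool

@dataclass
class Planner:
    name: str
    weights: dict = field(default_factory=dict)  # stand-in for model parameters

def grpo_update(planner: Planner, rollouts: list[Trajectory]) -> Planner:
    # Hypothetical: score groups of rollouts against each other (GRPO) and
    # update only the planner, keeping the executor (Cerebellum) frozen --
    # this is the "decoupled" part described in the abstract.
    planner.weights[f"update_{len(planner.weights)}"] = len(rollouts)
    return planner

def collect_rollouts(planner: Planner, app: str, n: int) -> list[Trajectory]:
    # Hypothetical environment interaction: the frozen executor grounds the
    # planner's instructions into GUI actions.
    return [Trajectory(app=app, steps=[("plan", "click")], success=i % 2 == 0)
            for i in range(n)]

def supervised_finetune(base: Planner, data: list[Trajectory]) -> Planner:
    # Hypothetical SFT on the aggregated successful trajectories.
    return Planner(name="generalist", weights={"sft_examples": len(data)})

APPS = ["app_a", "app_b", "app_c", "app_d"]  # four ScienceBoard applications

# Stage 1 (Specialization): one expert planner per application via decoupled GRPO.
experts, successes = {}, []
for app in APPS:
    expert = Planner(name=f"expert_{app}")
    for _ in range(3):  # a few GRPO rounds, bootstrapped from small task sets
        rollouts = collect_rollouts(expert, app, n=8)
        expert = grpo_update(expert, rollouts)
        successes += [t for t in rollouts if t.success]
    experts[app] = expert

# Stage 2 (Generalization): SFT the final planner on all successful trajectories.
final_planner = supervised_finetune(Planner(name="base"), successes)
print(f"SFT on {len(successes)} successful trajectories -> {final_planner.name}")
```

The decoupling is the key design choice: only the planner's weights change during GRPO, so the frozen executor keeps execution reliable while the planner adapts per application.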
Related papers
- K^2-Agent: Co-Evolving Know-What and Know-How for Hierarchical Mobile Device Control [73.50217471850658]
K^2-Agent is a hierarchical framework that models human-like cognition by co-evolving declarative (know-what) and procedural (know-how) knowledge for planning and execution. On the challenging AndroidWorld benchmark, K^2-Agent achieves a 76.1% success rate using only raw and open-source backbones.
arXiv Detail & Related papers (2026-02-28T14:33:14Z)
- MagicAgent: Towards Generalized Agent Planning [73.21129030631421]
We present MagicAgent, a series of foundation models specifically designed for generalized agent planning. We introduce a lightweight and scalable synthetic data framework that generates high-quality trajectories across diverse planning tasks. We show that MagicAgent-32B and MagicAgent-30B-A3B achieve superior performance across diverse open-source benchmarks.
arXiv Detail & Related papers (2026-02-22T01:39:16Z)
- OFA-MAS: One-for-All Multi-Agent System Topology Design based on Mixture-of-Experts Graph Generative Models [57.94189874119267]
Multi-Agent Systems (MAS) offer a powerful paradigm for solving complex problems. Current graph learning-based design methodologies often adhere to a "one-for-one" paradigm. We propose OFA-TAD, a one-for-all framework that generates adaptive collaboration graphs for any task described in natural language.
arXiv Detail & Related papers (2026-01-19T12:23:44Z)
- Token-Level LLM Collaboration via FusionRoute [60.72307345997823]
FusionRoute is a token-level multi-LLM collaboration framework. It selects the most suitable expert at each decoding step and contributes a complementary logit that refines or corrects the selected expert's next-token distribution. It outperforms both sequence- and token-level collaboration, model merging, and direct fine-tuning.
arXiv Detail & Related papers (2026-01-08T16:53:16Z)
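The decoding-step mechanism in the FusionRoute entry above lends itself to a small sketch. The following is a hypothetical illustration of hard per-token routing plus a complementary logit correction; the experts and router are toy stand-ins, not the FusionRoute models.

```python
"""Hypothetical sketch of token-level collaboration in the spirit of
FusionRoute: at each decoding step a router picks one expert LLM, and a small
shared model adds a complementary logit correction. The 'experts' here are toy
random-logit functions, not real LLMs, and all names are illustrative."""

import numpy as np

rng = np.random.default_rng(0)
VOCAB = 50

def expert_logits(expert_id: int, context: list[int]) -> np.ndarray:
    # Stand-in for an expert LLM's next-token logits given the context.
    return rng.normal(size=VOCAB) + expert_id

def router(context: list[int], n_experts: int = 3) -> int:
    # Stand-in for the learned router that scores experts per decoding step.
    return len(context) % n_experts

def complementary_logits(context: list[int]) -> np.ndarray:
    # Stand-in for the lightweight shared model's corrective logit.
    return 0.1 * rng.normal(size=VOCAB)

def decode(prompt: list[int], steps: int = 5) -> list[int]:
    context = list(prompt)
    for _ in range(steps):
        e = router(context)                  # pick one expert for this step
        logits = expert_logits(e, context) + complementary_logits(context)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        context.append(int(probs.argmax()))  # greedy next token
    return context

print(decode([1, 2, 3]))
```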
- GMoPE: A Prompt-Expert Mixture Framework for Graph Foundation Models [30.023472202549076]
Graph Neural Networks (GNNs) have demonstrated impressive performance on task-specific benchmarks, yet their ability to generalize across diverse domains and tasks remains limited. We propose GMoPE, a framework that seamlessly integrates the Mixture-of-Experts (MoE) architecture with prompt-based learning for graphs. We show that GMoPE consistently outperforms state-of-the-art baselines and achieves performance comparable to full parameter fine-tuning.
arXiv Detail & Related papers (2025-11-05T07:28:51Z)
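One plausible reading of a prompt-expert mixture is sketched below: each expert contributes a prompt vector added to node features before a shared transform, and a per-node gate mixes the experts' outputs. This is an assumption-laden illustration, not the published GMoPE architecture.

```python
"""Hypothetical sketch of a prompt-expert mixture in the spirit of GMoPE.
Everything here (shapes, gating, the toy 'GNN') is an illustrative stand-in."""

import numpy as np

rng = np.random.default_rng(1)
N_NODES, DIM, N_EXPERTS = 6, 8, 3

X = rng.normal(size=(N_NODES, DIM))                         # node features
A = (rng.random((N_NODES, N_NODES)) > 0.5).astype(float)    # toy adjacency
prompts = rng.normal(size=(N_EXPERTS, DIM))  # one learnable prompt per expert
W = rng.normal(size=(DIM, DIM))              # shared (frozen) GNN weight
gate_w = rng.normal(size=(DIM, N_EXPERTS))   # gating projection

def gnn_layer(x: np.ndarray) -> np.ndarray:
    # One toy message-passing step with the shared, frozen weight.
    return np.tanh(A @ x @ W)

# Each expert sees the graph through its own prompt; a soft gate mixes outputs.
gates = np.exp(X @ gate_w)
gates /= gates.sum(axis=1, keepdims=True)     # per-node softmax over experts
expert_out = np.stack([gnn_layer(X + p) for p in prompts])        # (E, N, D)
mixed = np.einsum("ne,end->nd", gates, expert_out)
print(mixed.shape)  # (6, 8): per-node representations from the prompt mixture
```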
- DUET: Dual Model Co-Training for Entire Space CTR Prediction [34.35929309131385]
DUET (DUal Model Co-Training for Entire Space CTR Prediction) is a set-wise pre-ranking framework that achieves expressive modeling under tight computational budgets. It consistently outperforms state-of-the-art baselines and achieves improvements across multiple core business metrics.
arXiv Detail & Related papers (2025-10-28T12:46:33Z)
- Generalist++: A Meta-learning Framework for Mitigating Trade-off in Adversarial Training [105.74524789405514]
Adversarial training (AT) is currently the most effective defense for neural networks against adversarial attacks. We propose to partition the overall generalization goal into multiple sub-tasks, each assigned to a dedicated base learner. In the later stages of training, we interpolate their parameters to form a knowledgeable global learner. We term this framework Generalist and introduce three variants tailored to different application scenarios.
arXiv Detail & Related papers (2025-10-15T09:47:54Z)
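The parameter-interpolation step described in the Generalist++ entry reduces to a convex combination of base-learner weights. The sketch below fakes the sub-task training and shows only the merge; all names and shapes are illustrative.

```python
"""Hypothetical sketch of the parameter interpolation described for
Generalist++: several base learners, each trained on its own sub-task, are
merged into one global learner by interpolating their weights. Training is
faked with random deltas; the interpolation itself is the point."""

import numpy as np

rng = np.random.default_rng(2)
shapes = {"layer1": (4, 4), "layer2": (4, 2)}

def train_base_learner(init: dict, subtask: str) -> dict:
    # Stand-in for training on one sub-task (e.g., clean vs. robust accuracy).
    return {k: v + 0.1 * rng.normal(size=v.shape) for k, v in init.items()}

init = {k: rng.normal(size=s) for k, s in shapes.items()}
learners = [train_base_learner(init, t) for t in ("clean", "robust", "ood")]

# Interpolate: convex combination of the base learners' parameters.
alphas = np.array([0.4, 0.4, 0.2])  # mixing weights, sum to 1
global_learner = {
    k: sum(a * l[k] for a, l in zip(alphas, learners)) for k in shapes
}
print({k: v.shape for k, v in global_learner.items()})
```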
- COME: Dual Structure-Semantic Learning with Collaborative MoE for Universal Lesion Detection Across Heterogeneous Ultrasound Datasets [25.82307075214309]
We propose a Universal Collaborative Mixture of Heterogeneous Source-Specific Experts (COME). COME establishes dual structure-semantic shared experts that create a universal representation space and then collaborate with source-specific experts to extract discriminative features. This design enables robust generalization by leveraging cross-dataset experience distributions and providing universal US priors for small-batch or unseen data scenarios.
arXiv Detail & Related papers (2025-08-13T15:43:20Z)
- Predicting Multi-Agent Specialization via Task Parallelizability [8.465921582175426]
We present a closed-form bound that predicts when specialization improves performance depending on task regimes and team size. We validate our model on two standard MARL benchmarks that represent opposite extremes of task parallelizability. Three follow-up experiments in Overcooked-AI demonstrate that the model works in environments with more complex spatial and resource bottlenecks.
arXiv Detail & Related papers (2025-03-19T21:33:48Z)
- HG-Adapter: Improving Pre-Trained Heterogeneous Graph Neural Networks with Dual Adapters [53.97380482341493]
The "pre-train, prompt-tuning" paradigm has demonstrated impressive performance for tuning pre-trained heterogeneous graph neural networks (HGNNs).
We propose a unified framework that combines two new adapters with potential labeled data extension to improve the generalization of pre-trained HGNN models.
arXiv Detail & Related papers (2024-11-02T06:43:54Z)
- Parallel Strategies for Best-First Generalized Planning [51.713634067802104]
Generalized planning (GP) is a research area of AI that studies the automated synthesis of algorithmic-like solutions capable of solving multiple classical planning instances.
One of the current advancements has been the introduction of Best-First Generalized Planning (BFGP), a GP algorithm based on a novel solution space that can be explored with search.
This paper evaluates the application of parallel search techniques to BFGP, another critical component in closing the performance gap.
arXiv Detail & Related papers (2024-07-31T09:50:22Z)
- The Overcooked Generalisation Challenge [8.131038178603873]
We introduce the Overcooked Generalisation Challenge (OGC), the first benchmark to study agents' zero-shot cooperation abilities when faced with novel partners and levels in the Overcooked-AI environment. Our challenge interfaces with state-of-the-art dual curriculum design (DCD) methods to generate auto-curricula for training general agents in Overcooked.
arXiv Detail & Related papers (2024-06-25T21:51:43Z)
- Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models [93.92762966380793]
Large language models (LLMs) strive to achieve high performance across the three domains of text, code, and math simultaneously.
In this paper, we propose to fuse models that are already highly-specialized directly.
The proposed fusing framework, UltraFuser, consists of three distinct specialists that are already sufficiently trained on language, coding, and mathematics.
arXiv Detail & Related papers (2024-03-13T06:18:48Z)
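A soft token-level mixture of the three UltraFuser specialists can be sketched in a few lines. The gate and the specialists below are toy stand-ins, and the fusion rule (a gated sum of logits) is an assumption rather than the published UltraFuser design.

```python
"""Hypothetical sketch of fusing three specialists in the spirit of
UltraFuser: a token-level gate produces soft weights over the language, code,
and math specialists, and their next-token logits are mixed. The specialists
are toy random-logit functions, not real models."""

import numpy as np

rng = np.random.default_rng(3)
VOCAB = 40
SPECIALISTS = ("language", "code", "math")

def specialist_logits(name: str, context: list[int]) -> np.ndarray:
    # Stand-in for one specialist's next-token logits.
    return rng.normal(size=VOCAB)

def gate(context: list[int]) -> np.ndarray:
    # Stand-in for the token-level gating network: soft weights over experts.
    scores = rng.normal(size=len(SPECIALISTS))
    e = np.exp(scores - scores.max())
    return e / e.sum()

def fused_next_token(context: list[int]) -> int:
    w = gate(context)
    mixed = sum(wi * specialist_logits(s, context)
                for wi, s in zip(w, SPECIALISTS))
    return int(mixed.argmax())  # greedy choice from the fused distribution

print(fused_next_token([5, 6, 7]))
```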
- Enhancing Compositional Generalization via Compositional Feature Alignment [14.289836081158615]
We develop CG-Bench, a suite of CG benchmarks derived from existing real-world image datasets.
We propose Compositional Feature Alignment (CFA), a simple two-stage finetuning technique.
We conduct experiments on CG-Bench for CLIP and DINOv2, two powerful pretrained vision foundation models.
arXiv Detail & Related papers (2024-02-05T10:06:24Z)
- Personalizing Federated Learning with Over-the-Air Computations [84.8089761800994]
Federated edge learning is a promising technology to deploy intelligence at the edge of wireless networks in a privacy-preserving manner.
Under such a setting, multiple clients collaboratively train a global generic model under the coordination of an edge server.
This paper presents a distributed training paradigm that employs analog over-the-air computation to address the communication bottleneck.
arXiv Detail & Related papers (2023-02-24T08:41:19Z)
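The over-the-air idea in the last entry is easy to simulate: because simultaneous analog transmissions superpose on the channel, the server receives the sum of all client updates in one shot instead of one transmission per client. The sketch below assumes a simple Gaussian-noise channel and perfect synchronization, which are simplifying assumptions rather than the paper's setup.

```python
"""Hypothetical sketch of analog over-the-air aggregation for federated
learning: all clients transmit their model updates on the same channel at
once, the wireless medium adds them up, and the server reads the noisy sum
directly."""

import numpy as np

rng = np.random.default_rng(4)
N_CLIENTS, DIM, NOISE_STD = 5, 10, 0.01

client_updates = [rng.normal(size=DIM) for _ in range(N_CLIENTS)]

# Superposition: the channel sums all simultaneous analog transmissions
# and adds receiver noise.
received = sum(client_updates) + NOISE_STD * rng.normal(size=DIM)

ota_average = received / N_CLIENTS               # server-side scaling
true_average = np.mean(client_updates, axis=0)   # what digital FedAvg computes
print("max deviation:", np.abs(ota_average - true_average).max())
```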