CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
- URL: http://arxiv.org/abs/2508.20096v1
- Date: Wed, 27 Aug 2025 17:59:50 GMT
- Title: CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
- Authors: Zeyi Sun, Yuhang Cao, Jianze Liang, Qiushi Sun, Ziyu Liu, Zhixiong Zhang, Yuhang Zang, Xiaoyi Dong, Kai Chen, Dahua Lin, Jiaqi Wang
- Abstract summary: Existing approaches suffer from a trade-off: generalist agents excel at planning but perform poorly in execution, while specialized agents demonstrate the opposite weakness. Recent compositional frameworks attempt to bridge this gap by combining a planner and an actor, but they are typically static and non-trainable. We introduce CODA, a novel and trainable compositional framework that integrates a generalist planner with a specialist executor.
- Score: 81.08755597239262
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous agents for Graphical User Interfaces (GUIs) face significant challenges in specialized domains such as scientific computing, where both long-horizon planning and precise execution are required. Existing approaches suffer from a trade-off: generalist agents excel at planning but perform poorly in execution, while specialized agents demonstrate the opposite weakness. Recent compositional frameworks attempt to bridge this gap by combining a planner and an actor, but they are typically static and non-trainable, which prevents adaptation from experience. This is a critical limitation given the scarcity of high-quality data in scientific domains. To address these limitations, we introduce CODA, a novel and trainable compositional framework that integrates a generalist planner (Cerebrum) with a specialist executor (Cerebellum), trained via a dedicated two-stage pipeline. In the first stage, Specialization, we apply a decoupled GRPO approach to train an expert planner for each scientific application individually, bootstrapping from a small set of task trajectories. In the second stage, Generalization, we aggregate all successful trajectories from the specialized experts to build a consolidated dataset, which is then used for supervised fine-tuning of the final planner. This equips CODA with both robust execution and cross-domain generalization. Evaluated on four challenging applications from the ScienceBoard benchmark, CODA significantly outperforms baselines and establishes a new state of the art among open-source models.
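The two-stage pipeline is concrete enough to sketch. Below is a minimal, hypothetical Python sketch of the Specialization and Generalization stages as the abstract describes them; all functions (`grpo_update`, `collect_rollouts`, `supervised_finetune`) are placeholder stubs, not the authors' implementation.

```python
"""Minimal sketch of CODA's two-stage training pipeline, as described in the
abstract. All function bodies are hypothetical placeholders; the real work
(decoupled GRPO updates, VLM fine-tuning) is abstracted away."""

from dataclasses import dataclass, field

@dataclass
class Trajectory:
    app: str                      # which scientific application produced it
    steps: list[tuple[str, str]]  # (planner_instruction, executor_action) pairs
    success: bool

@dataclass
class Planner:
    name: str
    weights: dict = field(default_factory=dict)  # stand-in for model parameters

def grpo_update(planner: Planner, rollouts: list[Trajectory]) -> Planner:
    # Hypothetical: score groups of rollouts against each other (GRPO) and
    # update only the planner, keeping the executor (Cerebellum) frozen --
    # this is the "decoupled" part described in the abstract.
    planner.weights[f"update_{len(planner.weights)}"] = len(rollouts)
    return planner

def collect_rollouts(planner: Planner, app: str, n: int) -> list[Trajectory]:
    # Hypothetical environment interaction: the frozen executor grounds the
    # planner's instructions into GUI actions.
    return [Trajectory(app=app, steps=[("plan", "click")], success=i % 2 == 0)
            for i in range(n)]

def supervised_finetune(base: Planner, data: list[Trajectory]) -> Planner:
    # Hypothetical SFT on the aggregated successful trajectories.
    return Planner(name="generalist", weights={"sft_examples": len(data)})

APPS = ["app_a", "app_b", "app_c", "app_d"]  # four ScienceBoard applications

# Stage 1 (Specialization): one expert planner per application via decoupled GRPO.
experts, successes = {}, []
for app in APPS:
    expert = Planner(name=f"expert_{app}")
    for _ in range(3):  # a few GRPO rounds, bootstrapped from small task sets
        rollouts = collect_rollouts(expert, app, n=8)
        expert = grpo_update(expert, rollouts)
        successes += [t for t in rollouts if t.success]
    experts[app] = expert

# Stage 2 (Generalization): SFT the final planner on all successful trajectories.
final_planner = supervised_finetune(Planner(name="base"), successes)
print(f"SFT on {len(successes)} successful trajectories -> {final_planner.name}")
```

The decoupling is the key design choice: only the planner's weights change during GRPO, so the frozen executor keeps execution reliable while the planner adapts per application.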
Related papers
- K^2-Agent: Co-Evolving Know-What and Know-How for Hierarchical Mobile Device Control [73.50217471850658]
K^2-Agent is a hierarchical framework that models human-like cognition by co-evolving declarative (know-what) and procedural (know-how) knowledge for planning and execution. On the challenging AndroidWorld benchmark, K^2-Agent achieves a 76.1% success rate using only raw and open-source backbones.
arXiv Detail & Related papers (2026-02-28T14:33:14Z)
- MagicAgent: Towards Generalized Agent Planning [73.21129030631421]
We present MagicAgent, a series of foundation models specifically designed for generalized agent planning. We introduce a lightweight and scalable synthetic data framework that generates high-quality trajectories across diverse planning tasks. We show that MagicAgent-32B and MagicAgent-30B-A3B achieve superior performance across diverse open-source benchmarks.
arXiv Detail & Related papers (2026-02-22T01:39:16Z)
- OFA-MAS: One-for-All Multi-Agent System Topology Design based on Mixture-of-Experts Graph Generative Models [57.94189874119267]
Multi-Agent Systems (MAS) offer a powerful paradigm for solving complex problems. Current graph learning-based design methodologies often adhere to a "one-for-one" paradigm. We propose OFA-TAD, a one-for-all framework that generates adaptive collaboration graphs for any task described in natural language.
arXiv Detail & Related papers (2026-01-19T12:23:44Z)
- Token-Level LLM Collaboration via FusionRoute [60.72307345997823]
FusionRoute is a token-level multi-LLM collaboration framework. It selects the most suitable expert at each decoding step and contributes a complementary logit that refines or corrects the selected expert's next-token distribution. It outperforms both sequence- and token-level collaboration, model merging, and direct fine-tuning.
arXiv Detail & Related papers (2026-01-08T16:53:16Z)
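The decoding-step mechanism in the FusionRoute entry above lends itself to a small sketch. The following is a hypothetical illustration of hard per-token routing plus a complementary logit correction; the experts and router are toy stand-ins, not the FusionRoute models.

```python
"""Hypothetical sketch of token-level collaboration in the spirit of
FusionRoute: at each decoding step a router picks one expert LLM, and a small
shared model adds a complementary logit correction. The 'experts' here are toy
random-logit functions, not real LLMs, and all names are illustrative."""

import numpy as np

rng = np.random.default_rng(0)
VOCAB = 50

def expert_logits(expert_id: int, context: list[int]) -> np.ndarray:
    # Stand-in for an expert LLM's next-token logits given the context.
    return rng.normal(size=VOCAB) + expert_id

def router(context: list[int], n_experts: int = 3) -> int:
    # Stand-in for the learned router that scores experts per decoding step.
    return len(context) % n_experts

def complementary_logits(context: list[int]) -> np.ndarray:
    # Stand-in for the lightweight shared model's corrective logit.
    return 0.1 * rng.normal(size=VOCAB)

def decode(prompt: list[int], steps: int = 5) -> list[int]:
    context = list(prompt)
    for _ in range(steps):
        e = router(context)                  # pick one expert for this step
        logits = expert_logits(e, context) + complementary_logits(context)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        context.append(int(probs.argmax()))  # greedy next token
    return context

print(decode([1, 2, 3]))
```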
- GMoPE: A Prompt-Expert Mixture Framework for Graph Foundation Models [30.023472202549076]
Graph Neural Networks (GNNs) have demonstrated impressive performance on task-specific benchmarks, yet their ability to generalize across diverse domains and tasks remains limited. We propose GMoPE, a framework that seamlessly integrates the Mixture-of-Experts (MoE) architecture with prompt-based learning for graphs. We show that GMoPE consistently outperforms state-of-the-art baselines and achieves performance comparable to full parameter fine-tuning.
arXiv Detail & Related papers (2025-11-05T07:28:51Z)
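One plausible reading of a prompt-expert mixture is sketched below: each expert contributes a prompt vector added to node features before a shared transform, and a per-node gate mixes the experts' outputs. This is an assumption-laden illustration, not the published GMoPE architecture.

```python
"""Hypothetical sketch of a prompt-expert mixture in the spirit of GMoPE.
Everything here (shapes, gating, the toy 'GNN') is an illustrative stand-in."""

import numpy as np

rng = np.random.default_rng(1)
N_NODES, DIM, N_EXPERTS = 6, 8, 3

X = rng.normal(size=(N_NODES, DIM))                         # node features
A = (rng.random((N_NODES, N_NODES)) > 0.5).astype(float)    # toy adjacency
prompts = rng.normal(size=(N_EXPERTS, DIM))  # one learnable prompt per expert
W = rng.normal(size=(DIM, DIM))              # shared (frozen) GNN weight
gate_w = rng.normal(size=(DIM, N_EXPERTS))   # gating projection

def gnn_layer(x: np.ndarray) -> np.ndarray:
    # One toy message-passing step with the shared, frozen weight.
    return np.tanh(A @ x @ W)

# Each expert sees the graph through its own prompt; a soft gate mixes outputs.
gates = np.exp(X @ gate_w)
gates /= gates.sum(axis=1, keepdims=True)     # per-node softmax over experts
expert_out = np.stack([gnn_layer(X + p) for p in prompts])        # (E, N, D)
mixed = np.einsum("ne,end->nd", gates, expert_out)
print(mixed.shape)  # (6, 8): per-node representations from the prompt mixture
```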
- DUET: Dual Model Co-Training for Entire Space CTR Prediction [34.35929309131385]
DUET (DUal Model Co-Training for Entire Space CTR Prediction) is a set-wise pre-ranking framework that achieves expressive modeling under tight computational budgets. It consistently outperforms state-of-the-art baselines and achieves improvements across multiple core business metrics.
arXiv Detail & Related papers (2025-10-28T12:46:33Z)
- Generalist++: A Meta-learning Framework for Mitigating Trade-off in Adversarial Training [105.74524789405514]
Adversarial training (AT) is currently the most effective defense for neural networks against adversarial attacks. We propose to partition the overall generalization goal into multiple sub-tasks, each assigned to a dedicated base learner. In the later stages of training, we interpolate their parameters to form a knowledgeable global learner. We term this framework Generalist and introduce three variants tailored to different application scenarios.
arXiv Detail & Related papers (2025-10-15T09:47:54Z)
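The parameter-interpolation step described in the Generalist++ entry reduces to a convex combination of base-learner weights. The sketch below fakes the sub-task training and shows only the merge; all names and shapes are illustrative.

```python
"""Hypothetical sketch of the parameter interpolation described for
Generalist++: several base learners, each trained on its own sub-task, are
merged into one global learner by interpolating their weights. Training is
faked with random deltas; the interpolation itself is the point."""

import numpy as np

rng = np.random.default_rng(2)
shapes = {"layer1": (4, 4), "layer2": (4, 2)}

def train_base_learner(init: dict, subtask: str) -> dict:
    # Stand-in for training on one sub-task (e.g., clean vs. robust accuracy).
    return {k: v + 0.1 * rng.normal(size=v.shape) for k, v in init.items()}

init = {k: rng.normal(size=s) for k, s in shapes.items()}
learners = [train_base_learner(init, t) for t in ("clean", "robust", "ood")]

# Interpolate: convex combination of the base learners' parameters.
alphas = np.array([0.4, 0.4, 0.2])  # mixing weights, sum to 1
global_learner = {
    k: sum(a * l[k] for a, l in zip(alphas, learners)) for k in shapes
}
print({k: v.shape for k, v in global_learner.items()})
```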
- COME: Dual Structure-Semantic Learning with Collaborative MoE for Universal Lesion Detection Across Heterogeneous Ultrasound Datasets [25.82307075214309]
We propose a Universal Collaborative Mixture of Heterogeneous Source-Specific Experts (COME). COME establishes dual structure-semantic shared experts that create a universal representation space and then collaborate with source-specific experts to extract discriminative features. This design enables robust generalization by leveraging cross-dataset experience distributions and providing universal US priors for small-batch or unseen data scenarios.
arXiv Detail & Related papers (2025-08-13T15:43:20Z)
- Predicting Multi-Agent Specialization via Task Parallelizability [8.465921582175426]
We present a closed-form bound that predicts when specialization improves performance depending on task regimes and team size. We validate our model on two standard MARL benchmarks that represent opposite extremes of task parallelizability. Three follow-up experiments in Overcooked-AI demonstrate that the model works in environments with more complex spatial and resource bottlenecks.
arXiv Detail & Related papers (2025-03-19T21:33:48Z)
- HG-Adapter: Improving Pre-Trained Heterogeneous Graph Neural Networks with Dual Adapters [53.97380482341493]
The "pre-train, prompt-tuning" paradigm has demonstrated impressive performance for tuning pre-trained heterogeneous graph neural networks (HGNNs).
We propose a unified framework that combines two new adapters with potential labeled data extension to improve the generalization of pre-trained HGNN models.
arXiv Detail & Related papers (2024-11-02T06:43:54Z)
- Parallel Strategies for Best-First Generalized Planning [51.713634067802104]
Generalized planning (GP) is a research area of AI that studies the automated synthesis of algorithmic-like solutions capable of solving multiple classical planning instances.
One of the current advancements has been the introduction of Best-First Generalized Planning (BFGP), a GP algorithm based on a novel solution space that can be explored with search.
This paper evaluates the application of parallel search techniques to BFGP, another critical component in closing the performance gap.
arXiv Detail & Related papers (2024-07-31T09:50:22Z)
- The Overcooked Generalisation Challenge [8.131038178603873]
We introduce the Overcooked Generalisation Challenge (OGC), the first benchmark to study agents' zero-shot cooperation abilities when faced with novel partners and levels in the Overcooked-AI environment. Our challenge interfaces with state-of-the-art dual curriculum design (DCD) methods to generate auto-curricula for training general agents in Overcooked.
arXiv Detail & Related papers (2024-06-25T21:51:43Z)
- Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models [93.92762966380793]
Large language models (LLMs) strive to achieve high performance across the three domains of text, code, and math simultaneously.
In this paper, we propose to fuse models that are already highly-specialized directly.
The proposed fusing framework, UltraFuser, consists of three distinct specialists that are already sufficiently trained on language, coding, and mathematics.
arXiv Detail & Related papers (2024-03-13T06:18:48Z)
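A soft token-level mixture of the three UltraFuser specialists can be sketched in a few lines. The gate and the specialists below are toy stand-ins, and the fusion rule (a gated sum of logits) is an assumption rather than the published UltraFuser design.

```python
"""Hypothetical sketch of fusing three specialists in the spirit of
UltraFuser: a token-level gate produces soft weights over the language, code,
and math specialists, and their next-token logits are mixed. The specialists
are toy random-logit functions, not real models."""

import numpy as np

rng = np.random.default_rng(3)
VOCAB = 40
SPECIALISTS = ("language", "code", "math")

def specialist_logits(name: str, context: list[int]) -> np.ndarray:
    # Stand-in for one specialist's next-token logits.
    return rng.normal(size=VOCAB)

def gate(context: list[int]) -> np.ndarray:
    # Stand-in for the token-level gating network: soft weights over experts.
    scores = rng.normal(size=len(SPECIALISTS))
    e = np.exp(scores - scores.max())
    return e / e.sum()

def fused_next_token(context: list[int]) -> int:
    w = gate(context)
    mixed = sum(wi * specialist_logits(s, context)
                for wi, s in zip(w, SPECIALISTS))
    return int(mixed.argmax())  # greedy choice from the fused distribution

print(fused_next_token([5, 6, 7]))
```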
- Enhancing Compositional Generalization via Compositional Feature Alignment [14.289836081158615]
We develop CG-Bench, a suite of CG benchmarks derived from existing real-world image datasets.
We propose Compositional Feature Alignment (CFA), a simple two-stage finetuning technique.
We conduct experiments on CG-Bench for CLIP and DINOv2, two powerful pretrained vision foundation models.
arXiv Detail & Related papers (2024-02-05T10:06:24Z)
- Personalizing Federated Learning with Over-the-Air Computations [84.8089761800994]
Federated edge learning is a promising technology to deploy intelligence at the edge of wireless networks in a privacy-preserving manner.
Under such a setting, multiple clients collaboratively train a global generic model under the coordination of an edge server.
This paper presents a distributed training paradigm that employs analog over-the-air computation to address the communication bottleneck.
arXiv Detail & Related papers (2023-02-24T08:41:19Z)
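The over-the-air idea in the last entry is easy to simulate: because simultaneous analog transmissions superpose on the channel, the server receives the sum of all client updates in one shot instead of one transmission per client. The sketch below assumes a simple Gaussian-noise channel and perfect synchronization, which are simplifying assumptions rather than the paper's setup.

```python
"""Hypothetical sketch of analog over-the-air aggregation for federated
learning: all clients transmit their model updates on the same channel at
once, the wireless medium adds them up, and the server reads the noisy sum
directly."""

import numpy as np

rng = np.random.default_rng(4)
N_CLIENTS, DIM, NOISE_STD = 5, 10, 0.01

client_updates = [rng.normal(size=DIM) for _ in range(N_CLIENTS)]

# Superposition: the channel sums all simultaneous analog transmissions
# and adds receiver noise.
received = sum(client_updates) + NOISE_STD * rng.normal(size=DIM)

ota_average = received / N_CLIENTS               # server-side scaling
true_average = np.mean(client_updates, axis=0)   # what digital FedAvg computes
print("max deviation:", np.abs(ota_average - true_average).max())
```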