IntentCUA: Learning Intent-level Representations for Skill Abstraction and Multi-Agent Planning in Computer-Use Agents
- URL: http://arxiv.org/abs/2602.17049v1
- Date: Thu, 19 Feb 2026 03:42:15 GMT
- Title: IntentCUA: Learning Intent-level Representations for Skill Abstraction and Multi-Agent Planning in Computer-Use Agents
- Authors: Seoyoung Lee, Seobin Yoon, Seongbeen Lee, Yoojung Chun, Dayoung Park, Doyeon Kim, Joo Yong Sim,
- Abstract summary: We present IntentCUA, a computer-use framework designed to stabilize long-horizon execution through intent-aligned plan memory.<n>Int Intent prototypes retrieve subgroup-aligned skills and inject them into partial plans, reducing redundant re-planning.<n>Int IntentCUA achieved a 74.83% task success rate with a Step Efficiency Ratio of 0.91, outperforming RL-based and trajectory-centric baselines.
- Score: 4.655926959889001
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computer-use agents operate over long horizons under noisy perception, multi-window contexts, evolving environment states. Existing approaches, from RL-based planners to trajectory retrieval, often drift from user intent and repeatedly solve routine subproblems, leading to error accumulation and inefficiency. We present IntentCUA, a multi-agent computer-use framework designed to stabilize long-horizon execution through intent-aligned plan memory. A Planner, Plan-Optimizer, and Critic coordinate over shared memory that abstracts raw interaction traces into multi-view intent representations and reusable skills. At runtime, intent prototypes retrieve subgroup-aligned skills and inject them into partial plans, reducing redundant re-planning and mitigating error propagation across desktop applications. In end-to-end evaluations, IntentCUA achieved a 74.83% task success rate with a Step Efficiency Ratio of 0.91, outperforming RL-based and trajectory-centric baselines. Ablations show that multi-view intent abstraction and shared plan memory jointly improve execution stability, with the cooperative multi-agent loop providing the largest gains on long-horizon tasks. These results highlight that system-level intent abstraction and memory-grounded coordination are key to reliable and efficient desktop automation in large, dynamic environments.
Related papers
- MagicAgent: Towards Generalized Agent Planning [73.21129030631421]
We present textbfMagicAgent, a series of foundation models specifically designed for generalized agent planning.<n>We introduce a lightweight and scalable synthetic data framework that generates high-quality trajectories across diverse planning tasks.<n>We show that MagicAgent-32B and MagicAgent-30B-A3B achieve superior performance across diverse open-source benchmarks.
arXiv Detail & Related papers (2026-02-22T01:39:16Z) - HiPER: Hierarchical Reinforcement Learning with Explicit Credit Assignment for Large Language Model Agents [36.77027704958893]
HiPER is a novel Hierarchical Plan-Execute RL framework that separates high-level planning from low-level execution.<n>HiPER achieves state-of-the-art performance on challenging interactive benchmarks, reaching 97.4% success on ALFWorld and 83.3% on WebShop with Qwen2.5-7B-Instruct.
arXiv Detail & Related papers (2026-02-18T03:31:34Z) - Learning to Share: Selective Memory for Efficient Parallel Agentic Systems [49.78267008828593]
Agentic systems solve complex tasks by coordinating multiple agents that iteratively reason, invoke tools, and exchange intermediate results.<n>Recent approaches deploy multiple agent teams running in parallel to explore diverse reasoning trajectories.<n>We propose Learning to Share (LTS), a learned shared-memory mechanism for parallel agentic frameworks.
arXiv Detail & Related papers (2026-02-05T18:20:21Z) - StackPlanner: A Centralized Hierarchical Multi-Agent System with Task-Experience Memory Management [25.50119360269554]
central agents often suffer from unstable long-horizon collaboration due to the lack of memory management.<n>We propose StackPlanner, a hierarchical multi-agent framework with explicit memory control.<n> Experiments on multiple deep-search and agent system benchmarks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2026-01-09T16:09:48Z) - MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents [84.62985963113245]
We introduce MEM1, an end-to-end reinforcement learning framework that enables agents to operate with constant memory across long multi-turn tasks.<n>At each turn, MEM1 updates a compact shared internal state that jointly supports memory consolidation and reasoning.<n>We show that MEM1-7B improves performance by 3.5x while reducing memory usage by 3.7x compared to Qwen2.5-14B-Instruct on a 16-objective multi-hop QA task.
arXiv Detail & Related papers (2025-06-18T19:44:46Z) - Embodied Long Horizon Manipulation with Closed-loop Code Generation and Incremental Few-shot Adaptation [12.077740860502878]
Embodied long-horizon manipulation requires robotic systems to process multimodal inputs-such as vision and natural language-and translate them into executable actions.<n>Recent methods have explored using large language models (LLMs) as high-level planners that decompose tasks into subtasks using natural language and guide pretrained low-level controllers.<n>Our framework achieves state-of-the-art performance on 30+ diverse seen and unseen long-horizon tasks across LoHoRavens, CALVIN, Franka Kitchen, and cluttered real-world settings.
arXiv Detail & Related papers (2025-03-27T20:32:58Z) - Goal-Conditioned Reinforcement Learning with Disentanglement-based
Reachability Planning [14.370384505230597]
We propose a goal-conditioned RL algorithm combined with Disentanglement-based Reachability Planning (REPlan) to solve temporally extended tasks.
Our REPlan significantly outperforms the prior state-of-the-art methods in solving temporally extended tasks.
arXiv Detail & Related papers (2023-07-20T13:08:14Z) - Multi-task Over-the-Air Federated Learning: A Non-Orthogonal
Transmission Approach [52.85647632037537]
We propose a multi-task over-theair federated learning (MOAFL) framework, where multiple learning tasks share edge devices for data collection and learning models under the coordination of a edge server (ES)
Both the convergence analysis and numerical results demonstrate that the MOAFL framework can significantly reduce the uplink bandwidth consumption of multiple tasks without causing substantial learning performance degradation.
arXiv Detail & Related papers (2021-06-27T13:09:32Z) - Decoupled and Memory-Reinforced Networks: Towards Effective Feature
Learning for One-Step Person Search [65.51181219410763]
One-step methods have been developed to handle pedestrian detection and identification sub-tasks using a single network.
There are two major challenges in the current one-step approaches.
We propose a decoupled and memory-reinforced network (DMRNet) to overcome these problems.
arXiv Detail & Related papers (2021-02-22T06:19:45Z) - Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal
Constraints [52.58352707495122]
We present a multi-robot allocation algorithm that decouples the key computational challenges of sequential decision-making under uncertainty and multi-agent coordination.
We validate our results over a wide range of simulations on two distinct domains: multi-arm conveyor belt pick-and-place and multi-drone delivery dispatch in a city.
arXiv Detail & Related papers (2020-05-27T01:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.