Related papers: LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS

LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS

URL: http://arxiv.org/abs/2505.18829v1
Date: Sat, 24 May 2025 18:56:00 GMT
Title: LiteCUA: Computer as MCP Server for Computer-Use Agent on AIOS
Authors: Kai Mei, Xi Zhu, Hang Gao, Shuhang Lin, Yongfeng Zhang,
Abstract summary: AIOS 1.0 is a novel platform designed to advance computer-use agent capabilities through environmental contextualization.<n>We introduce LiteCUA, a lightweight computer-use agent built on AIOS 1.0 that achieves a 14.66% success rate on the OSWorld benchmark.<n>Our results suggest that contextualizing computer environments for language models represents a promising direction for developing more capable computer-use agents.
Score: 37.35501264841289
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present AIOS 1.0, a novel platform designed to advance computer-use agent (CUA) capabilities through environmental contextualization. While existing approaches primarily focus on building more powerful agent frameworks or enhancing agent models, we identify a fundamental limitation: the semantic disconnect between how language models understand the world and how computer interfaces are structured. AIOS 1.0 addresses this challenge by transforming computers into contextual environments that language models can natively comprehend, implementing a Model Context Protocol (MCP) server architecture to abstract computer states and actions. This approach effectively decouples interface complexity from decision complexity, enabling agents to reason more effectively about computing environments. To demonstrate our platform's effectiveness, we introduce LiteCUA, a lightweight computer-use agent built on AIOS 1.0 that achieves a 14.66% success rate on the OSWorld benchmark, outperforming several specialized agent frameworks despite its simple architecture. Our results suggest that contextualizing computer environments for language models represents a promising direction for developing more capable computer-use agents and advancing toward AI that can interact with digital systems. The source code of LiteCUA is available at https://github.com/agiresearch/LiteCUA, and it is also integrated into the AIOS main branch as part of AIOS at https://github.com/agiresearch/AIOS.

Related papers

OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use [101.57043903478257]
The dream to create AI assistants as capable and versatile as the fictional J.A.R.V.I.S from Iron Man has long captivated imaginations.<n>With the evolution of (multi-modal) large language models ((M)LLMs), this dream is closer to reality.<n>This survey aims to consolidate the state of OS Agents research, providing insights to guide both academic inquiry and industrial development.
arXiv Detail & Related papers (2025-08-06T14:33:45Z)
InfantAgent-Next: A Multimodal Generalist Agent for Automated Computer Interaction [35.285466934451904]
This paper introduces textscInfantAgent-Next, a generalist agent capable of interacting with computers in a multimodal manner.<n>Unlike existing approaches that either build intricate around a single large model or only provide modularity, our agent integrates tool-based and pure vision agents.
arXiv Detail & Related papers (2025-05-16T05:43:27Z)
UFO2: The Desktop AgentOS [60.317812905300336]
UFO2 is a multiagent AgentOS for Windows desktops that elevates into practical, system-level automation.<n>We evaluate UFO2 across over 20 real-world Windows applications, demonstrating substantial improvements in robustness and execution accuracy over prior CUAs.<n>Our results show that deep OS integration unlocks a scalable path toward reliable, user-aligned desktop automation.
arXiv Detail & Related papers (2025-04-20T13:04:43Z)
IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems [2.2810745411557316]
We introduce IntellAgent, a scalable, open-source framework to evaluate conversational AI systems.<n>IntellAgent automates the creation of synthetic benchmarks by combining policy-driven graph modeling, realistic event generation, and interactive user-agent simulations.<n>Our findings demonstrate that IntellAgent serves as an effective framework for advancing conversational AI by addressing challenges in bridging research and deployment.
arXiv Detail & Related papers (2025-01-19T14:58:35Z)
Contextual Augmented Multi-Model Programming (CAMP): A Hybrid Local-Cloud Copilot Framework [8.28588489551341]
This paper presents CAMP, a multi-model AI-assisted programming framework that consists of a local model that employs Retrieval-Augmented Generation (RAG)<n>RAG retrieves contextual information from the cloud model to facilitate context-aware prompt construction.<n>The methodology is actualized in Copilot for Xcode, an AI-assisted programming tool crafted for the Apple software ecosystem.
arXiv Detail & Related papers (2024-10-20T04:51:24Z)
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering [79.07755560048388]
SWE-agent is a system that facilitates LM agents to autonomously use computers to solve software engineering tasks. SWE-agent's custom agent-computer interface (ACI) significantly enhances an agent's ability to create and edit code files, navigate entire repositories, and execute tests and other programs. We evaluate SWE-agent on SWE-bench and HumanEvalFix, achieving state-of-the-art performance on both with a pass@1 rate of 12.5% and 87.7%, respectively.
arXiv Detail & Related papers (2024-05-06T17:41:33Z)
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement [48.29860831901484]
We introduce OS-Copilot, a framework to build generalist agents capable of interfacing with comprehensive elements in an operating system (OS) We use OS-Copilot to create FRIDAY, a self-improving embodied agent for automating general computer tasks. On GAIA, a general AI assistants benchmark, FRIDAY outperforms previous methods by 35%, showcasing strong generalization to unseen applications via accumulated skills from previous tasks.
arXiv Detail & Related papers (2024-02-12T07:29:22Z)
LLM as OS, Agents as Apps: Envisioning AIOS, Agents and the AIOS-Agent Ecosystem [48.81136793994758]
Large Language Model (LLM) serves as the (Artificial) Intelligent Operating System (IOS), or AIOS--an operating system "with soul" We envision that LLM's impact will not be limited to the AI application level, instead, it will in turn revolutionize the design and implementation of computer system, architecture, software, and programming language.
arXiv Detail & Related papers (2023-12-06T18:50:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.