Towards Enterprise-Ready Computer Using Generalist Agent
- URL: http://arxiv.org/abs/2503.01861v1
- Date: Mon, 24 Feb 2025 09:31:56 GMT
- Title: Towards Enterprise-Ready Computer Using Generalist Agent
- Authors: Sami Marreed, Alon Oved, Avi Yaeli, Segev Shlomov, Ido Levy, Aviad Sela, Asaf Adi, Nir Mashkif,
- Abstract summary: This paper presents our ongoing work toward developing an enterprise-ready Computer Using Generalist Agent (CUGA) system. By integrating state-of-the-art agentic AI techniques with a systematic approach to iterative evaluation, analysis, and refinement, we have achieved rapid and cost-effective performance gains.
- Score: 2.8457587793623875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents our ongoing work toward developing an enterprise-ready Computer Using Generalist Agent (CUGA) system. Our research highlights the evolutionary nature of building agentic systems suitable for enterprise environments. By integrating state-of-the-art agentic AI techniques with a systematic approach to iterative evaluation, analysis, and refinement, we have achieved rapid and cost-effective performance gains, notably reaching a new state-of-the-art performance on the WebArena benchmark. We detail our development roadmap, the methodology and tools that facilitated rapid learning from failures and continuous system refinement, and discuss key lessons learned and future challenges for enterprise adoption.
Related papers
- A Survey on (M)LLM-Based GUI Agents [62.57899977018417]
Graphical User Interface (GUI) Agents have emerged as a transformative paradigm in human-computer interaction.
Recent advances in large language models and multimodal learning have revolutionized GUI automation across desktop, mobile, and web platforms.
This survey identifies key technical challenges, including accurate element localization, effective knowledge retrieval, long-horizon planning, and safety-aware execution control.
arXiv Detail & Related papers (2025-03-27T17:58:31Z)
- Large Language Model Agent: A Survey on Methodology, Applications and Challenges [88.3032929492409]
Large Language Model (LLM) agents, with goal-driven behaviors and dynamic adaptation capabilities, potentially represent a critical pathway toward artificial general intelligence.
This survey systematically deconstructs LLM agent systems through a methodology-centered taxonomy.
Our work provides a unified architectural perspective, examining how agents are constructed, how they collaborate, and how they evolve over time.
arXiv Detail & Related papers (2025-03-27T12:50:17Z)
- From Autonomous Agents to Integrated Systems, A New Paradigm: Orchestrated Distributed Intelligence [0.0]
We introduce the concept of Orchestrated Distributed Intelligence (ODI).
ODI reconceptualizes AI as cohesive, orchestrated networks that work in tandem with human expertise.
Our work outlines key theoretical implications and presents a practical roadmap for future research and enterprise innovation.
arXiv Detail & Related papers (2025-03-17T22:21:25Z)
- AI Agents: Evolution, Architecture, and Real-World Applications [0.0]
The paper examines the evolution, architecture, and practical applications of AI agents from their early, rule-based incarnations to modern sophisticated systems that integrate large language models with dedicated modules for perception, planning, and tool use.
arXiv Detail & Related papers (2025-03-16T23:07:48Z)
- AI Agentic workflows and Enterprise APIs: Adapting API architectures for the age of AI agents [0.0]
Generative AI has catalyzed the emergence of autonomous AI agents, presenting unprecedented challenges for enterprise computing infrastructures.
Current enterprise API architectures are predominantly designed for human-driven, predefined interaction patterns, rendering them ill-equipped to support intelligent agents' dynamic, goal-oriented behaviors.
This research systematically examines the architectural adaptations that enterprise APIs need in order to support AI agentic workflows effectively.
arXiv Detail & Related papers (2025-01-22T05:55:16Z)
- Collaborative AI in Sentiment Analysis: System Architecture, Data Prediction and Deployment Strategies [3.3374611485861116]
Large language model (LLM) based artificial intelligence technologies have been a game-changer, particularly in sentiment analysis.
However, integrating diverse AI models for processing complex multimodal data, together with the high cost of feature extraction, presents significant challenges.
This study introduces a collaborative AI framework designed to efficiently distribute and resolve tasks across various AI systems.
arXiv Detail & Related papers (2024-10-17T06:14:34Z)
- Agent-as-a-Judge: Evaluate Agents with Agents [61.33974108405561]
We introduce the Agent-as-a-Judge framework, wherein agentic systems are used to evaluate agentic systems.
This is an organic extension of the LLM-as-a-Judge framework, incorporating agentic features that enable intermediate feedback for the entire task-solving process.
We present DevAI, a new benchmark of 55 realistic automated AI development tasks.
arXiv Detail & Related papers (2024-10-14T17:57:02Z)
- Integrating Artificial Intelligence into Operating Systems: A Comprehensive Survey on Techniques, Applications, and Future Directions [16.28550500194823]
The fusion of Artificial Intelligence with Operating Systems emerges as a critical frontier for innovation.
The survey reviews the current status of AI-OS integration, accentuating its pivotal role in steering the evolution of advanced computing paradigms.
It also considers the future prospects of Intelligent Operating Systems, debating how groundbreaking OS designs will usher in novel possibilities.
arXiv Detail & Related papers (2024-07-19T05:29:34Z)
- WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? [83.19032025950986]
We study the use of large language model-based agents for interacting with software via web browsers.
WorkArena is a benchmark of 33 tasks based on the widely-used ServiceNow platform.
BrowserGym is an environment for the design and evaluation of such agents.
arXiv Detail & Related papers (2024-03-12T14:58:45Z)
- Machine Learning Insides OptVerse AI Solver: Design Principles and Applications [74.67495900436728]
We present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI solver.
We showcase our methods for generating complex SAT and MILP instances using generative models that mirror the multifaceted structures of real-world problems.
We detail the incorporation of state-of-the-art parameter tuning algorithms which markedly elevate solver performance.
arXiv Detail & Related papers (2024-01-11T15:02:15Z)
- Proceedings of the Robust Artificial Intelligence System Assurance (RAISA) Workshop 2022 [0.0]
The RAISA workshop will focus on research, development and application of robust artificial intelligence (AI) and machine learning (ML) systems.
Rather than studying robustness with respect to particular ML algorithms, our approach will be to explore robustness assurance at the system architecture level.
arXiv Detail & Related papers (2022-02-10T01:15:50Z)
- Artificial Intelligence Technologies in Education: Benefits, Challenges and Strategies of Implementation [8.54335661175611]
We have identified the benefits and challenges of implementing artificial intelligence in the education sector.
We have also reviewed modern AI technologies for learners and educators that are currently available on the software market.
We have developed a strategy implementation model, described by a five-stage, generic process, along with the corresponding configuration guide.
arXiv Detail & Related papers (2021-02-11T11:09:41Z)
- Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end.
We have developed a proven systems engineering approach for machine learning development and deployment.
Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.