OpenHands: An Open Platform for AI Software Developers as Generalist Agents
- URL: http://arxiv.org/abs/2407.16741v2
- Date: Fri, 4 Oct 2024 14:54:08 GMT
- Authors: Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, Graham Neubig,
- Abstract summary: We introduce OpenHands, a platform for the development of AI agents that interact with the world in similar ways to a human developer.
We describe how the platform allows for the implementation of new agents, safe interaction with sandboxed environments for code execution, and incorporation of evaluation benchmarks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software is one of the most powerful tools we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has been rapid development of AI agents that interact with and effect change in their surrounding environments. In this paper, we introduce OpenHands (f.k.a. OpenDevin), a platform for the development of powerful and flexible AI agents that interact with the world in ways similar to those of a human developer: by writing code, interacting with a command line, and browsing the web. We describe how the platform allows for the implementation of new agents, safe interaction with sandboxed environments for code execution, coordination between multiple agents, and incorporation of evaluation benchmarks. Based on our currently incorporated benchmarks, we evaluate agents on 15 challenging tasks, including software engineering (e.g., SWE-BENCH) and web browsing (e.g., WEBARENA), among others. Released under the permissive MIT license, OpenHands is a community project spanning academia and industry with more than 2.1K contributions from over 188 contributors.
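The abstract describes an agent loop in which an agent proposes actions (running commands, writing code) and the platform executes them in a sandbox, feeding observations back. A minimal sketch of that loop follows; all names here (`Action`, `Sandbox`, `EchoAgent`, `run_episode`) are illustrative placeholders, not the actual OpenHands API, and a real platform would confine execution to a container rather than a plain subprocess.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Action:
    kind: str      # e.g. "run" for a shell command
    payload: str   # the command text


class Sandbox:
    """Executes actions in isolation.

    A real platform would run this inside a container; here we just
    run the command in a subprocess and capture its output.
    """
    def execute(self, action: Action) -> str:
        if action.kind == "run":
            result = subprocess.run(
                action.payload, shell=True,
                capture_output=True, text=True, timeout=10,
            )
            return result.stdout + result.stderr
        return f"unsupported action kind: {action.kind}"


class EchoAgent:
    """Toy stand-in for an LLM-driven agent: always runs one command."""
    def step(self, observation: str) -> Action:
        return Action(kind="run", payload="echo hello")


def run_episode(agent, sandbox, max_steps=1):
    """Alternate agent decisions and sandboxed execution."""
    obs = ""
    for _ in range(max_steps):
        action = agent.step(obs)
        obs = sandbox.execute(action)
    return obs


if __name__ == "__main__":
    print(run_episode(EchoAgent(), Sandbox()))
```

Swapping `EchoAgent` for an LLM-backed policy and `Sandbox.execute` for a containerized runner recovers the shape of the agent-environment interaction the paper describes.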
Related papers
- Improving Performance of Commercially Available AI Products in a Multi-Agent Configuration (arXiv, 2024-10-29)
  Crowdbotics PRD AI is a tool for generating software requirements using AI, and GitHub Copilot is an AI pair-programming tool. By sharing business requirements from PRD AI, the authors improve GitHub Copilot's code-suggestion capabilities by 13.8% and developer task success rate by 24.5%.
- Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence (arXiv, 2024-07-09)
  Existing multi-agent frameworks often struggle to integrate diverse, capable third-party agents. The Internet of Agents (IoA) addresses these limitations with an agent integration protocol, an instant-messaging-like architecture, and dynamic mechanisms for agent teaming and conversation flow control.
- CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents (arXiv, 2024-07-01)
  CRAB is the first benchmark framework designed to support cross-environment tasks. It supports multiple devices and can be extended to any environment with a Python interface. In the experiments, a single agent with GPT-4o achieves the best completion ratio of 38.01%.
- AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology (arXiv, 2024-06-16)
  AgileCoder is a multi-agent system that integrates Agile Methodology (AM) into its framework. It assigns specific AM roles, such as Product Manager, Developer, and Tester, to different agents, which then collaboratively develop software from user inputs.
- AutoDev: Automated AI-Driven Development (arXiv, 2024-03-13)
  AutoDev is a fully automated AI-driven software development framework. Users define complex software engineering objectives, which are assigned to AutoDev's autonomous AI agents. AutoDev establishes a secure development environment by confining all operations within Docker containers.
- AgentScope: A Flexible yet Robust Multi-Agent Platform (arXiv, 2024-02-21)
  AgentScope is a developer-centric multi-agent platform with message exchange as its core communication mechanism. Its syntactic tools, built-in agents and service functions, user-friendly interfaces for application demonstration and utility monitoring, zero-code programming workstation, and automatic prompt-tuning mechanism significantly lower the barriers to development and deployment.
- OpenAgents: An Open Platform for Language Agents in the Wild (arXiv, 2023-10-16)
  OpenAgents is an open platform for using and hosting language agents in the wild of everyday life. The authors elucidate the challenges and opportunities, aiming to set a foundation for future research and development of real-world language agents.
- Agents: An Open-source Framework for Autonomous Language Agents (arXiv, 2023-09-14)
  The authors consider language agents a promising direction toward artificial general intelligence, and release Agents, an open-source library that aims to open these advances to a wider non-specialist audience.
- Polycraft World AI Lab (PAL): An Extensible Platform for Evaluating Artificial Intelligence Agents (arXiv, 2023-01-27)
  PAL is a task simulator with an API based on the Minecraft mod Polycraft World. It enables flexible task creation and can manipulate any aspect of a task during an evaluation. In summary, the authors report a versatile AI evaluation platform with a low barrier to entry for AI researchers.
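Several of the systems above, like AutoDev and OpenHands itself, confine agent actions to Docker containers so that an agent's commands never reach the host directly. One way to sketch that pattern is to wrap every agent command in a `docker run` invocation; the image name and mount point below are illustrative choices, not anything mandated by these papers.

```python
def containerized_command(cmd, image="python:3.11-slim", workdir="/workspace"):
    """Build a `docker run` argv that confines `cmd` to a throwaway
    container: removed on exit, no network access, fixed working
    directory. Returns the argument list; the caller decides whether
    and how to execute it (e.g. via subprocess).
    """
    return [
        "docker", "run", "--rm",      # discard the container afterwards
        "--network", "none",          # no outbound network access
        "--workdir", workdir,
        image,
        "sh", "-c", cmd,              # the agent's command, quoted as one arg
    ]


# The agent's raw command never touches the host shell directly:
argv = containerized_command("pytest -q")
```

Passing the command as a single `sh -c` argument (rather than interpolating it into a host shell string) avoids host-side shell injection, which is part of why container confinement is the common choice for these frameworks.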
This list is automatically generated from the titles and abstracts of the papers on this site; the site does not guarantee the quality of this information and is not responsible for any consequences of its use.