OpenHands: An Open Platform for AI Software Developers as Generalist Agents
- URL: http://arxiv.org/abs/2407.16741v2
- Date: Fri, 4 Oct 2024 14:54:08 GMT
- Authors: Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, Graham Neubig,
- Abstract summary: We introduce OpenHands, a platform for the development of AI agents that interact with the world in similar ways to a human developer.
We describe how the platform allows for the implementation of new agents, safe interaction with sandboxed environments for code execution, and incorporation of evaluation benchmarks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Software is one of the most powerful tools we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has been rapid development of AI agents that interact with and effect change in their surrounding environments. In this paper, we introduce OpenHands (f.k.a. OpenDevin), a platform for the development of powerful and flexible AI agents that interact with the world in ways similar to those of a human developer: by writing code, interacting with a command line, and browsing the web. We describe how the platform allows for the implementation of new agents, safe interaction with sandboxed environments for code execution, coordination between multiple agents, and incorporation of evaluation benchmarks. Based on our currently incorporated benchmarks, we evaluate agents on 15 challenging tasks, including software engineering (e.g., SWE-BENCH) and web browsing (e.g., WEBARENA), among others. Released under the permissive MIT license, OpenHands is a community project spanning academia and industry with more than 2.1K contributions from over 188 contributors.
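The abstract describes an agent loop in which an agent proposes actions (running commands, writing code) and the platform executes them in a sandbox, feeding observations back. A minimal sketch of that loop follows; all names here (`Action`, `Sandbox`, `EchoAgent`, `run_episode`) are illustrative placeholders, not the actual OpenHands API, and a real platform would confine execution to a container rather than a plain subprocess.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Action:
    kind: str      # e.g. "run" for a shell command
    payload: str   # the command text


class Sandbox:
    """Executes actions in isolation.

    A real platform would run this inside a container; here we just
    run the command in a subprocess and capture its output.
    """
    def execute(self, action: Action) -> str:
        if action.kind == "run":
            result = subprocess.run(
                action.payload, shell=True,
                capture_output=True, text=True, timeout=10,
            )
            return result.stdout + result.stderr
        return f"unsupported action kind: {action.kind}"


class EchoAgent:
    """Toy stand-in for an LLM-driven agent: always runs one command."""
    def step(self, observation: str) -> Action:
        return Action(kind="run", payload="echo hello")


def run_episode(agent, sandbox, max_steps=1):
    """Alternate agent decisions and sandboxed execution."""
    obs = ""
    for _ in range(max_steps):
        action = agent.step(obs)
        obs = sandbox.execute(action)
    return obs


if __name__ == "__main__":
    print(run_episode(EchoAgent(), Sandbox()))
```

Swapping `EchoAgent` for an LLM-backed policy and `Sandbox.execute` for a containerized runner recovers the shape of the agent-environment interaction the paper describes.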
Related papers
- Improving Performance of Commercially Available AI Products in a Multi-Agent Configuration (arXiv, 2024-10-29)
  Crowdbotics PRD AI is a tool for generating software requirements using AI, and GitHub Copilot is an AI pair-programming tool. By sharing business requirements from PRD AI, the authors improve GitHub Copilot's code-suggestion capabilities by 13.8% and developer task success rate by 24.5%.
- Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence (arXiv, 2024-07-09)
  Existing multi-agent frameworks often struggle to integrate diverse, capable third-party agents. The Internet of Agents (IoA) addresses these limitations with an agent integration protocol, an instant-messaging-like architecture, and dynamic mechanisms for agent teaming and conversation flow control.
- CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents (arXiv, 2024-07-01)
  CRAB is the first benchmark framework designed to support cross-environment tasks. It supports multiple devices and can be extended to any environment with a Python interface. In the experiments, a single agent with GPT-4o achieves the best completion ratio of 38.01%.
- AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology (arXiv, 2024-06-16)
  AgileCoder is a multi-agent system that integrates Agile Methodology (AM) into its framework. It assigns specific AM roles, such as Product Manager, Developer, and Tester, to different agents, which then collaboratively develop software from user inputs.
- AutoDev: Automated AI-Driven Development (arXiv, 2024-03-13)
  AutoDev is a fully automated AI-driven software development framework. Users define complex software engineering objectives, which are assigned to AutoDev's autonomous AI agents. AutoDev establishes a secure development environment by confining all operations within Docker containers.
- AgentScope: A Flexible yet Robust Multi-Agent Platform (arXiv, 2024-02-21)
  AgentScope is a developer-centric multi-agent platform with message exchange as its core communication mechanism. Its syntactic tools, built-in agents and service functions, user-friendly interfaces for application demonstration and utility monitoring, zero-code programming workstation, and automatic prompt-tuning mechanism significantly lower the barriers to development and deployment.
- OpenAgents: An Open Platform for Language Agents in the Wild (arXiv, 2023-10-16)
  OpenAgents is an open platform for using and hosting language agents in the wild of everyday life. The authors elucidate the challenges and opportunities, aiming to set a foundation for future research and development of real-world language agents.
- Agents: An Open-source Framework for Autonomous Language Agents (arXiv, 2023-09-14)
  The authors consider language agents a promising direction toward artificial general intelligence, and release Agents, an open-source library that aims to open these advances to a wider non-specialist audience.
- Polycraft World AI Lab (PAL): An Extensible Platform for Evaluating Artificial Intelligence Agents (arXiv, 2023-01-27)
  PAL is a task simulator with an API based on the Minecraft mod Polycraft World. It enables flexible task creation and can manipulate any aspect of a task during an evaluation. In summary, the authors report a versatile AI evaluation platform with a low barrier to entry for AI researchers.
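Several of the systems above, like AutoDev and OpenHands itself, confine agent actions to Docker containers so that an agent's commands never reach the host directly. One way to sketch that pattern is to wrap every agent command in a `docker run` invocation; the image name and mount point below are illustrative choices, not anything mandated by these papers.

```python
def containerized_command(cmd, image="python:3.11-slim", workdir="/workspace"):
    """Build a `docker run` argv that confines `cmd` to a throwaway
    container: removed on exit, no network access, fixed working
    directory. Returns the argument list; the caller decides whether
    and how to execute it (e.g. via subprocess).
    """
    return [
        "docker", "run", "--rm",      # discard the container afterwards
        "--network", "none",          # no outbound network access
        "--workdir", workdir,
        image,
        "sh", "-c", cmd,              # the agent's command, quoted as one arg
    ]


# The agent's raw command never touches the host shell directly:
argv = containerized_command("pytest -q")
```

Passing the command as a single `sh -c` argument (rather than interpolating it into a host shell string) avoids host-side shell injection, which is part of why container confinement is the common choice for these frameworks.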
This list is automatically generated from the titles and abstracts of the papers on this site; the site does not guarantee the quality of this information and is not responsible for any consequences of its use.