GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension
- URL: http://arxiv.org/abs/2312.17294v1
- Date: Thu, 28 Dec 2023 15:47:30 GMT
- Title: GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension
- Authors: Bohan Lyu, Xin Cong, Heyang Yu, Pan Yang, Yujia Qin, Yining Ye, Yaxi
Lu, Zhong Zhang, Yukun Yan, Yankai Lin, Zhiyuan Liu, Maosong Sun
- Abstract summary: A growing area of research focuses on Large Language Models (LLMs) equipped with external tools capable of performing diverse tasks.
In this paper, we introduce GitAgent, an agent capable of achieving the autonomous tool extension from GitHub.
- Score: 81.44231422624055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While Large Language Models (LLMs) like ChatGPT and GPT-4 have demonstrated
exceptional proficiency in natural language processing, their efficacy in
addressing complex, multifaceted tasks remains limited. A growing area of
research focuses on LLM-based agents equipped with external tools capable of
performing diverse tasks. However, existing LLM-based agents only support a
limited set of tools which is unable to cover a diverse range of user queries,
especially for those involving expertise domains. It remains a challenge for
LLM-based agents to extend their tools autonomously when confronted with
various user queries. As GitHub has hosted a multitude of repositories which
can be seen as a good resource for tools, a promising solution is that
LLM-based agents can autonomously integrate the repositories in GitHub
according to the user queries to extend their tool set. In this paper, we
introduce GitAgent, an agent capable of achieving the autonomous tool extension
from GitHub. GitAgent follows a four-phase procedure to incorporate
repositories and it can learn human experience by resorting to GitHub
Issues/PRs to solve problems encountered during the procedure. Experimental
evaluation involving 30 user queries demonstrates GitAgent's effectiveness,
achieving a 69.4% success rate on average.
Related papers
- O$^2$-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering [31.38063794496179]
O$2$-Searcher is a novel search agent leveraging reinforcement learning to tackle both open-ended and closed-ended questions in the open domain.<n>It employs a unified training mechanism with meticulously designed reward functions, enabling the agent to identify problem types and adapt different answer generation strategies.<n>Extensive experiments show that O$2$-Searcher, using only a 3B model, significantly surpasses leading LLM agents on O$2$-QA.
arXiv Detail & Related papers (2025-05-22T12:17:13Z) - Top General Performance = Top Domain Performance? DomainCodeBench: A Multi-domain Code Generation Benchmark [38.14474956762422]
We introduce DomainCodeBench, a benchmark designed to evaluate large language models (LLMs) across 12 software application domains and 15 programming languages.<n>We find that top general-domain models do not consistently excel in specific application domains.<n>We show that augmenting prompts with domain-specific knowledge improves performance by around 38.17%.
arXiv Detail & Related papers (2024-12-24T17:56:08Z) - ReDel: A Toolkit for LLM-Powered Recursive Multi-Agent Systems [39.85101344037394]
ReDel is a toolkit for building multi-agent systems.
It supports custom tool-use, delegation schemes, event-based logging, and interactive replay.
Our code, documentation, and PyPI package are open-source and free to use under the MIT license.
arXiv Detail & Related papers (2024-08-05T05:43:23Z) - AgentGym: Evolving Large Language Model-based Agents across Diverse Environments [116.97648507802926]
Large language models (LLMs) are considered a promising foundation to build such agents.
We take the first step towards building generally-capable LLM-based agents with self-evolution ability.
We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration.
arXiv Detail & Related papers (2024-06-06T15:15:41Z) - Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration [64.19431011897515]
This paper presents Alibaba LingmaAgent, a novel Automated Software Engineering method designed to comprehensively understand and utilize whole software repositories for issue resolution.
Our approach introduces a top-down method to condense critical repository information into a knowledge graph, reducing complexity, and employs a Monte Carlo tree search based strategy.
In production deployment and evaluation at Alibaba Cloud, LingmaAgent automatically resolved 16.9% of in-house issues faced by development engineers, and solved 43.3% of problems after manual intervention.
arXiv Detail & Related papers (2024-06-03T15:20:06Z) - Tool Learning in the Wild: Empowering Language Models as Automatic Tool Agents [56.822238860147024]
Augmenting large language models with external tools has emerged as a promising approach to extend their utility.<n>Previous methods manually parse tool documentation and create in-context demonstrations, transforming tools into structured formats for LLMs to use in their step-by-step reasoning.<n>We propose AutoTools, a framework that enables LLMs to automate the tool-use workflow.
arXiv Detail & Related papers (2024-05-26T11:40:58Z) - Do We Really Need a Complex Agent System? Distill Embodied Agent into a Single Model [15.558269067931374]
We propose STEVE-2, a hierarchical knowledge distillation framework for open-ended embodied tasks.
After distillation, embodied agents can complete complex, open-ended tasks without additional expert guidance.
arXiv Detail & Related papers (2024-04-06T12:51:00Z) - MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution [47.850418420195304]
Large Language Models (LLMs) have shown promise in code generation but face difficulties in resolving GitHub issues.
We propose a novel Multi-Agent framework for GitHub Issue reSolution, MAGIS, consisting of four agents customized for software evolution.
arXiv Detail & Related papers (2024-03-26T17:57:57Z) - Large Language Model based Multi-Agents: A Survey of Progress and Challenges [44.92286030322281]
Large Language Models (LLMs) have achieved remarkable success across a wide array of tasks.
Recently, based on the development of using one LLM as a single planning or decision-making agent, LLM-based multi-agent systems have achieved considerable progress in complex problem-solving and world simulation.
arXiv Detail & Related papers (2024-01-21T23:36:14Z) - CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges [41.038584732889895]
Large Language Models (LLMs) have shown promise in automated code generation but typically excel only in simpler tasks.
Our research pivots towards evaluating LLMs in a more realistic setting -- real-world repo-level code generation.
We present CodeAgent, a novel LLM-based agent framework that employs external tools for effective repo-level code generation.
arXiv Detail & Related papers (2024-01-14T18:12:03Z) - EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction [56.02100384015907]
EasyTool is a framework transforming diverse and lengthy tool documentation into a unified and concise tool instruction.
It can significantly reduce token consumption and improve the performance of tool utilization in real-world scenarios.
arXiv Detail & Related papers (2024-01-11T15:45:11Z) - ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z) - Knowledge Plugins: Enhancing Large Language Models for Domain-Specific
Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
arXiv Detail & Related papers (2023-11-16T07:09:38Z) - Agents: An Open-source Framework for Autonomous Language Agents [98.91085725608917]
We consider language agents as a promising direction towards artificial general intelligence.
We release Agents, an open-source library with the goal of opening up these advances to a wider non-specialist audience.
arXiv Detail & Related papers (2023-09-14T17:18:25Z) - Recommender AI Agent: Integrating Large Language Models for Interactive
Recommendations [53.76682562935373]
We introduce an efficient framework called textbfInteRecAgent, which employs LLMs as the brain and recommender models as tools.
InteRecAgent achieves satisfying performance as a conversational recommender system, outperforming general-purpose LLMs.
arXiv Detail & Related papers (2023-08-31T07:36:44Z) - AgentBench: Evaluating LLMs as Agents [88.45506148281379]
Large Language Models (LLMs) are becoming increasingly smart and autonomous, targeting real-world pragmatic missions beyond traditional NLP tasks.
We present AgentBench, a benchmark that currently consists of 8 distinct environments to assess LLM-as-Agent's reasoning and decision-making abilities.
arXiv Detail & Related papers (2023-08-07T16:08:11Z) - OpenAGI: When LLM Meets Domain Experts [51.86179657467822]
Human Intelligence (HI) excels at combining basic skills to solve complex tasks.
This capability is vital for Artificial Intelligence (AI) and should be embedded in comprehensive AI Agents.
We introduce OpenAGI, an open-source platform designed for solving multi-step, real-world tasks.
arXiv Detail & Related papers (2023-04-10T03:55:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.