Related papers: Harnessing Pre-trained Generalist Agents for Software Engineering Tasks

Harnessing Pre-trained Generalist Agents for Software Engineering Tasks

URL: http://arxiv.org/abs/2312.15536v1
Date: Sun, 24 Dec 2023 18:39:58 GMT
Title: Harnessing Pre-trained Generalist Agents for Software Engineering Tasks
Authors: Paulina Stevia Nouwou Mindom, Amin Nikanjam, Foutse Khomh
Abstract summary: Deep reinforcement learning (DRL) has been successfully used for automation in complex tasks such as game testing and solving the job-shop scheduling problem. Specialist DRL agents suffer from a lack of generalizability to other tasks and need substantial time to be developed and re-trained effectively. Recently, DRL researchers have begun to develop generalist agents, able to learn a policy from various environments and capable of achieving performances similar to or better than specialist agents in new tasks.
Score: 13.733085206098258
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Nowadays, we are witnessing an increasing adoption of Artificial Intelligence (AI) to develop techniques aimed at improving the reliability, effectiveness, and overall quality of software systems. Deep reinforcement learning (DRL) has recently been successfully used for automation in complex tasks such as game testing and solving the job-shop scheduling problem. However, these specialized DRL agents, trained from scratch on specific tasks, suffer from a lack of generalizability to other tasks and they need substantial time to be developed and re-trained effectively. Recently, DRL researchers have begun to develop generalist agents, able to learn a policy from various environments and capable of achieving performances similar to or better than specialist agents in new tasks. In the Natural Language Processing or Computer Vision domain, these generalist agents are showing promising adaptation capabilities to never-before-seen tasks after a light fine-tuning phase and achieving high performance. This paper investigates the potential of generalist agents for solving SE tasks. Specifically, we conduct an empirical study aimed at assessing the performance of two generalist agents on two important SE tasks: the detection of bugs in games (for two games) and the minimization of makespan in a scheduling task, to solve the job-shop scheduling problem (for two instances). Our results show that the generalist agents outperform the specialist agents with very little effort for fine-tuning, achieving a 20% reduction of the makespan over specialized agent performance on task-based scheduling. In the context of game testing, some generalist agent configurations detect 85% more bugs than the specialist agents. Building on our analysis, we provide recommendations for researchers and practitioners looking to select generalist agents for SE tasks, to ensure that they perform effectively.

Related papers

Predicting Multi-Agent Specialization via Task Parallelizability [4.9553580237478]
We show that specialist teams outperform generalist ones when environmental constraints limit task parallelizability. We also observe that as the state space expands, agents tend to converge on specialist strategies, even when generalist ones are theoretically more efficient. Our findings provide a principled framework for interpreting specialization given the task and environment.
arXiv Detail & Related papers (2025-03-19T21:33:48Z)
ML Research Benchmark [0.0]
We present the ML Research Benchmark (MLRB), comprising 7 competition-level tasks derived from recent machine learning conference tracks. This paper introduces a novel benchmark and evaluates it using agent scaffolds powered by frontier models, including Claude-3 and GPT-4o. The results indicate that the Claude-3.5 Sonnet agent performs best across our benchmark, excelling in planning and developing machine learning models.
arXiv Detail & Related papers (2024-10-29T21:38:42Z)
Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement [117.94654815220404]
G"odel Agent is a self-evolving framework inspired by the G"odel machine. G"odel Agent can achieve continuous self-improvement, surpassing manually crafted agents in performance, efficiency, and generalizability.
arXiv Detail & Related papers (2024-10-06T10:49:40Z)
Agent-Oriented Planning in Multi-Agent Systems [54.429028104022066]
We propose a novel framework for agent-oriented planning in multi-agent systems, leveraging a fast task decomposition and allocation process. We integrate a feedback loop into the proposed framework to further enhance the effectiveness and robustness of such a problem-solving process.
arXiv Detail & Related papers (2024-10-03T04:07:51Z)
ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning [78.42927884000673]
ExACT is an approach to combine test-time search and self-learning to build o1-like models for agentic applications. We first introduce Reflective Monte Carlo Tree Search (R-MCTS), a novel test time algorithm designed to enhance AI agents' ability to explore decision space on the fly. Next, we introduce Exploratory Learning, a novel learning strategy to teach agents to search at inference time without relying on any external search algorithms.
arXiv Detail & Related papers (2024-10-02T21:42:35Z)
CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark [11.794931453828974]
CORE-Bench is a benchmark consisting of 270 tasks based on 90 scientific papers across three disciplines (computer science, social science, and medicine) We provide an evaluation system to measure the accuracy of agents in a fast and parallelizable way. The best agent achieved an accuracy of 21% on the hardest task, showing the vast scope for improvement in automating routine scientific tasks.
arXiv Detail & Related papers (2024-09-17T17:13:19Z)
RL-GPT: Integrating Reinforcement Learning and Code-as-policy [82.1804241891039]
We introduce a two-level hierarchical framework, RL-GPT, comprising a slow agent and a fast agent. The slow agent analyzes actions suitable for coding, while the fast agent executes coding tasks. This decomposition effectively focuses each agent on specific tasks, proving highly efficient within our pipeline.
arXiv Detail & Related papers (2024-02-29T16:07:22Z)
Human-Timescale Adaptation in an Open-Ended Task Space [56.55530165036327]
We show that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans. Our results lay the foundation for increasingly general and adaptive RL agents that perform well across ever-larger open-ended domains.
arXiv Detail & Related papers (2023-01-18T15:39:21Z)
Generalizing to New Tasks via One-Shot Compositional Subgoals [23.15624959305799]
The ability to generalize to previously unseen tasks with little to no supervision is a key challenge in modern machine learning research. We introduce CASE which attempts to address these issues by training an Imitation Learning agent using adaptive "near future" subgoals. Our experiments show that the proposed approach consistently outperforms the previous state-of-the-art compositional Imitation Learning approach by 30%.
arXiv Detail & Related papers (2022-05-16T14:30:11Z)
Robust Reinforcement Learning via Genetic Curriculum [5.421464476555662]
Genetic curriculum is an algorithm that automatically identifies scenarios in which the agent currently fails and generates an associated curriculum. Our empirical studies show improvement in robustness over the existing state of the art algorithms, providing training curricula that result in agents being 2 - 8x times less likely to fail.
arXiv Detail & Related papers (2022-02-17T01:14:20Z)
Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent. We present a new approach to self-supervised exploration and fast adaptation to new tasks. Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
Trying AGAIN instead of Trying Longer: Prior Learning for Automatic Curriculum Learning [39.489869446313065]
A major challenge in the Deep RL (DRL) community is to train agents able to generalize over unseen situations. We propose a two stage ACL approach where 1) a teacher algorithm first learns to train a DRL agent with a high-exploration curriculum, and then 2) distills learned priors from the first run to generate an "expert curriculum" Besides demonstrating 50% improvements on average over the current state of the art, the objective of this work is to give a first example of a new research direction oriented towards refining ACL techniques over multiple learners.
arXiv Detail & Related papers (2020-04-07T07:30:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.