Related papers: Human-In-the-Loop Software Development Agents

Related papers

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence [150.3696990310269]
Large language models (LLMs) have transformed automated software development by enabling direct translation of natural language descriptions into functional code.<n>We provide a comprehensive synthesis and practical guide (a series of analytic and probing experiments) about code LLMs.<n>We analyze the code capability of the general LLMs (GPT-4, Claude, LLaMA) and code-specialized LLMs (StarCoder, Code LLaMA, DeepSeek-Coder, and QwenCoder)
arXiv Detail & Related papers (2025-11-23T17:09:34Z)
A Survey of Vibe Coding with Large Language Models [93.88284590533242]
"Vibe Coding" is a development methodology where developers validate AI-generated implementations through outcome observation.<n>Despite its transformative potential, the effectiveness of this emergent paradigm remains under-explored.<n>This survey provides the first comprehensive and systematic review of Vibe Coding with large language models.
arXiv Detail & Related papers (2025-10-14T11:26:56Z)
ALMAS: an Autonomous LLM-based Multi-Agent Software Engineering Framework [5.920182474293131]
We propose a vision for ALMAS, an Autonomous LLM-based Multi-Agent Software Engineering framework.<n>We showcase the progress towards ALMAS through our published works and a use case demonstrating the framework.
arXiv Detail & Related papers (2025-10-03T19:35:23Z)
On Integrating Large Language Models and Scenario-Based Programming for Improving Software Reliability [2.2058293096044586]
Large Language Models (LLMs) are fast becoming indispensable tools for software developers.<n>LLMs often introduce significant errors and present incorrect code with persuasive confidence.<n>We propose a methodology for combining LLMs with traditional'' software engineering techniques in a structured way.
arXiv Detail & Related papers (2025-09-11T07:10:25Z)
A Survey on Code Generation with LLM-based Agents [61.474191493322415]
Code generation agents powered by large language models (LLMs) are revolutionizing the software development paradigm.<n>LLMs are characterized by three core features.<n>This paper presents a systematic survey of the field of LLM-based code generation agents.
arXiv Detail & Related papers (2025-07-31T18:17:36Z)
MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks [56.34018316319873]
We propose MERA Code, a benchmark for evaluating code for the latest code generation LLMs in Russian.<n>This benchmark includes 11 evaluation tasks that span 8 programming languages.<n>We evaluate open LLMs and frontier API models, analyzing their limitations in terms of practical coding tasks in non-English languages.
arXiv Detail & Related papers (2025-07-16T14:31:33Z)
Unified Software Engineering agent as AI Software Engineer [14.733475669942276]
Large Language Model (LLM) technology has raised expectations for automated coding.<n>In this paper, we seek to understand this question by developing a Unified Software Engineering agent or USEagent.<n>We envision USEagent as the first draft of a future AI Software Engineer which can be a team member in future software development teams involving both AI and humans.
arXiv Detail & Related papers (2025-06-17T16:19:13Z)
Human-In-The-Loop Software Development Agents: Challenges and Future Directions [14.81934634773595]
At Atlassian, we deployed Human-in-the-Loop Software Development Agents to resolve Jira work items and evaluated the generated code quality using functional correctness testing and GPT-based similarity scoring.<n>This paper highlights two major challenges: the high computational costs of unit testing and the variability in LLM-based evaluations.
arXiv Detail & Related papers (2025-04-25T01:52:59Z)
LLMs' Reshaping of People, Processes, Products, and Society in Software Development: A Comprehensive Exploration with Early Adopters [3.4069804433026314]
Large language models (LLMs) like OpenAI ChatGPT, Google Gemini, and GitHub Copilot are rapidly gaining traction in the software industry. Our study provides a nuanced understanding of how LLMs are shaping the landscape of software development.
arXiv Detail & Related papers (2025-03-06T22:27:05Z)
Every Software as an Agent: Blueprint and Case Study [0.6655461660736298]
We advocate to endow large language models (LLMs) with access to the software internals (source code and runtime context) and the permission to dynamically inject generated code into software for execution. We present an overall design architecture and case studies on two popular web-based desktop applications.
arXiv Detail & Related papers (2025-02-07T08:29:09Z)
Think-on-Process: Dynamic Process Generation for Collaborative Development of Multi-Agent System [13.65717444483291]
ToP (Think-on-Process) is a dynamic process generation framework for software development. Our framework significantly enhances the dynamic process generation capability of the GPT-3.5 and GPT-4.
arXiv Detail & Related papers (2024-09-10T15:02:34Z)
WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks [85.95607119635102]
Large language models (LLMs) can mimic human-like intelligence. WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents.
arXiv Detail & Related papers (2024-07-07T07:15:49Z)
Agentless: Demystifying LLM-based Software Engineering Agents [12.19683999553113]
We build Agentless -- an agentless approach to automatically solve software development problems. Compared to the verbose and complex setup of agent-based approaches, Agentless employs a simplistic three-phase process of localization, repair, and patch validation. Our results on the popular SWE-bench Lite benchmark show that surprisingly the simplistic Agentless is able to achieve both the highest performance and low cost.
arXiv Detail & Related papers (2024-07-01T17:24:45Z)
Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs) The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation. We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z)
DevBench: A Comprehensive Benchmark for Software Development [72.24266814625685]
DevBench is a benchmark that evaluates large language models (LLMs) across various stages of the software development lifecycle. Empirical studies show that current LLMs, including GPT-4-Turbo, fail to solve the challenges presented within DevBench. Our findings offer actionable insights for the future development of LLMs toward real-world programming applications.
arXiv Detail & Related papers (2024-03-13T15:13:44Z)
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming [12.355284125578342]
Large Language Models (LLMs) have become a focal point in modern software development. LLMs offer the potential to significantly augment developer productivity by serving as intelligent, chat-driven programming assistants. However, each system requires the LLM to be honed to its set of workspaces to ensure the best performance.
arXiv Detail & Related papers (2024-02-22T03:51:34Z)
CodePori: Large-Scale System for Autonomous Software Development Using Multi-Agent Technology [4.2990995991059275]
Large Language Models (LLMs) and Generative Pre-trained Transformers (GPTs) have transformed the field of Software Engineering. We introduce CodePori, a novel system designed to automate code generation for large and complex software projects. Results: CodePori is able to generate running code for large-scale projects, aligned with the typical software development process.
arXiv Detail & Related papers (2024-02-02T13:42:50Z)
Experiential Co-Learning of Software-Developing Agents [83.34027623428096]
Large language models (LLMs) have brought significant changes to various domains, especially in software development. We introduce Experiential Co-Learning, a novel LLM-agent learning framework. Experiments demonstrate that the framework enables agents to tackle unseen software-developing tasks more effectively.
arXiv Detail & Related papers (2023-12-28T13:50:42Z)
SoTaNa: The Open-Source Software Development Assistant [81.86136560157266]
SoTaNa is an open-source software development assistant. It generates high-quality instruction-based data for the domain of software engineering. It employs a parameter-efficient fine-tuning approach to enhance the open-source foundation model, LLaMA.
arXiv Detail & Related papers (2023-08-25T14:56:21Z)
Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. We have developed a proven systems engineering approach for machine learning development and deployment. Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z)
Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results. We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.