Related papers: SEAlign: Alignment Training for Software Engineering Agent

URL: http://arxiv.org/abs/2503.18455v1
Date: Mon, 24 Mar 2025 08:59:21 GMT
Title: SEAlign: Alignment Training for Software Engineering Agent
Authors: Kechi Zhang, Huangzhao Zhang, Ge Li, Jinliang You, Jia Li, Yunfei Zhao, Zhi Jin
Abstract summary: We propose SEAlign to bridge the gap between code generation models and real-world software development tasks. We evaluate SEAlign on three standard agentic benchmarks for real-world software engineering: HumanEvalFix, SWE-Bench-Lite, and SWE-Bench-Verified. We develop an agent-based software development platform using SEAlign, which successfully automates the creation of several small applications.
Score: 38.05820118124528
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in code generation models have demonstrated impressive capabilities in automating software development tasks, yet these models still struggle in real-world software engineering scenarios. Although current training methods, particularly post-training, excel at solving competitive programming problems, they fail to adequately prepare models for the complexities of practical software development. This misalignment raises the critical question: Are existing alignment training methods well suited for real-world software engineering tasks? In this study, we identify this issue and propose SEAlign, a novel alignment framework designed to bridge the gap between code generation models and real-world software development tasks. SEAlign leverages the unique characteristics of software engineering processes, including high-quality workflow steps, to enhance model capabilities. Our framework further employs Monte Carlo Tree Search for fine-grained alignment in multi-step decision processes, followed by preference optimization on critical actions to ensure models meet real-world requirements. We evaluate SEAlign on three standard agentic benchmarks for real-world software engineering, including HumanEvalFix, SWE-Bench-Lite, and SWE-Bench-Verified. Experimental results demonstrate state-of-the-art performance with minimal training overhead. In addition, we develop an agent-based software development platform using SEAlign, which successfully automates the creation of several small applications. Human evaluations of these applications highlight significant improvements in both task performance and user experience. Our findings underscore the potential of SEAlign to accelerate the adoption of large code models in real-world software development. We believe that this research makes a meaningful step towards fully automated software engineering.

Related papers

Machine Learning Pipeline for Software Engineering: A Systematic Literature Review [0.0]
This systematic literature review examines state-of-the-art Machine Learning pipelines designed for software engineering (SE). Our findings show that robust preprocessing, such as SMOTE for data balancing, improves model reliability. Ensemble methods such as Random Forest and Gradient Boosting dominate performance across tasks. New metrics such as Best Arithmetic Mean (BAM) are emerging in niche applications.
arXiv Detail & Related papers (2025-07-31T15:37:30Z)
A Path Less Traveled: Reimagining Software Engineering Automation via a Neurosymbolic Paradigm [9.900581015679935]
We propose Neurosymbolic Software Engineering as a promising paradigm combining neural learning with symbolic (rule-based) reasoning. This hybrid methodology aims to enhance efficiency, reliability, and transparency in AI-driven software engineering.
arXiv Detail & Related papers (2025-05-04T22:10:21Z)
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute (TTC) scaling framework that leverages increased inference-time computation instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
SENAI: Towards Software Engineering Native Generative Artificial Intelligence [3.915435754274075]
This paper argues for the integration of Software Engineering knowledge into Large Language Models. The aim is to propose a new direction in which LLMs move beyond mere functional accuracy to perform generative tasks. Software engineering native generative models will not only overcome the shortcomings present in current models but also pave the way for the next generation of generative models capable of handling real-world software engineering.
arXiv Detail & Related papers (2025-03-19T15:02:07Z)
Human-In-the-Loop Software Development Agents [12.830816751625829]
Large Language Model (LLM)-based multi-agent paradigms for software engineering have been introduced to automatically resolve software development tasks. In this paper, we introduce a Human-in-the-loop LLM-based Agents framework (HULA) for software development. We design, implement, and deploy the HULA framework within Atlassian for internal use.
arXiv Detail & Related papers (2024-11-19T23:22:33Z)
Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement [62.94719119451089]
The Lingma SWE-GPT series learns from and simulates real-world code submission activities. Lingma SWE-GPT 72B resolves 30.20% of GitHub issues, marking a significant improvement in automatic issue resolution.
arXiv Detail & Related papers (2024-11-01T14:27:16Z)
Next-Gen Software Engineering. Big Models for AI-Augmented Model-Driven Software Engineering [0.0]
The paper provides an overview of the current state of AI-augmented software engineering and develops a corresponding taxonomy, AI4SE. A vision of AI-assisted Big Models in SE is put forth, with the aim of capitalising on the advantages inherent to both approaches in the context of software development.
arXiv Detail & Related papers (2024-09-26T16:49:57Z)
Think-on-Process: Dynamic Process Generation for Collaborative Development of Multi-Agent System [13.65717444483291]
ToP (Think-on-Process) is a dynamic process generation framework for software development. Our framework significantly enhances the dynamic process generation capability of GPT-3.5 and GPT-4.
arXiv Detail & Related papers (2024-09-10T15:02:34Z)
Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs). The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation. We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, so that they become better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z)
ChatDev: Communicative Agents for Software Development [84.90400377131962]
ChatDev is a chat-powered software development framework in which specialized agents are guided in what to communicate. These agents actively contribute to the design, coding, and testing phases through unified language-based communication.
arXiv Detail & Related papers (2023-07-16T02:11:34Z)
Technology Readiness Levels for Machine Learning Systems [107.56979560568232]
Development and deployment of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. We have developed a proven systems engineering approach for machine learning development and deployment. Our "Machine Learning Technology Readiness Levels" framework defines a principled process to ensure robust, reliable, and responsible systems.
arXiv Detail & Related papers (2021-01-11T15:54:48Z)
Software Effort Estimation using parameter tuned Models [1.9336815376402716]
The imprecision of the estimation is the reason for project failure. The greatest pitfall of the software industry was the fast-changing nature of software development. We need to develop useful models that accurately predict the cost of developing a software product.
arXiv Detail & Related papers (2020-08-25T15:18:59Z)
Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results. We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.