Related papers: Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement

Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement

URL: http://arxiv.org/abs/2411.00622v1
Date: Fri, 01 Nov 2024 14:27:16 GMT
Title: Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement
Authors: Yingwei Ma, Rongyu Cao, Yongchang Cao, Yue Zhang, Jue Chen, Yibo Liu, Yuchen Liu, Binhua Li, Fei Huang, Yongbin Li,
Abstract summary: Lingma SWE-GPT series learns from and simulating real-world code submission activities. Lingma SWE-GPT 72B resolves 30.20% of GitHub issues, marking a significant improvement in automatic issue resolution.
Score: 62.94719119451089
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advancements in LLM-based agents have led to significant progress in automatic software engineering, particularly in software maintenance and evolution. Despite these encouraging advances, current research faces two major challenges. First, SOTA performance primarily depends on closed-source models, which significantly limits the technology's accessibility, and potential for customization in diverse SE tasks. Second, these models are predominantly trained on static code data, lacking a deep understanding of the dynamic interactions, iterative problem-solving processes, and evolutionary characteristics inherent in software development. To address these challenges, our study adopts a software engineering perspective. We recognize that real-world software maintenance and evolution processes encompass not only static code data but also developers' thought processes, utilization of external tools, and the interaction between different functional personnel. Consequently, we introduce the Lingma SWE-GPT series, comprising Lingma SWE-GPT 7B and 72B. By learning from and simulating real-world code submission activities, Lingma SWE-GPT systematically incorporates the dynamic interactions and iterative problem-solving inherent in software development process, thereby achieving a more comprehensive understanding of software improvement processes. We conducted experimental evaluations using SWE-bench Verified benchmark. The results demonstrate that Lingma SWE-GPT 72B successfully resolves 30.20% of the GitHub issues, marking a significant improvement in automatic issue resolution (22.76% relative improvement compared to Llama 3.1 405B), approaching the performance of closed-source models (31.80\% issues of GPT-4o resolved). Notably, Lingma SWE-GPT 7B resolves 18.20% of the issues, highlighting the potential for applying smaller models to ASE tasks.

Related papers

Machine Learning Pipeline for Software Engineering: A Systematic Literature Review [0.0]
This systematic literature review examines state-of-the-art Machine Learning pipelines designed for software engineering (SE)<n>Our findings show that robust preprocessing, such as SMOTE for data balancing, improves model reliability.<n> Ensemble methods like Random Forest and Gradient Boosting dominate performance across tasks.<n>New metrics like Best Arithmetic Mean (BAM) are emerging in niche applications.
arXiv Detail & Related papers (2025-07-31T15:37:30Z)
ORMind: A Cognitive-Inspired End-to-End Reasoning Framework for Operations Research [53.736407871322314]
We introduce ORMind, a cognitive-inspired framework that enhances optimization through counterfactual reasoning.<n>Our approach emulates human cognition, implementing an end-to-end workflow that transforms requirements into mathematical models and executable code.<n>It is currently being tested internally in Lenovo's AI Assistant, with plans to enhance optimization capabilities for both business and consumer customers.
arXiv Detail & Related papers (2025-06-02T05:11:21Z)
MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering [57.156093929365255]
Gym-style framework for systematically reinforcement learning, evaluating, and improving autonomous large language model (LLM) agents.<n>MLE-Dojo covers diverse, open-ended MLE tasks carefully curated to reflect realistic engineering scenarios.<n>Its fully executable environment supports comprehensive agent training via both supervised fine-tuning and reinforcement learning.
arXiv Detail & Related papers (2025-05-12T17:35:43Z)
A Path Less Traveled: Reimagining Software Engineering Automation via a Neurosymbolic Paradigm [9.900581015679935]
We propose Neurosymbolic Software Engineering as a promising paradigm combining neural learning with symbolic (rule-based) reasoning.<n>This hybrid methodology aims to enhance efficiency, reliability, and transparency in AI-driven software engineering.
arXiv Detail & Related papers (2025-05-04T22:10:21Z)
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate our textbf32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
SEAlign: Alignment Training for Software Engineering Agent [38.05820118124528]
We propose SEAlign to bridge the gap between code generation models and real-world software development tasks. We evaluate SEAlign on three standard agentic benchmarks for real-world software engineering, including HumanEvalFix, SWE-Bench-Lite, and SWE-Bench-Verified. We develop an agent-based software development platform using SEAlign, which successfully automates the creation of several small applications.
arXiv Detail & Related papers (2025-03-24T08:59:21Z)
Interactive Agents to Overcome Ambiguity in Software Engineering [61.40183840499932]
AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions. Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes. We study the ability of LLM agents to handle ambiguous instructions in interactive code generation settings by evaluating proprietary and open-weight models on their performance.
arXiv Detail & Related papers (2025-02-18T17:12:26Z)
Intelligent Mobile AI-Generated Content Services via Interactive Prompt Engineering and Dynamic Service Provisioning [55.641299901038316]
AI-generated content can organize collaborative Mobile AIGC Service Providers (MASPs) at network edges to provide ubiquitous and customized content for resource-constrained users. Such a paradigm faces two significant challenges: 1) raw prompts often lead to poor generation quality due to users' lack of experience with specific AIGC models, and 2) static service provisioning fails to efficiently utilize computational and communication resources. We develop an interactive prompt engineering mechanism that leverages a Large Language Model (LLM) to generate customized prompt corpora and employs Inverse Reinforcement Learning (IRL) for policy imitation.
arXiv Detail & Related papers (2025-02-17T03:05:20Z)
SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement [18.84439000902905]
SWE-Search is a multi-agent framework that integrates Monte Carlo Tree Search (MCTS) with a self-improvement mechanism to enhance software agents' performance. This work highlights the potential of self-evaluation driven search techniques to enhance agent reasoning and planning in complex, dynamic software engineering environments.
arXiv Detail & Related papers (2024-10-26T22:45:56Z)
Think-on-Process: Dynamic Process Generation for Collaborative Development of Multi-Agent System [13.65717444483291]
ToP (Think-on-Process) is a dynamic process generation framework for software development. Our framework significantly enhances the dynamic process generation capability of the GPT-3.5 and GPT-4.
arXiv Detail & Related papers (2024-09-10T15:02:34Z)
Building Math Agents with Multi-Turn Iterative Preference Learning [56.71330214021884]
This paper studies the complementary direct preference learning approach to further improve model performance. Existing direct preference learning algorithms are originally designed for the single-turn chat task. We introduce a multi-turn direct preference learning framework, tailored for this context.
arXiv Detail & Related papers (2024-09-04T02:41:04Z)
Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs) The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation. We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z)
Developers' Perceptions on the Impact of ChatGPT in Software Development: A Survey [13.257222195239375]
We conducted a survey with 207 software developers to understand the impact of ChatGPT on software quality, productivity, and job satisfaction. The study delves into developers' expectations regarding future adaptations of ChatGPT, concerns about potential job displacement, and perspectives on regulatory interventions.
arXiv Detail & Related papers (2024-05-20T17:31:16Z)
SOEN-101: Code Generation by Emulating Software Process Models Using Large Language Model Agents [50.82665351100067]
FlowGen is a code generation framework that emulates software process models based on multiple Large Language Model (LLM) agents. We evaluate FlowGenScrum on four benchmarks: HumanEval, HumanEval-ET, MBPP, and MBPP-ET.
arXiv Detail & Related papers (2024-03-23T14:04:48Z)
Machine Learning Application Development: Practitioners' Insights [18.114724750441724]
We report about a survey that aimed to understand the challenges and best practices of ML application development. We synthesize the results obtained from 80 practitioners into 17 findings; outlining challenges and best practices for ML application development. We hope that the reported challenges will inform the research community about topics that need to be investigated to improve the engineering process and the quality of ML-based applications.
arXiv Detail & Related papers (2021-12-31T03:38:37Z)
Data-Driven and SE-assisted AI Model Signal-Awareness Enhancement and Introspection [61.571331422347875]
We propose a data-driven approach to enhance models' signal-awareness. We combine the SE concept of code complexity with the AI technique of curriculum learning. We achieve up to 4.8x improvement in model signal awareness.
arXiv Detail & Related papers (2021-11-10T17:58:18Z)
Technology Readiness Levels for AI & ML [79.22051549519989]
Development of machine learning systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. Engineering systems follow well-defined processes and testing standards to streamline development for high-quality, reliable results. We propose a proven systems engineering approach for machine learning development and deployment.
arXiv Detail & Related papers (2020-06-21T17:14:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.