Related papers: VAPU: System for Autonomous Legacy Code Modernization

VAPU: System for Autonomous Legacy Code Modernization

URL: http://arxiv.org/abs/2510.18509v1
Date: Tue, 21 Oct 2025 10:50:33 GMT
Title: VAPU: System for Autonomous Legacy Code Modernization
Authors: Valtteri Ala-Salmi, Zeeshan Rasheed, Abdul Malik Sami, Muhammad Waseem, Kai-Kristian Kemell, Jussi Rasku, Mika Saari, Pekka Abrahamsson,
Abstract summary: We propose a multi-agent system named VAPU, which is designed to update code files in phases while simulating different roles in a software development team.<n>VAPU showed up to 22.5% increase in the succeeding Python file update requirements compared to ZSL/OSL prompts.<n>The study indicates that an LLM-based multi-agent system is a capable solution to update components of a legacy application autonomously.
Score: 2.0177617569743607
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this study, we present a solution for the modernization of legacy applications, an area of code generation where LLM-based multi-agent systems are proving essential for complex multi-phased tasks. Legacy applications often contain deprecated components that create compatibility, security, and reliability risks, but high resource costs make companies hesitate to update. We take a step forward to integrate an LLM-based multi-agent system as part of a legacy web application update to provide a cost-effective solution to update legacy applications autonomously. We propose a multi-agent system named a Verifying Agent Pipeline Updater (VAPU), which is designed to update code files in phases while simulating different roles in a software development team. In our previous study, we evaluated the system for legacy version updates by using six legacy web application view files by resulting errors and accomplished requirements. This study extends the previous evaluation of a multi-agent pipeline system by extending the evaluation of VAPU from a single LLM to five LLMs and using the temperature parameter in both 0 to 1 settings. Additionally, we tested the system with 20 open-source Python GitHub projects. The results of the evaluation were compared to Zero-Shot Learning (ZSL) and One-Shot Learning (OSL) prompts. The extended evaluation of VAPU showed that particularly in a low-temperature VAPU can get similar level of error count compared to the ZSL/OSL prompts but with a higher level of fulfilled requirements, depending on the LLM. VAPU showed up to 22.5% increase in the succeeding Python file update requirements compared to ZSL/OSL prompts. The study indicates that an LLM-based multi-agent system is a capable solution to update components of a legacy application autonomously.

Related papers

Toward Training Superintelligent Software Agents through Self-Play SWE-RL [66.11447353341926]
Self-play SWE-RL is a first step toward training paradigms for superintelligent software agents.<n>Our approach takes minimal data assumptions, only requiring access to sandboxed repositories with source code and installed dependencies.<n>Our results, albeit early, suggest a path where agents autonomously gather extensive learning experiences from real-world software repositories.
arXiv Detail & Related papers (2025-12-21T00:49:40Z)
VeruSAGE: A Study of Agent-Based Verification for Rust Systems [3.268641886777319]
This paper offers a comprehensive study of large language models' capability to develop correctness proofs for system software written in Rust.<n>We curate a new system-verification benchmark suite, VeruSAGE-Bench, which consists of 849 proof tasks extracted from eight open-source Verus-verified Rust systems.<n>The best LLM-agent combination in our study completes over 80% of system-verification tasks in VeruSAGE-Bench.
arXiv Detail & Related papers (2025-12-20T17:22:52Z)
Multi-Agent Systems for Dataset Adaptation in Software Engineering: Capabilities, Limitations, and Future Directions [8.97512410819274]
This paper presents the first empirical study on how state-of-the-art multi-agent systems perform in dataset adaptation tasks.<n>We evaluate GitHub Copilot on adapting SE research artifacts from benchmark repositories including ROCODE and LogHub2.0.<n>Results show that current systems can identify key files and generate partial adaptations but rarely produce correct implementations.
arXiv Detail & Related papers (2025-11-26T13:26:11Z)
Reinforcement Learning for Machine Learning Engineering Agents [52.03168614623642]
We show that agents backed by weaker models that improve via reinforcement learning can outperform agents backed by much larger, but static models.<n>We propose duration- aware gradient updates in a distributed asynchronous RL framework to amplify high-cost but high-reward actions.<n>We also propose environment instrumentation to offer partial credit, distinguishing almost-correct programs from those that fail early.
arXiv Detail & Related papers (2025-09-01T18:04:10Z)
Reinforcement Learning for Long-Horizon Interactive LLM Agents [56.9860859585028]
Interactive digital agents (IDAs) leverage APIs of stateful digital environments to perform tasks in response to user requests.<n>We present a reinforcement learning (RL) approach that trains IDAs directly in their target environments.<n>We derive LOOP, a data- and memory-efficient variant of proximal policy optimization.
arXiv Detail & Related papers (2025-02-03T18:35:42Z)
Autonomous Legacy Web Application Upgrades Using a Multi-Agent System [3.456157428615978]
Large Language Models (LLMs) for autonomous code generation is gaining attention in emerging technologies.<n>Many outdated web applications pose security and reliability challenges, yet companies continue using them due to the complexity and cost of upgrades.<n>We propose an LLM-based multi-agent system that autonomously upgrades legacy web applications to the latest versions.
arXiv Detail & Related papers (2025-01-31T15:14:14Z)
RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale [3.378738346115004]
We develop RES-Q, a benchmark for evaluating Large Language Models (LLMs) We evaluate various state-of-the-art LLMs as language agents in a repository-editing system built on Qurrent OS.
arXiv Detail & Related papers (2024-06-24T17:08:17Z)
VersiCode: Towards Version-controllable Code Generation [58.82709231906735]
Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development. We propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM) We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge.
arXiv Detail & Related papers (2024-06-11T16:15:06Z)
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge [56.772051051558215]
Large vision-language models (LVLMs) are ignorant of the up-to-date knowledge, such as LLaVA series, because they cannot be updated frequently. We propose a plug-and-play framework, for augmenting existing LVLMs in handling visual question answering (VQA) about up-to-date knowledge, dubbed SearchLVLMs.
arXiv Detail & Related papers (2024-05-23T13:32:07Z)
ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks. To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.