Investigating Software Aging in LLM-Generated Software Systems
- URL: http://arxiv.org/abs/2510.24188v1
- Date: Tue, 28 Oct 2025 08:50:24 GMT
- Title: Investigating Software Aging in LLM-Generated Software Systems
- Authors: César Santos, Ermeson Andrade, Roberto Natella
- Abstract summary: This paper experimentally investigates the phenomenon of software aging in applications generated by Large Language Models (LLMs). Using the Bolt platform and standardized prompts from Baxbench, we generated four service-oriented applications and subjected them to 50-hour load tests. Results reveal significant evidence of software aging, including progressive memory growth, increased response time, and performance instability.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatically generated software, especially code produced by Large Language Models (LLMs), is increasingly adopted to accelerate development and reduce manual effort. However, little is known about the long-term reliability of such systems under sustained execution. In this paper, we experimentally investigate the phenomenon of software aging in applications generated by LLM-based tools. Using the Bolt platform and standardized prompts from Baxbench, we generated four service-oriented applications and subjected them to 50-hour load tests. Resource usage, response time, and throughput were continuously monitored to detect degradation patterns. The results reveal significant evidence of software aging, including progressive memory growth, increased response time, and performance instability across all applications. Statistical analyses confirm these trends and highlight variability in the severity of aging according to the type of application. Our findings show the need to consider aging in automatically generated software and provide a foundation for future studies on mitigation strategies and long-term reliability evaluation.
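The degradation detection described in the abstract (monitoring resource usage over a long-running load test and statistically confirming growth trends) can be illustrated with a minimal sketch: fit a least-squares slope to periodically sampled memory usage and flag sustained growth. The sampling data and slope threshold below are illustrative assumptions for the sketch, not the paper's actual instrumentation or statistical procedure.

```python
def ls_slope(samples):
    """Ordinary least-squares slope of evenly spaced samples
    (value change per sampling interval)."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

def shows_aging(memory_mb, slope_threshold=0.05):
    """Flag progressive memory growth: a positive fitted slope above a
    (hypothetical) threshold in MB per sampling interval."""
    return ls_slope(memory_mb) > slope_threshold

# e.g. hourly resident-memory samples from a 50-hour load test,
# drifting upward at a steady 0.4 MB/hour (synthetic data)
samples = [120.0 + 0.4 * h for h in range(50)]
print(shows_aging(samples))  # True
```

Real aging studies typically pair such trend fitting with a non-parametric test (e.g. Mann-Kendall) to confirm the trend is statistically significant rather than noise.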
Related papers
- Llama-based source code vulnerability detection: Prompt engineering vs Fine tuning [0.6588840794922407]
Large Language Models (LLMs) are considered among the most performant AI models to date. We study their performance and apply different state-of-the-art techniques to enhance their effectiveness. We leverage the recent open-source Llama-3.1 8B, with source code samples extracted from the BigVul and PrimeVul datasets.
arXiv Detail & Related papers (2025-12-09T12:08:24Z) - Large Language Models for Unit Test Generation: Achievements, Challenges, and the Road Ahead [15.43943391801509]
Unit testing is an essential yet laborious technique for verifying software. Large Language Models (LLMs) address this limitation by leveraging their data-driven knowledge of code semantics and programming patterns. This framework analyzes the literature regarding core generative strategies and a set of enhancement techniques.
arXiv Detail & Related papers (2025-11-26T13:30:11Z) - Training Language Models to Generate Quality Code with Program Analysis Feedback [66.0854002147103]
Code generation with large language models (LLMs) is increasingly adopted in production but fails to ensure code quality. We propose REAL, a reinforcement learning framework that incentivizes LLMs to generate production-quality code.
arXiv Detail & Related papers (2025-05-28T17:57:47Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time compute instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - Breaking the Loop: Detecting and Mitigating Denial-of-Service Vulnerabilities in Large Language Models [4.046135652393372]
Large Language Models (LLMs) have significantly advanced text understanding and generation, becoming integral to applications across education, software development, healthcare, entertainment, and legal services. Recurrent generation, where models repeatedly produce similar or identical outputs, causes increased latency and potential Denial-of-Service (DoS) vulnerabilities. We propose RecurrentGenerator, a black-box evolutionary algorithm that efficiently identifies recurrent generation scenarios in prominent LLMs like LLama-3 and GPT-4o.
arXiv Detail & Related papers (2025-03-01T09:32:17Z) - Adversarial Reasoning at Jailbreaking Time [49.70772424278124]
Large language models (LLMs) are becoming more capable and widespread. Recent advances in standardizing, measuring, and scaling test-time compute suggest new methodologies for optimizing models to achieve high performance on hard tasks. In this paper, we apply these advances to the task of model jailbreaking: eliciting harmful responses from aligned LLMs.
arXiv Detail & Related papers (2025-02-03T18:59:01Z) - The Potential of LLMs in Automating Software Testing: From Generation to Reporting [0.0]
Manual testing, while effective, can be time-consuming and costly, leading to an increased demand for automated methods. Recent advancements in Large Language Models (LLMs) have significantly influenced software engineering. This paper explores an agent-oriented approach to automated software testing, using LLMs to reduce human intervention and enhance testing efficiency.
arXiv Detail & Related papers (2024-12-31T02:06:46Z) - AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z) - Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs)
The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z) - Characterization of Large Language Model Development in the Datacenter [55.9909258342639]
Large Language Models (LLMs) have presented impressive performance across several transformative tasks.
However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs.
We present an in-depth characterization study of a six-month LLM development workload trace collected from our GPU datacenter Acme.
arXiv Detail & Related papers (2024-03-12T13:31:14Z) - Analyzing the Influence of Processor Speed and Clock Speed on Remaining Useful Life Estimation of Software Systems [0.9831489366502301]
This research extends the analysis to assess how changes in environmental attributes, such as operating system and clock speed, affect RUL estimation in software.
Findings are rigorously validated using real performance data from controlled test beds and compared with predictive model-generated data.
This exploration yields actionable knowledge for software maintenance and optimization strategies.
arXiv Detail & Related papers (2023-09-22T04:46:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.