How Do Agentic AI Systems Address Performance Optimizations? A BERTopic-Based Analysis of Pull Requests
- URL: http://arxiv.org/abs/2512.24630v1
- Date: Wed, 31 Dec 2025 05:06:25 GMT
- Title: How Do Agentic AI Systems Address Performance Optimizations? A BERTopic-Based Analysis of Pull Requests
- Authors: Md Nahidul Islam Opu, Shahidul Islam, Muhammad Asaduzzaman, Shaiful Chowdhury
- Abstract summary: We present an empirical study of performance-related pull requests generated by AI agents. Our results show that AI agents apply performance optimizations across diverse layers of the software stack.
- Score: 1.423280626666929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: LLM-based software engineering is influencing modern software development. In addition to correctness, prior studies have also examined the performance of software artifacts generated by AI agents. However, it is unclear how exactly the agentic AI systems address performance concerns in practice. In this paper, we present an empirical study of performance-related pull requests generated by AI agents. Using LLM-assisted detection and BERTopic-based topic modeling, we identified 52 performance-related topics grouped into 10 higher-level categories. Our results show that AI agents apply performance optimizations across diverse layers of the software stack and that the type of optimization significantly affects pull request acceptance rates and review times. We also found that performance optimization by AI agents primarily occurs during the development phase, with less focus on the maintenance phase. Our findings provide empirical evidence that can support the evaluation and improvement of agentic AI systems with respect to their performance optimization behaviors and review outcomes.
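The abstract outlines a two-step pipeline: LLM-assisted detection of performance-related pull requests, followed by BERTopic topic modeling to surface recurring optimization themes. The paper's implementation is not included here; the sketch below is a minimal illustration of the BERTopic step, and the embedding model, parameters, file name, and function name are assumptions made for the example.

```python
# Minimal sketch of a BERTopic pass over performance-related PR text.
# Assumes the PRs were already filtered (e.g. by an LLM-assisted step)
# into one string per PR; names and parameters are illustrative only.
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer


def model_pr_topics(pr_texts: list[str]) -> BERTopic:
    """Fit BERTopic on PR titles/descriptions and return the fitted model."""
    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    topic_model = BERTopic(embedding_model=embedder, min_topic_size=10)
    topic_model.fit_transform(pr_texts)
    return topic_model


if __name__ == "__main__":
    # Hypothetical input file: one performance-related PR (title + body) per line.
    with open("performance_prs.txt", encoding="utf-8") as f:
        pr_texts = [line.strip() for line in f if line.strip()]

    model = model_pr_topics(pr_texts)
    # get_topic_info() lists each discovered topic with its size and top terms;
    # the paper reports 52 such topics grouped into 10 higher-level categories.
    print(model.get_topic_info())
```

Because BERTopic clusters document embeddings (UMAP plus HDBSCAN by default), the corpus needs at least a few hundred PR descriptions for the discovered topics to be stable.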
Related papers
- AI IDEs or Autonomous Agents? Measuring the Impact of Coding Agents on Software Development [12.50615284537175]
Large language model (LLM) based coding agents increasingly act as autonomous contributors that generate and merge pull requests. We present a longitudinal causal study of agent adoption in open-source repositories using staggered difference-in-differences with matched controls (a simplified sketch of this design appears after this list).
arXiv Detail & Related papers (2026-01-20T04:51:56Z)
- How Do Agents Perform Code Optimization? An Empirical Study [6.085146597426065]
We present the first empirical study comparing agent- and human-authored performance optimization commits. We find that AI-authored performance PRs are less likely to include explicit performance validations than human-authored PRs.
arXiv Detail & Related papers (2025-12-25T18:20:25Z)
- Let the Barbarians In: How AI Can Accelerate Systems Performance Research [80.43506848683633]
We term this iterative cycle of generation, evaluation, and refinement AI-Driven Research for Systems (ADRS). We demonstrate that ADRS-generated solutions can match or even outperform human state-of-the-art designs.
arXiv Detail & Related papers (2025-12-16T18:51:23Z)
- SelfAI: Building a Self-Training AI System with LLM Agents [79.10991818561907]
SelfAI is a general multi-agent platform that combines a User Agent, which translates high-level research objectives into standardized experimental configurations, with an Experiment Manager that orchestrates parallel, fault-tolerant training across heterogeneous hardware while maintaining a structured knowledge base for continuous feedback. Across regression, computer vision, scientific computing, medical imaging, and drug discovery benchmarks, SelfAI consistently achieves strong performance and reduces redundant trials.
arXiv Detail & Related papers (2025-11-29T09:18:39Z)
- Barbarians at the Gate: How AI is Upending Systems Research [58.95406995634148]
We argue that systems research, long focused on designing and evaluating new performance-oriented algorithms, is particularly well-suited for AI-driven solution discovery. We term this approach AI-Driven Research for Systems (ADRS), which iteratively generates, evaluates, and refines solutions. Our results highlight both the disruptive potential and the urgent need to adapt systems research practices in the age of AI.
arXiv Detail & Related papers (2025-10-07T17:49:24Z)
- On the Role of Feedback in Test-Time Scaling of Agentic AI Workflows [71.92083784393418]
Agentic AI systems (which autonomously plan and act) are becoming widespread, yet their success rate on complex tasks remains low. Inference-time alignment relies on three components: sampling, evaluation, and feedback. We introduce Iterative Agent Decoding (IAD), a procedure that repeatedly injects feedback extracted from different forms of critiques.
arXiv Detail & Related papers (2025-04-02T17:40:47Z)
- DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal [55.13854171147104]
Large Language Models (LLMs) have revolutionized various domains, including natural language processing, data analysis, and software development. We present Dynamic Action Re-Sampling (DARS), a novel inference-time compute scaling approach for coding agents. We evaluate our approach on the SWE-Bench Lite benchmark, demonstrating that this scaling strategy achieves a pass@k score of 55% with Claude 3.5 Sonnet V2 (the standard pass@k estimator is sketched after this list).
arXiv Detail & Related papers (2025-03-18T14:02:59Z)
- Interactive Agents to Overcome Ambiguity in Software Engineering [61.40183840499932]
AI agents are increasingly being deployed to automate tasks, often based on ambiguous and underspecified user instructions. Making unwarranted assumptions and failing to ask clarifying questions can lead to suboptimal outcomes. We study the ability of LLM agents to handle ambiguous instructions in interactive code generation settings by evaluating the performance of proprietary and open-weight models.
arXiv Detail & Related papers (2025-02-18T17:12:26Z)
- A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions via Iterative Refinement and LLM-Driven Feedback Loops [3.729242965449096]
This paper introduces a framework for autonomously optimizing Agentic AI solutions across industries. The framework achieves optimal performance without human input by autonomously generating and testing hypotheses. Case studies show significant improvements in output quality, relevance, and actionability.
arXiv Detail & Related papers (2024-12-22T20:08:04Z)
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
The Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z)
- A Comprehensive Performance Study of Large Language Models on Novel AI Accelerators [2.88634411143577]
Large language models (LLMs) are being considered as a promising approach to address some of the challenging problems.
Specialized AI accelerator hardware systems have recently become available for accelerating AI applications.
arXiv Detail & Related papers (2023-10-06T21:55:57Z)
- PerfDetectiveAI -- Performance Gap Analysis and Recommendation in Software Applications [0.0]
PerfDetectiveAI, a conceptual framework for performance gap analysis and recommendation in software applications, is introduced in this research.
Modern machine learning (ML) and artificial intelligence (AI) techniques are used in PerfDetectiveAI to monitor performance measurements and identify areas of underperformance in software applications.
arXiv Detail & Related papers (2023-06-11T02:53:04Z)
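The "AI IDEs or Autonomous Agents?" entry above relies on a staggered difference-in-differences design with matched controls. As a rough illustration only, the snippet below fits the simplest two-way fixed-effects version of such a design; the panel layout, column names, and outcome variable are assumptions, and the cited study likely uses a more robust staggered-adoption estimator.

```python
# Simplified two-way fixed-effects difference-in-differences sketch.
# Column names and the outcome variable are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: one row per (repo_id, month); treated_post = 1 for
# months after a repository adopts a coding agent, 0 otherwise.
panel = pd.read_csv("repo_month_panel.csv")

did = smf.ols(
    "merged_pr_count ~ treated_post + C(repo_id) + C(month)",
    data=panel,
).fit(cov_type="cluster", cov_kwds={"groups": panel["repo_id"]})

# The coefficient on treated_post is the average post-adoption effect,
# valid only under the (strong) parallel-trends assumption.
print(did.params["treated_post"])
```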
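The DARS entry reports results as a pass@k score. pass@k is usually computed with the unbiased estimator introduced by Chen et al. (2021) for code generation; the sketch below is that standard estimator, not code taken from the DARS paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: samples generated per task, c: samples that pass the tests,
    k: evaluation budget.  Returns the probability that at least one of
    k randomly chosen samples is correct.
    """
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 samples per task with 3 passing gives pass@5 ≈ 0.917.
print(pass_at_k(n=10, c=3, k=5))
```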