Related papers: Process-Centric Analysis of Agentic Software Systems

Process-Centric Analysis of Agentic Software Systems

URL: http://arxiv.org/abs/2512.02393v1
Date: Tue, 02 Dec 2025 04:12:29 GMT
Title: Process-Centric Analysis of Agentic Software Systems
Authors: Shuyang Liu, Yang Chen, Rahul Krishna, Saurabh Sinha, Jatin Ganhotra, Reyhan Jabbarvand,
Abstract summary: We introduce Graphectory to encode the temporal and semantic relations in software systems.<n>We analyze 4000 trajectories of two dominant agentic programming models.<n>Our fully automated analyses reveal that agents using richer prompts exhibit more complex Graphectory.
Score: 10.976178600911263
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Agentic systems are modern software systems: they consist of orchestrated modules, expose interfaces, and are deployed in software pipelines. Unlike conventional programs, their execution (i.e., trajectories) is inherently stochastic and adaptive to the problem they are solving. Evaluation of such systems is often outcome-centric, judging their performance based on success or failure at the final step. This narrow focus overlooks detailed insights about such systems, failing to explain how agents reason, plan, act, or change their strategies over time. Inspired by the structured representation of conventional software systems as graphs, we introduce Graphectory to systematically encode the temporal and semantic relations in such software systems. Graphectory facilitates the design of process-centric metrics and analyses to assess the quality of agentic workflows independent of final success. Using Graphectory, we analyze 4000 trajectories of two dominant agentic programming workflows, namely SWE-agent and OpenHands, with a combination of four backbone Large Language Models (LLMs), attempting to resolve SWE-bench Verified issues. Our fully automated analyses reveal that: (1) agents using richer prompts or stronger LLMs exhibit more complex Graphectory, reflecting deeper exploration, broader context gathering, and more thorough validation before patch submission; (2) agents' problem-solving strategies vary with both problem difficulty and the underlying LLM -- for resolved issues, the strategies often follow coherent localization-patching-validation steps, while unresolved ones exhibit chaotic, repetitive, or backtracking behaviors; (3) even when successful, agentic programming systems often display inefficient processes, leading to unnecessarily prolonged trajectories.

Related papers

Understanding Code Agent Behaviour: An Empirical Study of Success and Failure Trajectories [10.751728274263536]
This paper presents an empirical study of agent trajectories, namely the execution traces capturing the steps agents take when attempting to resolve software issues.<n>We analyse trajectories from three state-of-the-art code agents (OpenHands, SWE-agent, and Prometheus) on the SWE-Bench benchmark, examining both successful and failed attempts.
arXiv Detail & Related papers (2025-10-31T18:58:13Z)
A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System [56.40989626804489]
This survey provides the first holistic analysis of Large Language Models-powered software engineering.<n>We review over 150 recent papers and propose a taxonomy along two key dimensions: (1) Solutions, categorized into prompt-based, fine-tuning-based, and agent-based paradigms, and (2) Benchmarks, including tasks such as code generation, translation, and repair.
arXiv Detail & Related papers (2025-10-10T06:56:50Z)
AgentRouter: A Knowledge-Graph-Guided LLM Router for Collaborative Multi-Agent Question Answering [51.07491603393163]
tAgent is a framework that formulates multi-agent QA as a knowledge-graph-guided routing problem supervised by empirical performance signals.<n>By leveraging soft supervision and weighted aggregation of agent outputs, Agent learns principled collaboration schemes that capture the complementary strengths of diverse agents.
arXiv Detail & Related papers (2025-10-06T23:20:49Z)
Just-in-time Episodic Feedback Hinter: Leveraging Offline Knowledge to Improve LLM Agents Adaptation [77.90555621662345]
We present JEF Hinter, an agentic system that distills offline traces into compact, context-aware hints.<n>A zooming mechanism highlights decisive steps in long trajectories, capturing both strategies and pitfalls.<n>Experiments on MiniWoB++, WorkArena-L1, and WebArena-Lite show that JEF Hinter consistently outperforms strong baselines.
arXiv Detail & Related papers (2025-10-05T21:34:42Z)
CoDA: Agentic Systems for Collaborative Data Visualization [57.270599188947294]
Deep research has revolutionized data analysis, yet data scientists still devote substantial time to manually crafting visualizations.<n>Existing approaches, including simple single- or multi-agent systems, often oversimplify the task.<n>We introduce CoDA, a multi-agent system that employs specialized LLM agents for metadata analysis, task planning, code generation, and self-reflection.
arXiv Detail & Related papers (2025-10-03T17:30:16Z)
Illuminating LLM Coding Agents: Visual Analytics for Deeper Understanding and Enhancement [16.472150248814767]
We introduce a visual analytics system designed to enhance the examination of coding agent behaviors.<n>Our system enables ML scientists to gain a structured understanding of agent behaviors.
arXiv Detail & Related papers (2025-08-18T01:17:11Z)
TRAIL: Trace Reasoning and Agentic Issue Localization [5.025960714013197]
This work articulates the need for robust and dynamic evaluation methods for agentic workflow traces.<n>We present a set of 148 large human-annotated traces (TRAIL) constructed using this taxonomy and grounded in established agentic benchmarks.<n>To ensure ecological validity, we curate traces from both single and multi-agent systems.
arXiv Detail & Related papers (2025-05-13T14:55:31Z)
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.<n>Our framework incorporates two complementary strategies: internal TTC and external TTC.<n>We demonstrate our textbf32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
Why Do Multi-Agent LLM Systems Fail? [87.90075668488434]
We introduce MAST-Data, a comprehensive dataset of 1600+ annotated traces collected across 7 popular MAS frameworks.<n>We build the first Multi-Agent System Failure taxonomy (MAST)<n>We leverage MAST and MAST-Data to analyze failure patterns across models (GPT4, Claude 3, Qwen2.5, CodeLlama) and tasks (coding, math, general agent)
arXiv Detail & Related papers (2025-03-17T19:04:38Z)
Analyzing Logs of Large-Scale Software Systems using Time Curves Visualization [0.0]
We show that our approach can explain the main events in logs collected from different applications without prior knowledge.<n>As a result, we expect a significant reduction of the time required to identify performance bottlenecks and security risks.
arXiv Detail & Related papers (2024-11-08T12:42:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.