TraceSIR: A Multi-Agent Framework for Structured Analysis and Reporting of Agentic Execution Traces
- URL: http://arxiv.org/abs/2603.00623v1
- Date: Sat, 28 Feb 2026 12:33:24 GMT
- Title: TraceSIR: A Multi-Agent Framework for Structured Analysis and Reporting of Agentic Execution Traces
- Authors: Shu-Xun Yang, Cunxiang Wang, Haoke Zhang, Wenbo Yu, Lindong Wu, Jiayi Gui, Dayong Yang, Yukuo Cen, Zhuoer Feng, Bosi Wen, Yidong Wang, Lucen Zhong, Jiamin Ren, Linfeng Zhang, Jie Tang
- Abstract summary: We propose TraceSIR, a framework for structured analysis and reporting of agentic execution traces. TraceSIR coordinates three specialized agents: StructureAgent, InsightAgent, and ReportAgent. Experiments show that TraceSIR consistently produces coherent, informative, and actionable reports.
- Score: 32.4073751390339
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Agentic systems augment large language models with external tools and iterative decision making, enabling complex tasks such as deep research, function calling, and coding. However, their long and intricate execution traces make failure diagnosis and root cause analysis extremely challenging. Manual inspection does not scale, while directly applying LLMs to raw traces is hindered by input length limits and unreliable reasoning. Focusing solely on final task outcomes further discards critical behavioral information required for accurate issue localization. To address these issues, we propose TraceSIR, a multi-agent framework for structured analysis and reporting of agentic execution traces. TraceSIR coordinates three specialized agents: (1) StructureAgent, which introduces a novel abstraction format, TraceFormat, to compress execution traces while preserving essential behavioral information; (2) InsightAgent, which performs fine-grained diagnosis including issue localization, root cause analysis, and optimization suggestions; (3) ReportAgent, which aggregates insights across task instances and generates comprehensive analysis reports. To evaluate TraceSIR, we construct TraceBench, covering three real-world agentic scenarios, and introduce ReportEval, an evaluation protocol for assessing the quality and usability of analysis reports aligned with industry needs. Experiments show that TraceSIR consistently produces coherent, informative, and actionable reports, significantly outperforming existing approaches across all evaluation dimensions. Our project and video are publicly available at https://github.com/SHU-XUN/TraceSIR.
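The three-agent pipeline described in the abstract (StructureAgent compresses the trace, InsightAgent diagnoses it, ReportAgent aggregates insights) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the agent names come from the abstract, but the internals here (a one-line-per-step stand-in for TraceFormat, a keyword heuristic in place of LLM-based diagnosis) are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TraceEvent:
    """One step in an agentic execution trace (hypothetical schema)."""
    step: int
    actor: str    # e.g. "llm" or "tool"
    action: str
    result: str

class StructureAgent:
    """Compresses a raw trace into a compact summary, standing in for the
    paper's TraceFormat abstraction (whose real schema is not given here)."""
    def compress(self, trace):
        return [f"[{e.step}] {e.actor}:{e.action} -> {e.result}" for e in trace]

class InsightAgent:
    """Flags steps whose result looks like a failure. A toy heuristic in
    place of the paper's LLM-based localization and root-cause analysis."""
    def diagnose(self, structured):
        return [line for line in structured if "error" in line.lower()]

class ReportAgent:
    """Aggregates per-instance insights into a single report string."""
    def report(self, all_insights):
        issues = [i for insights in all_insights for i in insights]
        return f"{len(issues)} issue(s) found:\n" + "\n".join(issues)

def analyze(traces):
    """Run the three stages in sequence over a batch of traces."""
    s, i, r = StructureAgent(), InsightAgent(), ReportAgent()
    return r.report([i.diagnose(s.compress(t)) for t in traces])

trace = [
    TraceEvent(1, "llm", "plan search", "ok"),
    TraceEvent(2, "tool", "search", "Error: timeout"),
]
print(analyze([trace]))
```

The point of the sketch is the staged hand-off: each agent sees only the previous agent's compact output, which is how the framework sidesteps the input-length limits the abstract mentions.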
Related papers
- Agentic Observability: Automated Alert Triage for Adobe E-Commerce [0.0]
This paper presents an agentic observability framework deployed within Adobe's e-commerce infrastructure. The framework autonomously performs alert triage using a ReAct paradigm. Our results show that agentic AI enables an order-of-magnitude reduction in triage latency and a step-change in resolution accuracy.
arXiv Detail & Related papers (2026-01-31T20:20:02Z) - The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution [63.61358761489141]
Large Language Model (LLM)-based agents are widely used in real-world applications such as customer service, web navigation, and software engineering. We propose a novel framework for general agentic attribution, designed to identify the internal factors driving agent actions regardless of the task outcome. We validate our framework across a diverse suite of agentic scenarios, including standard tool use and subtle reliability risks like memory-induced bias.
arXiv Detail & Related papers (2026-01-21T15:22:21Z) - TAAF: A Trace Abstraction and Analysis Framework Synergizing Knowledge Graphs and LLMs [3.2839783281320085]
This paper introduces TAAF (Trace Abstraction and Analysis Framework), a novel approach to transform raw trace data into actionable insights. An LLM interprets query-specific subgraphs to answer natural-language questions, reducing the need for manual inspection. Experiments show that TAAF improves answer accuracy by up to 31.2%, particularly in multi-hop and causal reasoning tasks.
arXiv Detail & Related papers (2026-01-06T01:04:05Z) - PRInTS: Reward Modeling for Long-Horizon Information Seeking [74.14496236655911]
We introduce PRInTS, a generative PRM trained with dual capabilities. We show that PRInTS enhances information-seeking abilities of open-source models as well as specialized agents.
arXiv Detail & Related papers (2025-11-24T17:09:43Z) - Understanding Code Agent Behaviour: An Empirical Study of Success and Failure Trajectories [10.751728274263536]
This paper presents an empirical study of agent trajectories, namely the execution traces capturing the steps agents take when attempting to resolve software issues. We analyse trajectories from three state-of-the-art code agents (OpenHands, SWE-agent, and Prometheus) on the SWE-Bench benchmark, examining both successful and failed attempts.
arXiv Detail & Related papers (2025-10-31T18:58:13Z) - AgentRouter: A Knowledge-Graph-Guided LLM Router for Collaborative Multi-Agent Question Answering [51.07491603393163]
AgentRouter is a framework that formulates multi-agent QA as a knowledge-graph-guided routing problem supervised by empirical performance signals. By leveraging soft supervision and weighted aggregation of agent outputs, AgentRouter learns principled collaboration schemes that capture the complementary strengths of diverse agents.
arXiv Detail & Related papers (2025-10-06T23:20:49Z) - MCP-Orchestrated Multi-Agent System for Automated Disinformation Detection [84.75972919995398]
This paper presents a multi-agent system that uses relation extraction to detect disinformation in news articles. The proposed Agentic AI system combines four agents, including a machine learning agent (logistic regression), a Wikipedia knowledge-check agent, and a web-scraped data analyzer. Results demonstrate that the multi-agent ensemble achieves 95.3% accuracy with an F1 score of 0.964, significantly outperforming individual agents and traditional approaches.
arXiv Detail & Related papers (2025-08-13T19:14:48Z) - AgentArmor: Enforcing Program Analysis on Agent Runtime Trace to Defend Against Prompt Injection [14.522205401511727]
Large Language Model (LLM) agents offer a powerful new paradigm for solving various problems by combining natural language reasoning with the execution of external tools. In this work, we propose a novel insight that treats the agent runtime traces as structured programs with analyzable semantics. We present AgentArmor, a program analysis framework that converts agent traces into graph intermediate representation-based structured program dependency representations.
arXiv Detail & Related papers (2025-08-02T07:59:34Z) - Deep Research Agents: A Systematic Examination And Roadmap [109.53237992384872]
Deep Research (DR) agents are designed to tackle complex, multi-turn informational research tasks. In this paper, we conduct a detailed analysis of the foundational technologies and architectural components that constitute DR agents.
arXiv Detail & Related papers (2025-06-22T16:52:48Z) - ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks [64.86209459039313]
ThinkGeo is an agentic benchmark designed to evaluate tool-augmented agents on remote sensing tasks via structured tool use and multi-step planning. We implement a ReAct-style interaction loop and evaluate both open- and closed-source LLMs on 486 structured agentic tasks with 1,773 expert-verified reasoning steps. Our analysis reveals notable disparities in tool accuracy and planning consistency across models.
arXiv Detail & Related papers (2025-05-29T17:59:38Z) - TRAIL: Trace Reasoning and Agentic Issue Localization [5.025960714013197]
This work articulates the need for robust and dynamic evaluation methods for agentic workflow traces. We present a set of 148 large human-annotated traces (TRAIL) constructed using this taxonomy and grounded in established agentic benchmarks. To ensure ecological validity, we curate traces from both single- and multi-agent systems.
arXiv Detail & Related papers (2025-05-13T14:55:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.