Beyond Blame: Rethinking SZZ with Knowledge Graph Search
- URL: http://arxiv.org/abs/2602.02934v1
- Date: Tue, 03 Feb 2026 00:10:48 GMT
- Title: Beyond Blame: Rethinking SZZ with Knowledge Graph Search
- Authors: Yu Shi, Hao Li, Bram Adams, Ahmed E. Hassan
- Abstract summary: We present AgenticSZZ, the first approach to apply Temporal Knowledge Graphs (TKGs) to software evolution analysis. We show that AgenticSZZ achieves F1-scores of 0.48 to 0.74, with statistically significant improvements over state-of-the-art by up to 27%.
- Score: 13.82629698836299
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Identifying Bug-Inducing Commits (BICs) is fundamental for understanding software defects and enabling downstream tasks such as defect prediction and automated program repair. Yet existing SZZ-based approaches are limited by their reliance on git blame, which restricts the search space to commits that directly modified the fixed lines. Our preliminary study on 2,102 validated bug-fixing commits reveals that this limitation is significant: over 40% of cases cannot be solved by blame alone, as 28% of BICs require traversing commit history beyond blame results and 14% are blameless. We present AgenticSZZ, the first approach to apply Temporal Knowledge Graphs (TKGs) to software evolution analysis. AgenticSZZ reframes BIC identification from a ranking problem over blame commits into a graph search problem, where temporal ordering is fundamental to causal reasoning about bug introduction. The approach operates in two phases: (1) constructing a TKG that encodes commits with temporal and structural relationships, expanding the search space by traversing file history backward from two reference points (blame commits and the BFC); and (2) leveraging an LLM agent to navigate the graph using specialized tools for candidate exploration and causal analysis. Evaluation on three datasets shows that AgenticSZZ achieves F1-scores of 0.48 to 0.74, with statistically significant improvements over state-of-the-art by up to 27%. Our ablation study confirms that both components are essential, reflecting a classic exploration-exploitation trade-off: the TKG expands the search space while the agent provides intelligent selection. By transforming BIC identification into a graph search problem, we open a new research direction for temporal and causal reasoning in software evolution analysis.
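The two-phase pipeline described in the abstract can be illustrated with a deliberately simplified sketch. It assumes a toy in-memory commit list instead of a real git repository and models the TKG only as commit nodes plus temporal "precedes" edges along each fixed file's history (the paper's graph also encodes structural relationships). The LLM agent is replaced by a hypothetical `score` function and threshold. All names (`Commit`, `build_tkg`, `select_bics`, `depth`, `threshold`) are illustrative assumptions, not AgenticSZZ's implementation, and the per-BFC set-based F1 shown here may differ from the paper's aggregation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    sha: str
    timestamp: int    # commit time, e.g. Unix seconds
    files: frozenset  # paths touched by the commit


def file_histories(commits):
    """Map each file path to the commits that touched it, oldest first."""
    hist = defaultdict(list)
    for c in sorted(commits, key=lambda c: c.timestamp):
        for path in c.files:
            hist[path].append(c)
    return hist


def build_tkg(commits, blame_shas, bfc_sha, fixed_files, depth=3):
    """Hypothetical phase 1: walk each fixed file's history backward from
    every reference point (blame commits and the BFC) for up to `depth`
    earlier commits. Returns (nodes, edges); an edge (older, newer, path)
    records that `older` precedes `newer` in the history of `path`."""
    hist = file_histories(commits)
    refs = set(blame_shas) | {bfc_sha}
    nodes, edges = set(refs), set()
    for path in fixed_files:
        chain = hist.get(path, [])
        for i, c in enumerate(chain):
            if c.sha not in refs:
                continue
            # Follow this file's history backward, at most `depth` steps.
            for j in range(i, max(i - depth, 0), -1):
                older, newer = chain[j - 1], chain[j]
                nodes.update((older.sha, newer.sha))
                edges.add((older.sha, newer.sha, path))
    return nodes, edges


def select_bics(nodes, commits, bfc_sha, score, threshold=0.5):
    """Placeholder for phase 2 (the LLM agent): keep candidates that strictly
    precede the BFC (a bug is introduced before it is fixed) and whose
    hypothetical `score` reaches `threshold`."""
    by_sha = {c.sha: c for c in commits}
    bfc_time = by_sha[bfc_sha].timestamp
    return {
        s for s in nodes
        if s != bfc_sha and by_sha[s].timestamp < bfc_time and score(s) >= threshold
    }


def f1(predicted, truth):
    """Set-based F1 between predicted and ground-truth BIC shas."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(truth)
    return 2 * precision * recall / (precision + recall)


def score(sha):
    # Stand-in for the agent's causal judgement (hypothetical values).
    return {"b2": 0.9}.get(sha, 0.1)


commits = [
    Commit("a1", 100, frozenset({"foo.py"})),
    Commit("b2", 200, frozenset({"foo.py", "bar.py"})),
    Commit("c3", 300, frozenset({"foo.py"})),  # returned by blame
    Commit("d4", 400, frozenset({"foo.py"})),  # bug-fixing commit (BFC)
]
nodes, edges = build_tkg(commits, blame_shas={"c3"}, bfc_sha="d4",
                         fixed_files=["foo.py"], depth=2)
print(sorted(nodes))  # ['a1', 'b2', 'c3', 'd4']

predicted = select_bics(nodes, commits, "d4", score)
print(predicted, f1(predicted, {"b2"}), f1({"c3"}, {"b2"}))  # {'b2'} 1.0 0.0
```

In this toy history, blame returns only `c3`, so a blame-restricted approach cannot reach the true BIC `b2` (F1 of 0.0). Backward traversal from the two reference points adds `a1` and `b2` to the search space, and the selection step then narrows it back down. This loosely mirrors the exploration-exploitation split described in the ablation: expanding the graph helps only if the selection step keeps precision in check.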
Related papers
- HELP: HyperNode Expansion and Logical Path-Guided Evidence Localization for Accurate and Efficient GraphRAG [53.30561659838455]
Large Language Models (LLMs) often struggle with inherent knowledge boundaries and hallucinations. Retrieval-Augmented Generation (RAG) frequently overlooks structural interdependencies essential for multi-hop reasoning. HELP achieves competitive performance across multiple simple and multi-hop QA benchmarks and up to a 28.8× speedup over leading Graph-based RAG baselines.
Date: 2026-02-24T14:05:29Z
- CodeCompass: Navigating the Navigation Paradox in Agentic Code Intelligence [0.0]
We identify the Navigation Paradox: agents perform poorly because navigation and retrieval are fundamentally distinct problems. We demonstrate that graph-based structural navigation via CodeCompass, a Model Context Protocol server exposing dependency graphs, achieves 99.4% task completion on hidden-dependency tasks.
Date: 2026-02-23T16:58:37Z
- AgenticRAGTracer: A Hop-Aware Benchmark for Diagnosing Multi-Step Retrieval Reasoning in Agentic RAG [7.139631028105273]
We introduce AgenticRAGTracer, a benchmark for agent-based multi-hop reasoning. It is primarily constructed by large language models and designed to support step-by-step validation. Our benchmark spans multiple domains, contains 1,305 data points, and has no overlap with existing mainstream benchmarks.
Date: 2026-02-22T10:55:21Z
- To Search or Not to Search: Aligning the Decision Boundary of Deep Search Agents via Causal Intervention [61.82680155643223]
We identify misaligned decision boundaries as the root cause; the decision boundary is the threshold determining when accumulated information suffices to answer. Misalignment causes over-search (redundant searching despite sufficient knowledge) and under-search (premature termination yielding incorrect answers). We propose a comprehensive framework comprising two key components. First, we introduce a causal intervention-based diagnosis that identifies boundary errors. Second, we develop Decision Boundary Alignment for Deep Search agents (DAS). Our DAS method effectively calibrates these boundaries, mitigating both over-search and under-search to achieve substantial gains in accuracy and efficiency.
Date: 2026-02-03T09:29:06Z
- TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework [62.66056331998838]
TeaRAG is a token-efficient agentic RAG framework capable of compressing both retrieval content and reasoning steps. Our reward function evaluates knowledge sufficiency via a knowledge matching mechanism while penalizing excessive reasoning steps.
Date: 2025-11-07T16:08:34Z
- LLMBisect: Breaking Barriers in Bug Bisection with A Comparative Analysis Pipeline [35.18683484280968]
Large Language Models (LLMs) are well-positioned to break the barriers of existing solutions. LLMs comprehend both textual data and code in patches and commits. Our approach achieves significantly better accuracy than the state-of-the-art solution, by more than 38%.
Date: 2025-10-30T02:47:25Z
- Eigen-1: Adaptive Multi-Agent Refinement with Monitor-Based RAG for Scientific Reasoning [53.45095336430027]
We develop a unified framework that combines implicit retrieval and structured collaboration. On Humanity's Last Exam (HLE) Bio/Chem Gold, our framework achieves 48.3% accuracy. Results on SuperGPQA and TRQA confirm robustness across domains.
Date: 2025-09-25T14:05:55Z
- Enrich-on-Graph: Query-Graph Alignment for Complex Reasoning with LLM Enriching [61.824094419641575]
Large Language Models (LLMs) struggle with hallucinations and factual errors in knowledge-intensive scenarios like knowledge graph question answering (KGQA). We attribute this to the semantic gap between structured knowledge graphs (KGs) and unstructured queries, caused by inherent differences in their focuses and structures. Existing methods usually employ resource-intensive, non-scalable reasoning on vanilla KGs but overlook this gap. We propose a flexible framework, Enrich-on-Graph (EoG), which leverages LLMs' prior knowledge to enrich KGs and bridge the semantic gap between graphs and queries.
Date: 2025-09-25T06:48:52Z
- An Empirical Study on Failures in Automated Issue Solving [12.571536148821144]
We analyze the performance and efficiency of three SOTA tools, spanning both pipeline-based and agentic architectures, on the automated issue-solving tasks of SWE-Bench-Verified. To move from high-level performance metrics to underlying cause analysis, we conducted a systematic manual analysis of 150 failed instances. The results reveal distinct failure fingerprints between the two architectural paradigms, with the majority of agentic failures stemming from flawed reasoning and cognitive deadlocks.
Date: 2025-09-17T13:07:52Z
- Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving [62.71545696485824]
We introduce AGENT KB, a universal memory infrastructure enabling seamless experience sharing across heterogeneous agent frameworks without retraining. AGENT KB aggregates trajectories into a structured knowledge base and serves lightweight APIs. We validate AGENT KB across major frameworks on GAIA, Humanity's Last Exam, GPQA, and SWE-bench.
Date: 2025-07-08T17:59:22Z
- Align-GRAG: Reasoning-Guided Dual Alignment for Graph Retrieval-Augmented Generation [79.75818239774952]
Large language models (LLMs) have demonstrated remarkable capabilities but still struggle with issues like hallucinations and outdated information. Retrieval-augmented generation (RAG) addresses these issues by grounding LLM outputs in external knowledge with an Information Retrieval (IR) system. We propose Align-GRAG, a novel reasoning-guided dual alignment framework for the post-retrieval phase.
Date: 2025-05-22T05:15:27Z
- Divide by Question, Conquer by Agent: SPLIT-RAG with Question-Driven Graph Partitioning [62.640169289390535]
SPLIT-RAG is a multi-agent RAG framework that addresses the limitations with question-driven semantic graph partitioning and collaborative subgraph retrieval. The framework first creates a Semantic Partitioning of Linked Information, then uses the Type-Specialized knowledge base to achieve Multi-Agent RAG. The attribute-aware graph segmentation divides knowledge graphs into semantically coherent subgraphs, ensuring that subgraphs align with different query types. A hierarchical merging module resolves inconsistencies across subgraph-derived answers through logical verification.
Date: 2025-05-20T06:44:34Z
- Identifying Root Cause of bugs by Capturing Changed Code Lines with Relational Graph Neural Networks [7.676213873923721]
We propose a method called RC-Detection to detect root-cause deletion lines in changed code lines, thereby identifying the root cause of introduced bugs in bug-fixing commits. Our experiments show that, compared to the most advanced root cause detection methods, RC-Detection improved Recall@1, Recall@2, Recall@3, and MFR by 4.107%, 5.113%, 4.289%, and 24.536%, respectively.
Date: 2025-05-02T04:29:09Z
- LLM4SZZ: Enhancing SZZ Algorithm with Context-Enhanced Assessment on Large Language Models [10.525352489242398]
The SZZ algorithm is the dominant technique for identifying bug-inducing commits. It serves as a foundation for many software engineering studies, such as bug prediction and static code analysis. Recently, a deep learning-based SZZ algorithm has been introduced to enhance the original SZZ algorithm.
Date: 2025-04-02T06:40:57Z
- Enhancing repository-level software repair via repository-aware knowledge graphs [13.747293341707563]
Repository-level software repair faces challenges in bridging semantic gaps between issue descriptions and code patches. Existing approaches, which rely on large language models (LLMs), are hindered by semantic ambiguities, limited understanding of structural context, and insufficient reasoning capabilities. We propose a novel repository-aware knowledge graph (KG) that accurately links repository artifacts (issues and pull requests) and entities (files, classes, and functions). We also propose a path-guided repair mechanism that leverages KG-mined paths; tracing through these paths augments contextual information with explanations.
Date: 2025-03-27T17:21:47Z
This list is automatically generated from the titles and abstracts of the papers in this site.