CoSIL: Software Issue Localization via LLM-Driven Code Repository Graph Searching
- URL: http://arxiv.org/abs/2503.22424v1
- Date: Fri, 28 Mar 2025 13:36:26 GMT
- Title: CoSIL: Software Issue Localization via LLM-Driven Code Repository Graph Searching
- Authors: Zhonghao Jiang, Xiaoxue Ren, Meng Yan, Wei Jiang, Yong Li, Zhongxin Liu
- Abstract summary: CoSIL is a simple yet powerful issue localization method without training or indexing. It achieves a Top-1 localization success rate of 43 percent and 44.6 percent on SWE-bench Lite and SWE-bench Verified. When CoSIL is applied to guide the patch generation stage, the resolved rate further improves by 9.3 to 31.5 percent.
- Score: 9.969478638201261
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have significantly advanced autonomous software engineering, leading to a growing number of software engineering agents that assist developers in automatic program repair. Issue localization forms the basis for accurate patch generation. However, because of limitations caused by the context window length of LLMs, existing issue localization methods face challenges in balancing concise yet effective contexts and adequately comprehensive search spaces. In this paper, we introduce CoSIL, an LLM-driven, simple yet powerful function-level issue localization method without training or indexing. CoSIL reduces the search space through module call graphs, iteratively searches the function call graph to obtain relevant contexts, and uses context pruning to control the search direction and manage contexts effectively. Importantly, the call graph is dynamically constructed by the LLM during search, eliminating the need for pre-parsing. Experiment results demonstrate that CoSIL achieves a Top-1 localization success rate of 43 percent and 44.6 percent on SWE-bench Lite and SWE-bench Verified, respectively, using Qwen2.5 Coder 32B, outperforming existing methods by 8.6 to 98.2 percent. When CoSIL is applied to guide the patch generation stage, the resolved rate further improves by 9.3 to 31.5 percent.
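The search loop described in the abstract (walk a function call graph, keep relevant functions as context, prune the rest so the search does not wander) can be illustrated with a minimal sketch. This is not the authors' implementation: the static `call_graph` dictionary stands in for the graph that CoSIL has the LLM build dynamically during search, `keyword_judge` is a hypothetical placeholder for the LLM's relevance decision, and all function names in the toy repository are invented for illustration.

```python
from collections import deque
from typing import Callable, Dict, List, Set


def localize(
    issue: str,
    call_graph: Dict[str, List[str]],
    entry_points: List[str],
    is_relevant: Callable[[str, str], bool],
    max_depth: int = 3,
) -> List[str]:
    """Breadth-first search over a call graph that expands only relevant functions."""
    kept: List[str] = []
    seen: Set[str] = set(entry_points)
    queue = deque((fn, 0) for fn in entry_points)
    while queue:
        fn, depth = queue.popleft()
        if not is_relevant(issue, fn):
            continue  # pruned: neither kept as context nor expanded further
        kept.append(fn)
        if depth == max_depth:
            continue  # depth budget reached; keep the function but stop expanding
        for callee in call_graph.get(fn, []):
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return kept


def keyword_judge(issue: str, fn: str) -> bool:
    """Hypothetical stand-in for an LLM judgment: keyword overlap with the issue text."""
    words = set(issue.lower().split())
    return any(part in words for part in fn.lower().split("_"))


# Toy repository (assumed for illustration only).
call_graph = {
    "parse_config": ["read_file", "validate_config"],
    "validate_config": ["check_schema"],
    "read_file": [],
    "check_schema": [],
    "render_page": ["load_template"],
}
issue = "config validate step crashes on empty schema"
result = localize(issue, call_graph, ["parse_config", "render_page"], keyword_judge)
print(result)
```

In this sketch pruning happens at visit time, so an irrelevant function also cuts off everything reachable only through it; that keeps the collected context small at the cost of possibly missing a buggy callee behind an unrelated-looking caller.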
Related papers
- $\nabla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space [71.23672814629448]
$\nabla$-Reasoner is an iterative generation framework that integrates differentiable optimization over token logits into the decoding loop. $\nabla$-Reasoner achieves over 20% accuracy improvement on a challenging mathematical reasoning benchmark.
arXiv Detail & Related papers (2026-03-05T08:42:54Z)
- S3-CoT: Self-Sampled Succinct Reasoning Enables Efficient Chain-of-Thought LLMs [48.80914119283909]
Large language models equipped with chain-of-thought (CoT) achieve strong performance and offer a window into behavior. Recent evidence suggests that improvements in CoT capabilities often come with redundant reasoning processes. Our study presents a self-sampling framework based on activation steering for efficient CoT learning.
arXiv Detail & Related papers (2026-02-02T11:37:36Z)
- LLM-guided Hierarchical Retrieval [54.73080745446999]
LATTICE is a hierarchical retrieval framework that enables an LLM to reason over and navigate large corpora with logarithmic search complexity. A central challenge in such LLM-guided search is that the model's relevance judgments are noisy, context-dependent, and unaware of the hierarchy. Our framework achieves state-of-the-art zero-shot performance on the reasoning-intensive BRIGHT benchmark.
arXiv Detail & Related papers (2025-10-15T07:05:17Z)
- Extracting Conceptual Knowledge to Locate Software Issues [12.746044344302623]
RepoLens is a novel approach that abstracts and leverages conceptual knowledge from code repositories. It operates in two stages: an offline stage that extracts conceptual knowledge into a repository-wide knowledge base, and an online stage that retrieves issue-specific terms. RepoLens consistently improves three state-of-the-art tools, achieving average gains of over 22% in Hit@k and 46% in Recall@k for file- and function-level localization.
arXiv Detail & Related papers (2025-09-25T11:53:06Z)
- Enhancing LLM-based Fault Localization with a Functionality-Aware Retrieval-Augmented Generation Framework [14.287359838639608]
FaR-Loc is a framework that enhances method-level fault localization. FaR-Loc consists of three key components: LLM Functionality Extraction, Semantic Retrieval, and LLM Re-ranking. Our experiments on the widely used Defects4J benchmark show that FaR-Loc outperforms state-of-the-art LLM-based baselines.
arXiv Detail & Related papers (2025-09-24T20:37:11Z)
- Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code [76.80306464249217]
We propose TeaR, which aims at teaching LLMs to reason better. TeaR leverages careful data curation and reinforcement learning to guide models in discovering optimal reasoning paths through code-related tasks. We conduct extensive experiments using two base models and three long-CoT distillation models, with model sizes ranging from 1.5 billion to 32 billion parameters, and across 17 benchmarks spanning Math, Knowledge, Code, and Logical Reasoning.
arXiv Detail & Related papers (2025-07-10T07:34:05Z)
- Divide by Question, Conquer by Agent: SPLIT-RAG with Question-Driven Graph Partitioning [62.640169289390535]
SPLIT-RAG is a multi-agent RAG framework that addresses the limitations with question-driven semantic graph partitioning and collaborative subgraph retrieval. The framework first creates a Semantic Partitioning of Linked Information, then uses the Type-Specialized knowledge base to achieve Multi-Agent RAG. The attribute-aware graph segmentation divides knowledge graphs into semantically coherent subgraphs, ensuring subgraphs align with different query types. A hierarchical merging module resolves inconsistencies across subgraph-derived answers through logical verifications.
arXiv Detail & Related papers (2025-05-20T06:44:34Z)
- SweRank: Software Issue Localization with Code Ranking [109.3289316191729]
SweRank is an efficient retrieve-and-rerank framework for software issue localization. We construct SweLoc, a large-scale dataset curated from public GitHub repositories. We show that SweRank achieves state-of-the-art performance, outperforming both prior ranking models and costly agent-based systems.
arXiv Detail & Related papers (2025-05-07T19:44:09Z)
- ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates [51.633266497799745]
Hierarchical LLM reasoning via scaling thought templates can effectively optimize the reasoning search space. We introduce three innovations: (i) a structured and generic thought template library, containing around 500 high-level thought templates capable of generalizing to similar or relevant reasoning problems; (ii) performing hierarchical reinforcement learning on a sequence of thought templates instead of long CoTs; and (iii) a brand new inference scaling system.
arXiv Detail & Related papers (2025-02-10T18:51:47Z)
- OrcaLoca: An LLM Agent Framework for Software Issue Localization [7.608131635030634]
OrcaLoca is an agent framework that improves accuracy for software issue localization. It integrates priority-based scheduling for LLM-guided action, action decomposition with relevance scoring, and distance-aware context pruning. It also improves the final resolved rate of an open-source framework by 6.33 percentage points through its patch generation integration.
arXiv Detail & Related papers (2025-02-01T07:15:03Z)
- LLM Program Optimization via Retrieval Augmented Search [71.40092732256252]
We propose a blackbox adaptation method called Retrieval Augmented Search (RAS) that performs beam search over candidate optimizations. We show that RAS performs 1.8$\times$ better than prior state-of-the-art blackbox adaptation strategies. We also propose a method called AEGIS for improving interpretability by decomposing training examples into "atomic edits".
arXiv Detail & Related papers (2025-01-31T06:34:47Z)
- LLM-AutoDiff: Auto-Differentiate Any LLM Workflow [58.56731133392544]
We introduce LLM-AutoDiff: a novel framework for Automatic Prompt Engineering (APE).
LLM-AutoDiff treats each textual input as a trainable parameter and uses a frozen backward engine to generate feedback, akin to textual gradients.
It consistently outperforms existing textual gradient baselines in both accuracy and training cost.
arXiv Detail & Related papers (2025-01-28T03:18:48Z)
- Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework. This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings. Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z)
- Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation [43.630437906898635]
We propose a novel two-stage fine-tuning architecture called Invar-RAG.
In the retrieval stage, an LLM-based retriever is constructed by integrating LoRA-based representation learning.
In the generation stage, a refined fine-tuning method is employed to improve LLM accuracy in generating answers based on retrieved information.
arXiv Detail & Related papers (2024-11-11T14:25:37Z)
- A Multi-Agent Approach to Fault Localization via Graph-Based Retrieval and Reflexion [8.22737389683156]
Traditional fault localization techniques require extensive training datasets and high computational resources. Recent advances in Large Language Models (LLMs) offer new opportunities by enhancing code understanding and reasoning. We propose LLM4FL, a multi-agent fault localization framework that utilizes three specialized LLM agents. Evaluated on the Defects4J benchmark, which includes 675 faults from 14 Java projects, LLM4FL achieves an 18.55% improvement in Top-1 accuracy over AutoFL and 4.82% over SoapFL.
arXiv Detail & Related papers (2024-09-20T16:47:34Z)
- Program Slicing in the Era of Large Language Models [7.990456190723922]
Program slicing is a critical technique in software engineering, enabling developers to isolate relevant portions of code.
This study investigates the application of large language models (LLMs) to both static and dynamic program slicing.
arXiv Detail & Related papers (2024-09-19T00:07:56Z)
- SuperCoder2.0: Technical Report on Exploring the feasibility of LLMs as Autonomous Programmer [0.0]
SuperCoder2.0 is an advanced autonomous system designed to enhance software development through artificial intelligence.
The system combines an AI-native development approach with intelligent agents to enable fully autonomous coding.
arXiv Detail & Related papers (2024-09-17T13:44:42Z)
- OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling [62.19438812624467]
Large language models (LLMs) have exhibited their problem-solving abilities in mathematical reasoning.
We propose OptiBench, a benchmark for End-to-end optimization problem-solving with human-readable inputs and outputs.
arXiv Detail & Related papers (2024-07-13T13:27:57Z)
- Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems [59.72548591120689]
We introduce a new benchmark, SearchBench, containing 11 unique search problem types.
We show that even the most advanced LLMs fail to solve these problems end-to-end in text.
Instructing LLMs to generate code that solves the problem helps, but only slightly, e.g., GPT4's performance rises to 11.7%.
arXiv Detail & Related papers (2024-06-18T00:44:58Z)
- Hint-before-Solving Prompting: Guiding LLMs to Effectively Utilize Encoded Knowledge [85.17343729885003]
We introduce Hint-before-Solving Prompting (HSP), which guides the model to generate hints for solving the problem.
HSP can effectively improve the accuracy of reasoning tasks.
We build the HSPMATH dataset based on HSP and fine-tuned Llemma-7B, reaching 64.3 accuracy.
arXiv Detail & Related papers (2024-02-22T05:58:03Z)
- Autonomous Tree-search Ability of Large Language Models [58.68735916408101]
Large Language Models have excelled in remarkable reasoning capabilities with advanced prompting techniques.
Recent works propose to utilize external programs to define search logic, such that LLMs can perform passive tree search to solve more challenging reasoning tasks.
We propose a new concept called autonomous tree-search ability of LLM, which can automatically generate a response containing search trajectories for the correct answer.
arXiv Detail & Related papers (2023-10-14T14:14:38Z)
- Large Language Models for Test-Free Fault Localization [11.080712737595174]
We propose a language model based fault localization approach that locates buggy lines of code without any test coverage information.
We fine-tune language models with 350 million, 6 billion, and 16 billion parameters on small, manually curated corpora of buggy programs.
Our empirical evaluation shows that LLMAO improves the Top-1 results over the state-of-the-art machine learning fault localization (MLFL) baselines by 2.3%-54.4%, and Top-5 results by 14.4%-35.6%.
arXiv Detail & Related papers (2023-10-03T01:26:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.