CoSIL: Issue Localization via LLM-Driven Code Graph Searching
- URL: http://arxiv.org/abs/2503.22424v2
- Date: Mon, 22 Sep 2025 12:45:26 GMT
- Title: CoSIL: Issue Localization via LLM-Driven Code Graph Searching
- Authors: Zhonghao Jiang, Xiaoxue Ren, Meng Yan, Wei Jiang, Yong Li, Zhongxin Liu,
- Abstract summary: This paper introduces CoSIL, an LLM-driven, powerful function-level issue localization method without training or indexing.<n>Experiment results demonstrate that CoSIL achieves a Top-1 localization accuracy of 43.3% and 44.6% on SWE-bench Lite and SWE-bench Verified.<n>When CoSIL is integrated into an issue-solving method, Agentless, the issue resolution rate improves by 2.98%--30.5%.
- Score: 10.396193740828032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Issue solving aims to generate patches to fix reported issues in real-world code repositories according to issue descriptions. Issue localization forms the basis for accurate issue solving. Recently, LLM-based issue localization methods have demonstrated state-of-the-art performance. However, these methods either search from files mentioned in issue descriptions or in the whole repository and struggle to balance the breadth and depth of the search space to converge on the target efficiently. Moreover, they allow LLM to explore whole repositories freely, making it challenging to control the search direction to prevent the LLM from searching for incorrect targets. This paper introduces CoSIL, an LLM-driven, powerful function-level issue localization method without training or indexing. CoSIL employs a two-phase code graph search strategy. It first conducts broad exploration at the file level using dynamically constructed module call graphs, and then performs in-depth analysis at the function level by expanding the module call graph into a function call graph and executing iterative searches. To precisely control the search direction, CoSIL designs a pruner to filter unrelated directions and irrelevant contexts. To avoid incorrect interaction formats in long contexts, CoSIL introduces a reflection mechanism that uses additional independent queries in short contexts to enhance formatted abilities. Experiment results demonstrate that CoSIL achieves a Top-1 localization accuracy of 43.3\% and 44.6\% on SWE-bench Lite and SWE-bench Verified, respectively, with Qwen2.5-Coder-32B, average outperforming the state-of-the-art methods by 96.04\%. When CoSIL is integrated into an issue-solving method, Agentless, the issue resolution rate improves by 2.98\%--30.5\%.
Related papers
- $\
abla$-Reasoner: LLM Reasoning via Test-Time Gradient Descent in Latent Space [71.23672814629448]
$nabla$-Reasoner is an iterative generation framework that integrates differentiable optimization over token logits into the decoding loop.<n>$nabla$-Reasoner achieves over 20% accuracy improvement on a challenging mathematical reasoning benchmark.
arXiv Detail & Related papers (2026-03-05T08:42:54Z) - S3-CoT: Self-Sampled Succinct Reasoning Enables Efficient Chain-of-Thought LLMs [48.80914119283909]
Large language models equipped with chain-of-thought (CoT) achieve strong performance and offer a window into behavior.<n>Recent evidence suggests that improvements in CoT capabilities often come with redundant reasoning processes.<n>Our study presents a self-sampling framework based on activation steering for efficient CoT learning.
arXiv Detail & Related papers (2026-02-02T11:37:36Z) - LLM-guided Hierarchical Retrieval [54.73080745446999]
LATTICE is a hierarchical retrieval framework that enables an LLM to reason over and navigate large corpora with logarithmic search complexity.<n>A central challenge in such LLM-guided search is that the model's relevance judgments are noisy, context-dependent, and unaware of the hierarchy.<n>Our framework achieves state-of-the-art zero-shot performance on the reasoning-intensive BRIGHT benchmark.
arXiv Detail & Related papers (2025-10-15T07:05:17Z) - Extracting Conceptual Knowledge to Locate Software Issues [12.746044344302623]
RepoLens is a novel approach that abstracts and leverages conceptual knowledge from code repositories.<n>It operates in two stages: an offline stage that extracts conceptual knowledge into a repository-wide knowledge base, and an online stage that retrieves issue-specific terms.<n>RepoLens consistently improves three state-of-the-art tools, achieving average gains of over 22% in Hit@k and 46% in Recall@k for file- and function-level localization.
arXiv Detail & Related papers (2025-09-25T11:53:06Z) - Enhancing LLM-based Fault Localization with a Functionality-Aware Retrieval-Augmented Generation Framework [14.287359838639608]
FaR-Loc is a framework that enhances method-level fault localization.<n> FaR-Loc consists of three key components: LLM Functionality Extraction, Semantic Retrieval, and LLM Re-ranking.<n>Our experiments on the widely used Defects4J benchmark show that FaR-Loc outperforms state-of-the-art LLM-based baselines.
arXiv Detail & Related papers (2025-09-24T20:37:11Z) - Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code [76.80306464249217]
We propose TeaR, which aims at teaching LLMs to reason better.<n>TeaR leverages careful data curation and reinforcement learning to guide models in discovering optimal reasoning paths through code-related tasks.<n>We conduct extensive experiments using two base models and three long-CoT distillation models, with model sizes ranging from 1.5 billion to 32 billion parameters, and across 17 benchmarks spanning Math, Knowledge, Code, and Logical Reasoning.
arXiv Detail & Related papers (2025-07-10T07:34:05Z) - Divide by Question, Conquer by Agent: SPLIT-RAG with Question-Driven Graph Partitioning [62.640169289390535]
SPLIT-RAG is a multi-agent RAG framework that addresses the limitations with question-driven semantic graph partitioning and collaborative subgraph retrieval.<n>The innovative framework first create Semantic Partitioning of Linked Information, then use the Type-Specialized knowledge base to achieve Multi-Agent RAG.<n>The attribute-aware graph segmentation manages to divide knowledge graphs into semantically coherent subgraphs, ensuring subgraphs align with different query types.<n>A hierarchical merging module resolves inconsistencies across subgraph-derived answers through logical verifications.
arXiv Detail & Related papers (2025-05-20T06:44:34Z) - SweRank: Software Issue Localization with Code Ranking [109.3289316191729]
SweRank is an efficient retrieve-and-rerank framework for software issue localization.<n>We construct SweLoc, a large-scale dataset curated from public GitHub repositories.<n>We show that SweRank achieves state-of-the-art performance, outperforming both prior ranking models and costly agent-based systems.
arXiv Detail & Related papers (2025-05-07T19:44:09Z) - ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates [51.633266497799745]
hierarchical LLM reasoning via scaling thought templates can effectively optimize the reasoning search space.<n>We introduce three innovations: (i) a structured and generic thought template library, containing around 500 high-level thought templates capable of generalizing to similar or relevant reasoning problems; (ii) performing hierarchical reinforcement learning on a sequence of thought templates instead of long CoTs; and (iii) a brand new inference scaling system.
arXiv Detail & Related papers (2025-02-10T18:51:47Z) - OrcaLoca: An LLM Agent Framework for Software Issue Localization [7.608131635030634]
OrcaLoca is an agent framework that improves accuracy for software issue localization.<n>It integrates priority-based scheduling for LLM-guided action, action decomposition with relevance scoring, and distance-aware context pruning.<n>It also improves the final resolved rate of an open-source framework by 6.33 percentage points through its patch generation integration.
arXiv Detail & Related papers (2025-02-01T07:15:03Z) - LLM Program Optimization via Retrieval Augmented Search [71.40092732256252]
We propose a blackbox adaptation method called Retrieval Augmented Search (RAS) that performs beam search over candidate optimizations.<n>We show that RAS performs 1.8$times$ better than prior state-of-the-art blackbox adaptation strategies.<n>We also propose a method called AEGIS for improving interpretability by decomposing training examples into "atomic edits"
arXiv Detail & Related papers (2025-01-31T06:34:47Z) - LLM-AutoDiff: Auto-Differentiate Any LLM Workflow [58.56731133392544]
We introduce LLM-AutoDiff: a novel framework for Automatic Prompt Engineering (APE)
LLMs-AutoDiff treats each textual input as a trainable parameter and uses a frozen backward engine to generate feedback-akin to textual gradients.
It consistently outperforms existing textual gradient baselines in both accuracy and training cost.
arXiv Detail & Related papers (2025-01-28T03:18:48Z) - Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework.<n>This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings.<n>Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z) - Invar-RAG: Invariant LLM-aligned Retrieval for Better Generation [43.630437906898635]
We propose a novel two-stage fine-tuning architecture called Invar-RAG.
In the retrieval stage, an LLM-based retriever is constructed by integrating LoRA-based representation learning.
In the generation stage, a refined fine-tuning method is employed to improve LLM accuracy in generating answers based on retrieved information.
arXiv Detail & Related papers (2024-11-11T14:25:37Z) - A Multi-Agent Approach to Fault Localization via Graph-Based Retrieval and Reflexion [8.22737389683156]
Traditional fault localization techniques require extensive training datasets and high computational resources.<n>Recent advances in Large Language Models (LLMs) offer new opportunities by enhancing code understanding and reasoning.<n>We propose LLM4FL, a multi-agent fault localization framework that utilizes three specialized LLM agents.<n> evaluated on the Defects4J benchmark, which includes 675 faults from 14 Java projects, LLM4FL achieves an 18.55% improvement in Top-1 accuracy over AutoFL and 4.82% over SoapFL.
arXiv Detail & Related papers (2024-09-20T16:47:34Z) - Program Slicing in the Era of Large Language Models [7.990456190723922]
Program slicing is a critical technique in software engineering, enabling developers to isolate relevant portions of code.
This study investigates the application of large language models (LLMs) to both static and dynamic program slicing.
arXiv Detail & Related papers (2024-09-19T00:07:56Z) - SuperCoder2.0: Technical Report on Exploring the feasibility of LLMs as Autonomous Programmer [0.0]
SuperCoder2.0 is an advanced autonomous system designed to enhance software development through artificial intelligence.
System combines an AI-native development approach with intelligent agents to enable fully autonomous coding.
arXiv Detail & Related papers (2024-09-17T13:44:42Z) - OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling [62.19438812624467]
Large language models (LLMs) have exhibited their problem-solving abilities in mathematical reasoning.
We propose OptiBench, a benchmark for End-to-end optimization problem-solving with human-readable inputs and outputs.
arXiv Detail & Related papers (2024-07-13T13:27:57Z) - Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems [59.72548591120689]
We introduce a new benchmark, SearchBench, containing 11 unique search problem types.
We show that even the most advanced LLMs fail to solve these problems end-to-end in text.
Instructing LLMs to generate code that solves the problem helps, but only slightly, e.g., GPT4's performance rises to 11.7%.
arXiv Detail & Related papers (2024-06-18T00:44:58Z) - Hint-before-Solving Prompting: Guiding LLMs to Effectively Utilize
Encoded Knowledge [85.17343729885003]
We introduce Hint-before-Solving Prompting (HSP), which guides the model to generate hints for solving the problem.
HSP can effectively improve the accuracy of reasoning tasks.
We build the HSPMATH dataset based on HSP and fine-tuned Llemma-7B, reaching 64.3 accuracy.
arXiv Detail & Related papers (2024-02-22T05:58:03Z) - Autonomous Tree-search Ability of Large Language Models [58.68735916408101]
Large Language Models have excelled in remarkable reasoning capabilities with advanced prompting techniques.
Recent works propose to utilize external programs to define search logic, such that LLMs can perform passive tree search to solve more challenging reasoning tasks.
We propose a new concept called autonomous tree-search ability of LLM, which can automatically generate a response containing search trajectories for the correct answer.
arXiv Detail & Related papers (2023-10-14T14:14:38Z) - Large Language Models for Test-Free Fault Localization [11.080712737595174]
We propose a language model based fault localization approach that locates buggy lines of code without any test coverage information.
We fine-tune language models with 350 million, 6 billion, and 16 billion parameters on small, manually curated corpora of buggy programs.
Our empirical evaluation shows that LLMAO improves the Top-1 results over the state-of-the-art machine learning fault localization (MLFL) baselines by 2.3%-54.4%, and Top-5 results by 14.4%-35.6%.
arXiv Detail & Related papers (2023-10-03T01:26:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.