Lingxi: Repository-Level Issue Resolution Framework Enhanced by Procedural Knowledge Guided Scaling
- URL: http://arxiv.org/abs/2510.11838v1
- Date: Mon, 13 Oct 2025 18:45:04 GMT
- Title: Lingxi: Repository-Level Issue Resolution Framework Enhanced by Procedural Knowledge Guided Scaling
- Authors: Xu Yang, Jiayuan Zhou, Michael Pacheco, Wenhan Zhu, Pengfei He, Shaowei Wang, Kui Liu, Ruiqi Pan,
- Abstract summary: Lingxi is an issue resolution framework that leverages procedural knowledge extracted from historical issue-fixing data. Our comprehensive ablation study confirms that the success of Lingxi comes directly from its use of procedural knowledge.
- Score: 17.25732448913281
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Driven by the advancements of Large Language Models (LLMs), LLM-powered agents are making significant improvements in software engineering tasks, yet struggle with complex, repository-level issue resolution. Existing agent-based methods have two key limitations. First, they lack procedural knowledge (i.e., how an issue is fixed step by step and the rationales behind it) to learn from and leverage for issue resolution. Second, they rely on massive computational power to blindly explore the solution space. To address those limitations, we propose Lingxi, an issue resolution framework that leverages procedural knowledge extracted from historical issue-fixing data to guide agents in solving repository-level issues. Lingxi first constructs this knowledge offline through a hierarchical abstraction mechanism, enabling agents to learn the how and why behind a fix, not just the final solution. During online application, it employs a knowledge-driven scaling method that leverages the procedural knowledge of similar issues to intelligently analyze the target issue from multiple perspectives, in sharp contrast to undirected, brute-force exploration. Lingxi successfully resolves 74.6% of bugs on the SWE-bench Verified benchmark in the Pass@1 setting, outperforming five state-of-the-art techniques by a significant margin (5.4% to 14.9%). Our comprehensive ablation study confirms that the success of Lingxi comes directly from its use of procedural knowledge; without it, the performance gains from scaling alone are negligible. Our qualitative study further shows that "design patterns & coding practices" is the most critical knowledge aspect, and that the roles of different knowledge aspects switch across different stages (i.e., analysis, planning, and fixing).
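The online step described above, retrieving procedural knowledge from similar historical issues to guide analysis of a new one, can be illustrated with a minimal sketch. Everything below is hypothetical: the toy in-memory knowledge bank, the `retrieve_guidance` function, and the token-overlap (Jaccard) similarity stand in for whatever representation and retrieval Lingxi actually uses, which the abstract does not specify.

```python
# Hedged sketch of knowledge-guided retrieval. All names and data here
# are illustrative, not Lingxi's actual pipeline or data model.

def tokenize(text):
    """Split an issue description into a set of lowercase tokens."""
    return set(text.lower().split())

def jaccard(a, b):
    """Token-set similarity between two issue descriptions."""
    ta, tb = tokenize(a), tokenize(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Toy "procedural knowledge" entries, as if distilled offline from
# historical fixes: what was done at each stage (analysis, planning,
# fixing) and not just the final patch.
KNOWLEDGE_BANK = [
    {
        "issue": "null pointer crash when config file is missing",
        "knowledge": {
            "analysis": "trace the config loader's failure path first",
            "planning": "add a guard and a sensible default",
            "fixing": "return a default Config instead of None",
        },
    },
    {
        "issue": "race condition in cache eviction under load",
        "knowledge": {
            "analysis": "inspect lock ordering around the eviction loop",
            "planning": "narrow the critical section",
            "fixing": "hold the lock only while mutating the map",
        },
    },
]

def retrieve_guidance(target_issue, bank=KNOWLEDGE_BANK, k=1):
    """Return procedural knowledge from the k most similar past issues."""
    ranked = sorted(bank, key=lambda e: jaccard(target_issue, e["issue"]),
                    reverse=True)
    return [e["knowledge"] for e in ranked[:k]]

guidance = retrieve_guidance(
    "app crashes with null pointer when config is absent")
print(guidance[0]["analysis"])
# prints "trace the config loader's failure path first"
```

In this sketch, the retrieved stage-wise guidance would then seed the agent's analysis, planning, and fixing prompts, directing exploration rather than scaling compute blindly.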
Related papers
- Understanding Code Agent Behaviour: An Empirical Study of Success and Failure Trajectories [10.751728274263536]
This paper presents an empirical study of agent trajectories, namely the execution traces capturing the steps agents take when attempting to resolve software issues. We analyse trajectories from three state-of-the-art code agents (OpenHands, SWE-agent, and Prometheus) on the SWE-Bench benchmark, examining both successful and failed attempts.
arXiv Detail & Related papers (2025-10-31T18:58:13Z) - Executable Knowledge Graphs for Replicating AI Research [65.41207324831583]
Executable Knowledge Graphs (xKG) is a modular and pluggable knowledge base that automatically integrates technical insights, code snippets, and domain-specific knowledge extracted from scientific literature. Code will be released at https://github.com/zjunlp/xKG.
arXiv Detail & Related papers (2025-10-20T17:53:23Z) - AInstein: Assessing the Feasibility of AI-Generated Approaches to Research Problems [28.38783951577184]
AInstein is a framework for testing whether large language models can generate valid solutions to AI research problems. We evaluate AInstein on 1,214 ICLR papers stratified by acceptance tier.
arXiv Detail & Related papers (2025-10-06T22:50:41Z) - SWE-Exp: Experience-Driven Software Issue Resolution [19.525080502900785]
We introduce SWE-Exp, an experience-enhanced approach that distills concise and actionable experience from prior agent trajectories. Our method introduces a multi-faceted experience bank that captures both successful and failed repair attempts. Experiments show that SWE-Exp achieves a state-of-the-art resolution rate (41.6% Pass@1) on SWE-bench-Verified.
arXiv Detail & Related papers (2025-07-31T09:13:42Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time computation instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - Analytic Subspace Routing: How Recursive Least Squares Works in Continual Learning of Large Language Model [6.42114585934114]
Large Language Models (LLMs) possess capabilities that can process diverse language-related tasks. Continual learning for LLMs aims to continually adapt them to new tasks. This paper proposes Analytic Subspace Routing (ASR) to address these challenges.
arXiv Detail & Related papers (2025-03-17T13:40:46Z) - Disentangling Memory and Reasoning Ability in Large Language Models [97.26827060106581]
We propose a new inference paradigm that decomposes the complex inference process into two distinct and clear actions. Our experiment results show that this decomposition improves model performance and enhances the interpretability of the inference process.
arXiv Detail & Related papers (2024-11-20T17:55:38Z) - Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths [69.39559168050923]
We introduce Reasoning Paths Optimization (RPO), which enables learning to reason and explore from diverse paths.
Our approach encourages favorable branches at each reasoning step while penalizing unfavorable ones, enhancing the model's overall problem-solving performance.
We focus on multi-step reasoning tasks, such as math word problems and science-based exam questions.
arXiv Detail & Related papers (2024-10-07T06:37:25Z) - A Multi-Agent Approach to Fault Localization via Graph-Based Retrieval and Reflexion [8.22737389683156]
Traditional fault localization techniques require extensive training datasets and high computational resources. Recent advances in Large Language Models (LLMs) offer new opportunities by enhancing code understanding and reasoning. We propose LLM4FL, a multi-agent fault localization framework that utilizes three specialized LLM agents. Evaluated on the Defects4J benchmark, which includes 675 faults from 14 Java projects, LLM4FL achieves an 18.55% improvement in Top-1 accuracy over AutoFL and 4.82% over SoapFL.
arXiv Detail & Related papers (2024-09-20T16:47:34Z) - Know the Unknown: An Uncertainty-Sensitive Method for LLM Instruction Tuning [18.283963879468466]
Large language models (LLMs) demonstrate remarkable capabilities but face challenges from hallucinations. We introduce Uncertainty-and-Sensitivity-Aware Tuning (US-Tuning), a novel two-stage approach for contextual question answering. Our experimental results demonstrate that US-Tuning not only significantly reduces incorrect answers in contextual QA but also improves models' faithfulness to their parametric knowledge.
arXiv Detail & Related papers (2024-06-14T14:56:04Z) - A Closer Look at the Limitations of Instruction Tuning [52.587607091917214]
We show that Instruction Tuning (IT) fails to enhance knowledge or skills in large language models (LLMs).
We also show that popular methods to improve IT do not lead to performance improvements over a simple LoRA fine-tuned model.
Our findings reveal that responses generated solely from pre-trained knowledge consistently outperform responses by models that learn any form of new knowledge from IT on open-source datasets.
arXiv Detail & Related papers (2024-02-03T04:45:25Z) - Leveraging Expert Consistency to Improve Algorithmic Decision Support [62.61153549123407]
We explore the use of historical expert decisions as a rich source of information that can be combined with observed outcomes to narrow the construct gap.
We propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert.
Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap.
arXiv Detail & Related papers (2021-01-24T05:40:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences.