OrcaLoca: An LLM Agent Framework for Software Issue Localization
- URL: http://arxiv.org/abs/2502.00350v1
- Date: Sat, 01 Feb 2025 07:15:03 GMT
- Title: OrcaLoca: An LLM Agent Framework for Software Issue Localization
- Authors: Zhongming Yu, Hejia Zhang, Yujie Zhao, Hanxian Huang, Matrix Yao, Ke Ding, Jishen Zhao,
- Abstract summary: OrcaLoca is an agent framework that improves accuracy for software issue localization.<n>It integrates priority-based scheduling for LLM-guided action, action decomposition with relevance scoring, and distance-aware context pruning.<n>It also improves the final resolved rate of an open-source framework by 6.33 percentage points through its patch generation integration.
- Score: 7.608131635030634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent developments in Large Language Model (LLM) agents are revolutionizing Autonomous Software Engineering (ASE), enabling automated coding, problem fixes, and feature improvements. However, localization -- precisely identifying software problems by navigating to relevant code sections -- remains a significant challenge. Current approaches often yield suboptimal results due to a lack of effective integration between LLM agents and precise code search mechanisms. This paper introduces OrcaLoca, an LLM agent framework that improves accuracy for software issue localization by integrating priority-based scheduling for LLM-guided action, action decomposition with relevance scoring, and distance-aware context pruning. Experimental results demonstrate that OrcaLoca becomes the new open-source state-of-the-art (SOTA) in function match rate (65.33%) on SWE-bench Lite. It also improves the final resolved rate of an open-source framework by 6.33 percentage points through its patch generation integration.
Related papers
- RGFL: Reasoning Guided Fault Localization for Automated Program Repair Using Large Language Models [1.9196411948992402]
We present a novel project-level FL approach that improves both file- and element-level localization.<n>We evaluate our approach on Python and Java projects from SWE-bench Verified, Lite, and Java.
arXiv Detail & Related papers (2026-01-25T23:41:42Z) - MeltRTL: Multi-Expert LLMs with Inference-time Intervention for RTL Code Generation [0.0]
MeltRTL is a novel framework that integrates multi-expert attention with inference-time intervention.<n>MeltRTL significantly improves the accuracy of large language models (LLMs) without retraining the base model.<n>We evaluate MeltRTL on the VerilogEval benchmark, achieving 96% synthesizability and 60% functional correctness.
arXiv Detail & Related papers (2026-01-19T12:49:39Z) - MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers [86.00932417210477]
We introduce MCP-Universe, the first comprehensive benchmark specifically designed to evaluate LLMs in realistic and hard tasks through interaction with real-world MCP servers.<n>Our benchmark encompasses 6 core domains spanning 11 different MCP servers: Location Navigation, Repository Management, Financial Analysis, 3D Design, Browser Automation, and Web Searching.<n>We find that even SOTA models such as GPT-5 (43.72%), Grok-4 (33.33%) and Claude-4.0-Sonnet (29.44%) exhibit significant performance limitations.
arXiv Detail & Related papers (2025-08-20T13:28:58Z) - Tool-integrated Reinforcement Learning for Repo Deep Search [2.5556672917309653]
We present ToolTrain, a two-stage tool-integrated training framework combining rejection-sampled supervised fine-tuning and tool-integrated reinforcement learning.<n> Experimental results show that ToolTrain-trained models achieve state-of-the-art performance, with our 32B model even surpassing Claude-3.7 on function-level localization.
arXiv Detail & Related papers (2025-08-05T02:44:21Z) - Augmenting Large Language Models with Static Code Analysis for Automated Code Quality Improvements [0.36832029288386137]
This study examined code issue detection and revision automation by integrating Large Language Models (LLMs) into software development.<n>A static code analysis framework detects issues such as bugs, vulnerabilities, and code smells within a large-scale software project.<n>Retrieval-augmented generation (RAG) is implemented to enhance the relevance and precision of the revisions.
arXiv Detail & Related papers (2025-06-12T03:39:25Z) - SweRank: Software Issue Localization with Code Ranking [109.3289316191729]
SweRank is an efficient retrieve-and-rerank framework for software issue localization.<n>We construct SweLoc, a large-scale dataset curated from public GitHub repositories.<n>We show that SweRank achieves state-of-the-art performance, outperforming both prior ranking models and costly agent-based systems.
arXiv Detail & Related papers (2025-05-07T19:44:09Z) - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models.
Our framework incorporates two complementary strategies: internal TTC and external TTC.
We demonstrate our textbf32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z) - CoSIL: Software Issue Localization via LLM-Driven Code Repository Graph Searching [9.969478638201261]
CoSIL is a simple yet powerful issue localization method without training or indexing.
It achieves a Top-1 localization success rate of 43 percent and 44.6 percent on SWE bench Lite and SWE bench Verified.
When CoSIL is applied to guide the patch generation stage, the resolved rate further improves by 9.3 to 31.5 percent.
arXiv Detail & Related papers (2025-03-28T13:36:26Z) - DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal [55.13854171147104]
Large Language Models (LLMs) have revolutionized various domains, including natural language processing, data analysis, and software development.
We present Dynamic Action Re-Sampling (DARS), a novel inference time compute scaling approach for coding agents.
We evaluate our approach on SWE-Bench Lite benchmark, demonstrating that this scaling strategy achieves a pass@k score of 55% with Claude 3.5 Sonnet V2.
arXiv Detail & Related papers (2025-03-18T14:02:59Z) - LLM-Lasso: A Robust Framework for Domain-Informed Feature Selection and Regularization [59.75242204923353]
We introduce LLM-Lasso, a framework that leverages large language models (LLMs) to guide feature selection in Lasso regression.
LLMs generate penalty factors for each feature, which are converted into weights for the Lasso penalty using a simple, tunable model.
Features identified as more relevant by the LLM receive lower penalties, increasing their likelihood of being retained in the final model.
arXiv Detail & Related papers (2025-02-15T02:55:22Z) - EDA-Aware RTL Generation with Large Language Models [0.7831852829409273]
Large Language Models (LLMs) have become increasingly popular for generating RTL code.<n> producing error-free RTL code in a zero-shot setting remains highly challenging for even state-of-the-art LLMs.<n>We introduce AIvril2, a self-verifying, LLM-agnostic agentic framework aimed at enhancing RTL code generation through iterative corrections of both syntax and functional errors.
arXiv Detail & Related papers (2024-11-21T00:37:51Z) - Attention Is All You Need for LLM-based Code Vulnerability Localization [30.532439965854774]
Traditional methods for vulnerability localization, such as manual code audits or rule-based tools, are often time-consuming and limited in scope.
This paper introduces LOVA, a novel framework leveraging the self-attention mechanisms inherent in large language models.
We demonstrate that LOVA significantly outperforms existing LLM-based approaches, achieving up to a 5.3x improvement in F1-scores.
arXiv Detail & Related papers (2024-10-20T05:02:18Z) - Enhancing Fault Localization Through Ordered Code Analysis with LLM Agents and Self-Reflection [8.22737389683156]
Large Language Models (LLMs) offer promising improvements in fault localization by enhancing code comprehension and reasoning.
We introduce LLM4FL, a novel LLM-agent-based fault localization approach that integrates SBFL rankings with a divide-and-conquer strategy.
Our results demonstrate that LLM4FL outperforms AutoFL by 19.27% in Top-1 accuracy and surpasses state-of-the-art supervised techniques such as DeepFL and Grace.
arXiv Detail & Related papers (2024-09-20T16:47:34Z) - Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models [0.46085106405479537]
This paper investigates the effectiveness of Large Language Models (LLMs) in identifying vulnerabilities withins.
We assess the performance of emerging LLMs, specifically Llama, CodeLlama, Gemma, and CodeGemma, alongside established state-of-the-art models BERT, RoBERTa, and GPT-3.
We observe that CodeGemma achieves the highest F1-score of 58 and a Recall of 87, amongst the recent additions of large language models to detect software security vulnerabilities.
arXiv Detail & Related papers (2024-09-16T17:23:00Z) - Agent-Driven Automatic Software Improvement [55.2480439325792]
This research proposal aims to explore innovative solutions by focusing on the deployment of agents powered by Large Language Models (LLMs)
The iterative nature of agents, which allows for continuous learning and adaptation, can help surpass common challenges in code generation.
We aim to use the iterative feedback in these systems to further fine-tune the LLMs underlying the agents, becoming better aligned to the task of automated software improvement.
arXiv Detail & Related papers (2024-06-24T15:45:22Z) - Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration [70.09561665520043]
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans.
We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems.
Experiments on Over-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents.
arXiv Detail & Related papers (2024-05-23T08:33:19Z) - DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning [56.887047551101574]
We present DS-Agent, a novel framework that harnesses large language models (LLMs) agent and case-based reasoning (CBR)
In the development stage, DS-Agent follows the CBR framework to structure an automatic iteration pipeline, which can flexibly capitalize on the expert knowledge from Kaggle.
In the deployment stage, DS-Agent implements a low-resource deployment stage with a simplified CBR paradigm, significantly reducing the demand on foundational capabilities of LLMs.
arXiv Detail & Related papers (2024-02-27T12:26:07Z) - LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback [65.84061725174269]
Recent large language models (LLM) are leveraging human feedback to improve their generation quality.
We propose LLMRefine, an inference time optimization method to refine LLM's output.
We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization.
LLMRefine consistently outperforms all baseline approaches, achieving improvements up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on ASQA, 2.2 ROUGE-L on topical summarization.
arXiv Detail & Related papers (2023-11-15T19:52:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.