Improving IR-based Bug Localization with Semantics-Driven Query Reduction
- URL: http://arxiv.org/abs/2510.04468v1
- Date: Mon, 06 Oct 2025 03:43:38 GMT
- Title: Improving IR-based Bug Localization with Semantics-Driven Query Reduction
- Authors: Asif Mohammed Samir, Mohammad Masudur Rahman
- Abstract summary: We propose IQLoc, a novel approach to localize software bugs against bug reports. We leverage the program semantics understanding of transformer-based models to reason about the suspiciousness of code. IQLoc improves MAP by 91.67% for bug reports with stack traces, 72.73% for those that include code elements, and 65.38% for those containing only descriptions in natural language.
- Score: 0.9298382208776371
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite decades of research, software bug localization remains challenging due to heterogeneous content and inherent ambiguities in bug reports. Existing methods such as Information Retrieval (IR)-based approaches often attempt to match source documents to bug reports, overlooking the context and semantics of the source code. On the other hand, Large Language Models (LLMs) (e.g., Transformer models) show promising results in understanding both text and code. However, they have not yet been well adapted to localize software bugs against bug reports. They can also be data- or resource-intensive. To bridge this gap, we propose IQLoc, a novel bug localization approach that capitalizes on the strengths of both IR and LLM-based approaches. In particular, we leverage the program semantics understanding of transformer-based models to reason about the suspiciousness of code and reformulate queries during bug localization using Information Retrieval. To evaluate IQLoc, we refine the Bench4BL benchmark dataset and extend it by incorporating ~30% more recent bug reports, resulting in a benchmark containing ~7.5K bug reports. We evaluate IQLoc using three performance metrics and compare it against four baseline techniques. Experimental results demonstrate its superiority, achieving up to 58.52% and 60.59% in MAP, 61.49% and 64.58% in MRR, and 69.88% and 100.90% in HIT@K for the test bug reports with random and time-wise splits, respectively. Moreover, IQLoc improves MAP by 91.67% for bug reports with stack traces, 72.73% for those that include code elements, and 65.38% for those containing only descriptions in natural language. By integrating program semantic understanding into Information Retrieval, IQLoc mitigates several longstanding challenges of traditional IR-based approaches in bug localization.
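The abstract evaluates IQLoc with MAP, MRR, and HIT@K, which are widely used ranking metrics in IR-based bug localization. As a minimal sketch (not the authors' evaluation code), the functions below compute them from, for each bug report, the ranked list of files returned by a localizer and the set of truly buggy files. The function names and the toy file names are hypothetical, and the abstract does not state which K it uses for HIT@K.

```python
from typing import Dict, List, Sequence, Set, Tuple

# One evaluated bug report: (ranked files returned by the localizer, set of truly buggy files).
Run = Tuple[Sequence[str], Set[str]]


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k taken at every rank k that holds a buggy file."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first buggy file, or 0 if no buggy file is retrieved."""
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / k
    return 0.0


def hit_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """1 if at least one buggy file appears in the top k results, else 0."""
    return 1.0 if any(doc in relevant for doc in list(ranked)[:k]) else 0.0


def evaluate(runs: List[Run], k: int = 10) -> Dict[str, float]:
    """Average each per-report metric over all evaluated bug reports."""
    n = len(runs)
    return {
        "MAP": sum(average_precision(r, g) for r, g in runs) / n,
        "MRR": sum(reciprocal_rank(r, g) for r, g in runs) / n,
        f"HIT@{k}": sum(hit_at_k(r, g, k) for r, g in runs) / n,
    }


# Two hypothetical bug reports with toy rankings.
runs = [
    (["A.java", "B.java", "C.java"], {"B.java"}),
    (["D.java", "E.java", "F.java"], {"D.java", "F.java"}),
]
print(evaluate(runs, k=1))  # MAP ≈ 0.667, MRR = 0.75, HIT@1 = 0.5
```

This AP divides by the total number of buggy files; some bug localization studies divide by the number of buggy files actually retrieved, which gives the same value whenever every project file is ranked. The abstract does not say which variant its evaluation uses.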
Related papers
- Improved Bug Localization with AI Agents Leveraging Hypothesis and Dynamic Cognition [0.9298382208776371]
Software bugs cost technology providers (e.g., AT&T) billions annually and cause developers to spend roughly 50% of their time on bug resolution.
Traditional methods for bug localization often analyze the suspiciousness of code components in isolation.
Recent advances in Large Language Models (LLMs) and agentic AI techniques have shown strong potential for code understanding, but they still lack causal reasoning during code exploration.
We present a novel agentic technique for bug localization, CogniGent, that overcomes the limitations above by using multiple AI agents capable of causal reasoning, call-graph-based root cause analysis and context.
(2026-01-18T18:12:21Z)
- BugPilot: Complex Bug Generation for Efficient Learning of SWE Skills [59.003563837981886]
High-quality bugs are key to training the next generation of language model based software engineering (SWE) agents.
We introduce a novel method for synthetic generation of difficult and diverse bugs.
(2025-10-22T17:58:56Z)
- Can We Enhance Bug Report Quality Using LLMs?: An Empirical Study of LLM-Based Bug Report Generation [0.0]
In this paper, we explore whether instruction fine-tuned Large Language Models (LLMs) can automatically transform casual, unstructured bug reports into high-quality, structured bug reports adhering to a standard template.
We evaluate three open-source instruction-tuned LLMs (Qwen 2.5, Mistral, and Llama 3.2) against ChatGPT-4o, measuring performance on established metrics such as CTQRS, ROUGE, METEOR, and SBERT.
Our experiments show that fine-tuned Qwen 2.5 achieves a CTQRS score of 77%,
(2025-04-26T05:15:53Z)
- EquiBench: Benchmarking Large Language Models' Reasoning about Program Semantics via Equivalence Checking [58.15568681219339]
We introduce EquiBench, a new benchmark for evaluating large language models (LLMs) on equivalence checking.
This task directly tests a model's ability to reason about program semantics.
We evaluate 19 state-of-the-art LLMs and find that in the most challenging categories, the best accuracies are 63.8% and 76.2%, only modestly above the 50% random baseline.
(2025-02-18T02:54:25Z)
- Improved IR-based Bug Localization with Intelligent Relevance Feedback [2.9312156642007294]
Software bugs pose a significant challenge during development and maintenance, and practitioners spend nearly 50% of their time dealing with bugs.
Many existing techniques adopt Information Retrieval (IR) to localize a reported bug using textual and semantic relevance between bug reports and source code.
We present a novel technique for bug localization, BRaIn, that addresses the contextual gaps by assessing the relevance between bug reports and code.
(2025-01-17T20:29:38Z)
- Enhancing IR-based Fault Localization using Large Language Models [5.032687557488094]
This paper enhances Information Retrieval-based Fault Localization (IRFL) by categorizing bug reports based on programming entities, stack traces, and natural language text.
To address inaccuracies in queries, we introduce a user- and conversation-based query reformulation approach, termed LLmiRQ+.
Evaluation on 46 projects with 6,340 bug reports yields an MRR of 0.6770 and MAP of 0.5118, surpassing seven state-of-the-art IRFL techniques.
(2024-12-04T22:47:51Z)
- RustRepoTrans: Repository-level Code Translation Benchmark Targeting Rust [50.65321080814249]
RustRepoTrans is the first repository-level context code translation benchmark targeting incremental translation.
We evaluate seven representative LLMs, analyzing their errors to assess limitations in complex translation scenarios.
(2024-11-21T10:00:52Z)
- Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios [49.53589774730807]
Multimodal large language models (MLLMs) have recently achieved state-of-the-art performance on tasks ranging from visual question answering to video understanding.
We reveal a response uncertainty phenomenon: twelve state-of-the-art open-source MLLMs overturn a previously correct answer in 65% of cases after receiving a single deceptive cue.
(2024-11-05T01:11:28Z)
- Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs [60.32717556756674]
This paper introduces a systematic evaluation framework to assess Large Language Models in detecting cryptographic misuses.
Our in-depth analysis of 11,940 LLM-generated reports highlights that the inherent instabilities in LLMs can lead to over half of the reports being false positives.
The optimized approach achieves a remarkable detection rate of nearly 90%, surpassing traditional methods and uncovering previously unknown misuses in established benchmarks.
(2024-07-23T15:31:26Z)
- Leveraging Stack Traces for Spectrum-based Fault Localization in the Absence of Failing Tests [44.13331329339185]
We introduce a new approach, SBEST, that integrates stack trace data with test coverage to enhance fault localization.
Our approach shows a significant improvement, increasing Mean Average Precision (MAP) by 32.22% and Mean Reciprocal Rank (MRR) by 17.43% over traditional stack trace ranking methods.
(2024-05-01T15:15:52Z)
- See, Say, and Segment: Teaching LMMs to Overcome False Premises [67.36381001664635]
We propose a cascading and joint training approach for LMMs to solve this task.
Our resulting model can "see" by detecting whether objects are present in an image, "say" by telling the user if they are not, and finally "segment" by outputting the mask of the desired objects if they exist.
(2023-12-13T18:58:04Z)
- The Forgotten Role of Search Queries in IR-based Bug Localization: An Empirical Study [17.809196793565224]
This article critically examines the state-of-the-art query selection practices in IR-based bug localization.
We exploit a Genetic Algorithm-based approach to construct optimal and near-optimal search queries from 2,320 bug reports.
We demonstrate 27%--34% improvement in the performance of non-optimal queries through the application of our actionable insights.
(2021-08-11T17:37:50Z)
- S3M: Siamese Stack (Trace) Similarity Measure [55.58269472099399]
We present S3M -- the first approach to computing stack trace similarity based on deep learning.
It is based on a biLSTM encoder and a fully-connected classifier to compute similarity.
Our experiments demonstrate the superiority of our approach over the state-of-the-art on both open-sourced data and a private JetBrains dataset.
(2021-03-18T21:10:41Z)
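IQLoc and several entries above (BRaIn, LLmiRQ+, and the query-selection study) build on the same IR step: score source files by textual similarity to a query derived from a bug report, where a shorter, better-chosen query can change the ranking. The sketch below is a generic TF-IDF/cosine baseline with a deliberately naive reduction rule (drop query terms absent from the codebase, then keep the `keep` rarest by IDF). It is an illustration under stated assumptions, not IQLoc's query reduction, which the abstract says relies on transformer-based reasoning about code semantics. The helpers `tokenize`, `TfIdfIndex`, and `reduce_query`, the stop-word list, and the toy files are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Tiny illustrative stop-word list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "is", "when", "i", "after", "in", "on", "of", "to", "and"}


def tokenize(text: str) -> List[str]:
    """Split camelCase identifiers, lowercase, keep alphabetic runs, drop stop words."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


class TfIdfIndex:
    """Hypothetical TF-IDF index over source files, ranked by cosine similarity."""

    def __init__(self, docs: Dict[str, str]):
        n = len(docs)
        tfs = {name: Counter(tokenize(body)) for name, body in docs.items()}
        df: Counter = Counter()
        for tf in tfs.values():
            df.update(tf.keys())
        # Smoothed IDF: terms that are rarer across the codebase get larger weights.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        self.doc_vecs = {name: self._weigh(tf) for name, tf in tfs.items()}

    def _weigh(self, tf: Counter) -> Dict[str, float]:
        # Terms never seen in the codebase get weight 0: they cannot match any file.
        return {t: c * self.idf.get(t, 0.0) for t, c in tf.items()}

    @staticmethod
    def _norm(vec: Dict[str, float]) -> float:
        return math.sqrt(sum(w * w for w in vec.values())) or 1.0

    def rank(self, query_terms: List[str]) -> List[str]:
        """Return file names ordered by descending cosine similarity to the query."""
        q = self._weigh(Counter(query_terms))
        q_norm = self._norm(q)
        scores = {
            name: sum(w * vec.get(t, 0.0) for t, w in q.items()) / (q_norm * self._norm(vec))
            for name, vec in self.doc_vecs.items()
        }
        # Ties are broken by file name so the output is deterministic.
        return sorted(scores, key=lambda name: (-scores[name], name))


def reduce_query(index: TfIdfIndex, terms: List[str], keep: int) -> List[str]:
    """Naive reduction (an assumption, NOT IQLoc's method): drop terms absent from
    the codebase, then keep the `keep` rarest remaining terms by IDF."""
    known = [t for t in dict.fromkeys(terms) if t in index.idf]
    known.sort(key=lambda t: -index.idf[t])  # stable sort keeps first-seen order on ties
    return known[:keep]


# Hypothetical three-file codebase and bug report, for illustration only.
DOCS = {
    "LoginController.java": "class LoginController { void handleLogin(User user) { session.authenticate(user); } }",
    "SessionManager.java": "class SessionManager { Session authenticate(User user) { return tokenStore.lookup(user); } }",
    "ReportExporter.java": "class ReportExporter { void exportPdf(Report report) { writer.write(report); } }",
}
REPORT = "The app crashes when I click login after the session token expires"

index = TfIdfIndex(DOCS)
full_query = tokenize(REPORT)
reduced = reduce_query(index, full_query, keep=2)
print(reduced)              # ['login', 'token']
print(index.rank(reduced))  # LoginController.java ranks first
```

Ranking by IDF alone is a crude proxy for how informative a term is; the point of semantics-driven approaches such as IQLoc is to replace this kind of surface statistic with reasoning about the code itself.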
This list is automatically generated from the titles and abstracts of the papers in this site.