The Forgotten Role of Search Queries in IR-based Bug Localization: An
Empirical Study
- URL: http://arxiv.org/abs/2108.05341v1
- Date: Wed, 11 Aug 2021 17:37:50 GMT
- Authors: Mohammad Masudur Rahman and Foutse Khomh and Shamima Yeasmin and
Chanchal K. Roy
- Abstract summary: This article critically examines the state-of-the-art query selection practices in IR-based bug localization.
We use a Genetic Algorithm-based approach to construct optimal or near-optimal search queries from 2,320 bug reports.
We demonstrate a 27%–34% improvement in the performance of non-optimal queries through the application of our actionable insights.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Being lightweight and cost-effective, IR-based approaches for bug
localization have shown promise in finding software bugs. However, the accuracy
of these approaches depends heavily on the bug reports they use. A significant
number of bug reports contain only plain natural language texts. According to
existing studies, IR-based approaches cannot perform well when they use these
bug reports as search queries. However, recent evidence suggests that even
these natural language-only reports contain enough good keywords to help
localize the bugs successfully. On one hand, these findings suggest that
natural language-only bug reports might be a sufficient source of good query
keywords. On the other hand, they cast serious doubt on the query selection
practices in IR-based bug localization. In this article, we attempt to clarify
this issue by conducting an in-depth empirical study that critically examines
the state-of-the-art query selection practices in IR-based bug localization.
In particular, we use a
dataset of 2,320 bug reports, employ ten existing approaches from the
literature, use a Genetic Algorithm-based approach to construct optimal or
near-optimal search queries from these bug reports, and then answer three
research questions. We confirm that the state-of-the-art query construction
approaches are indeed not sufficient for constructing appropriate queries (for
bug localization) from certain natural language-only bug reports, even though
these reports contain such queries. We also demonstrate that optimal queries and non-optimal
queries chosen from bug report texts are significantly different in terms of
several keyword characteristics, which leads us to actionable insights.
Furthermore, we demonstrate a 27%–34% improvement in the performance of
non-optimal queries after applying these actionable insights to them.
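The idea of searching a bug report for a good keyword subset with a Genetic Algorithm can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the toy corpus, the term-overlap scoring in `rank_of_target`, the reciprocal-rank fitness, and the names `evolve_query`, `keywords`, `corpus`, and `target` are all hypothetical. Each candidate query is a bit mask over the report's keywords, and the search starts from the full report text as a baseline.

```python
import random

def rank_of_target(query, corpus, target):
    """Return the 1-based rank of `target`; ties count against the target."""
    def score(terms):
        # Toy relevance: matched query terms, normalized by document length.
        return len(query & set(terms)) / len(terms)
    target_score = score(corpus[target])
    others = [score(t) for name, t in corpus.items() if name != target]
    return 1 + sum(s >= target_score for s in others)

def evolve_query(keywords, corpus, target, generations=30,
                 pop_size=20, mutation_rate=0.1, seed=0):
    """Search keyword subsets (bit masks) with a simple genetic algorithm."""
    rng = random.Random(seed)
    n = len(keywords)

    def fitness(mask):
        query = {k for k, bit in zip(keywords, mask) if bit}
        if not query:
            return 0.0
        # Assumed fitness: reciprocal rank of the known buggy file.
        return 1.0 / rank_of_target(query, corpus, target)

    # Start from the whole report (all keywords) plus random subsets.
    population = [[1] * n]
    population += [[rng.randint(0, 1) for _ in range(n)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: the best masks survive unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            a = max(rng.sample(population, 3), key=fitness)
            b = max(rng.sample(population, 3), key=fitness)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return [k for k, bit in zip(keywords, best) if bit], fitness(best)

# Hypothetical bug report keywords and a three-file toy corpus.
keywords = ["app", "crashes", "when", "saving", "file", "encoding", "error"]
corpus = {
    "FileWriter.java": ["file", "saving", "encoding", "write", "buffer"],
    "MainWindow.java": ["app", "window", "when", "error", "crashes", "menu"],
    "Logger.java": ["error", "log", "app", "when"],
}
target = "FileWriter.java"

best_query, best_score = evolve_query(keywords, corpus, target)
print(best_query, best_score)
```

In this toy setup, using every keyword of the report as the query ranks the buggy file last of the three, while the subset "saving", "file", "encoding" ranks it first. Because the full-report mask is in the initial population and the best masks are carried over each generation, the returned query is never worse than the full-report baseline under this fitness.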
Related papers
- Improved IR-based Bug Localization with Intelligent Relevance Feedback
Software bugs pose a significant challenge during development and maintenance, and practitioners spend nearly 50% of their time dealing with bugs.
Many existing techniques adopt Information Retrieval (IR) to localize a reported bug using textual and semantic relevance between bug reports and source code.
We present a novel technique for bug localization - BRaIn - that addresses the contextual gaps by assessing the relevance between bug reports and code.
Date: 2025-01-17T20:29:38Z
- Enhancing IR-based Fault Localization using Large Language Models
This paper enhances Information Retrieval-based Fault Localization (IRFL) by categorizing bug reports based on programming entities, stack traces, and natural language text.
To address inaccuracies in queries, we introduce a user and conversational-based query reformulation approach, termed LLmiRQ+.
Evaluation on 46 projects with 6,340 bug reports yields an MRR of 0.6770 and MAP of 0.5118, surpassing seven state-of-the-art IRFL techniques.
Date: 2024-12-04T22:47:51Z
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval
Many complex real-world queries require in-depth reasoning to identify relevant documents.
We introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents.
Our dataset consists of 1,384 real-world queries spanning diverse domains, such as economics, psychology, mathematics, and coding.
Date: 2024-07-16T17:58:27Z
- See, Say, and Segment: Teaching LMMs to Overcome False Premises
We propose a cascading and joint training approach for LMMs to solve this task.
Our resulting model can "see" by detecting whether objects are present in an image, "say" by telling the user if they are not, and finally "segment" by outputting the mask of the desired objects if they exist.
Date: 2023-12-13T18:58:04Z
- On Using GUI Interaction Data to Improve Text Retrieval-based Bug Localization
We investigate the hypothesis that, for end user-facing applications, connecting information in a bug report with information from the GUI can improve upon existing techniques for bug localization.
We source the current largest dataset of fully-localized and reproducible real bugs for Android apps, with corresponding bug reports.
Date: 2023-10-12T07:14:22Z
- Making Retrieval-Augmented Language Models Robust to Irrelevant Context
An important desideratum of RALMs is that retrieved information helps model performance when it is relevant.
Recent work has shown that retrieval augmentation can sometimes have a negative effect on performance.
Date: 2023-10-02T18:52:35Z
- Recommending Bug Assignment Approaches for Individual Bug Reports: An Empirical Investigation
Multiple approaches have been proposed to automatically recommend potential developers who can address bug reports.
These approaches are typically designed to work for any bug report submitted to any software project.
We conducted an empirical study to validate this conjecture, using three bug assignment approaches applied on 2,249 bug reports from two open source systems.
Date: 2023-05-29T23:02:56Z
- Enriching Relation Extraction with OpenIE
Relation extraction (RE) is a sub-discipline of information extraction (IE).
In this work, we explore how recent approaches for open information extraction (OpenIE) may help to improve the task of RE.
Our experiments over two annotated corpora, KnowledgeNet and FewRel, demonstrate the improved accuracy of our enriched models.
Date: 2022-12-19T11:26:23Z
- Auto-labelling of Bug Report using Natural Language Processing
Rule and Query-based solutions recommend a long list of potential similar bug reports with no clear ranking.
In this paper, we propose a solution that combines several NLP techniques.
It uses a custom data transformer, a deep neural network, and a non-generalizing machine learning method to retrieve existing identical bug reports.
Date: 2022-12-13T02:32:42Z
- Using Developer Discussions to Guide Fixing Bugs in Software
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
Date: 2022-11-11T16:37:33Z
- BigIssue: A Realistic Bug Localization Benchmark
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
Date: 2022-07-21T20:17:53Z
This list is automatically generated from the titles and abstracts of the papers on this site.