Related papers: On Using GUI Interaction Data to Improve Text Retrieval-based Bug Localization

On Using GUI Interaction Data to Improve Text Retrieval-based Bug Localization

URL: http://arxiv.org/abs/2310.08083v1
Date: Thu, 12 Oct 2023 07:14:22 GMT
Title: On Using GUI Interaction Data to Improve Text Retrieval-based Bug Localization
Authors: Junayed Mahmud, Nadeeshan De Silva, Safwat Ali Khan, Seyed Hooman Mostafavi, SM Hasan Mansur, Oscar Chaparro, Andrian Marcus, and Kevin Moran
Abstract summary: We investigate the hypothesis that, for end user-facing applications, connecting information in a bug report with information from the GUI, can improve upon existing techniques for bug localization. We source the current largest dataset of fully-localized and reproducible real bugs for Android apps, with corresponding bug reports.
Score: 10.717184444794505
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: One of the most important tasks related to managing bug reports is localizing the fault so that a fix can be applied. As such, prior work has aimed to automate this task of bug localization by formulating it as an information retrieval problem, where potentially buggy files are retrieved and ranked according to their textual similarity with a given bug report. However, there is often a notable semantic gap between the information contained in bug reports and identifiers or natural language contained within source code files. For user-facing software, there is currently a key source of information that could aid in bug localization, but has not been thoroughly investigated - information from the GUI. We investigate the hypothesis that, for end user-facing applications, connecting information in a bug report with information from the GUI, and using this to aid in retrieving potentially buggy files, can improve upon existing techniques for bug localization. To examine this phenomenon, we conduct a comprehensive empirical study that augments four baseline techniques for bug localization with GUI interaction information from a reproduction scenario to (i) filter out potentially irrelevant files, (ii) boost potentially relevant files, and (iii) reformulate text-retrieval queries. To carry out our study, we source the current largest dataset of fully-localized and reproducible real bugs for Android apps, with corresponding bug reports, consisting of 80 bug reports from 39 popular open-source apps. Our results illustrate that augmenting traditional techniques with GUI information leads to a marked increase in effectiveness across multiple metrics, including a relative increase in Hits@10 of 13-18%. Additionally, through further analysis, we find that our studied augmentations largely complement existing techniques.

Related papers

Improved IR-based Bug Localization with Intelligent Relevance Feedback [2.9312156642007294]
Software bugs pose a significant challenge during development and maintenance, and practitioners spend nearly 50% of their time dealing with bugs. Many existing techniques adopt Information Retrieval (IR) to localize a reported bug using textual and semantic relevance between bug reports and source code. We present a novel technique for bug localization - BRaIn - that addresses the contextual gaps by assessing the relevance between bug reports and code.
arXiv Detail & Related papers (2025-01-17T20:29:38Z)
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework. This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings. Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z)
Enhancing IR-based Fault Localization using Large Language Models [5.032687557488094]
This paper enhances Fault Localization (IRFL) by categorizing bug reports based on programming entities, stack traces, and natural language text. To address inaccuracies in queries, we introduce a user and conversational-based query reformulation approach, termed LLmiRQ+. Evaluation on 46 projects with 6,340 bug reports yields an MRR of 0.6770 and MAP of 0.5118, surpassing seven state-of-the-art IRFL techniques.
arXiv Detail & Related papers (2024-12-04T22:47:51Z)
Multi-View Adaptive Contrastive Learning for Information Retrieval Based Fault Localization [5.1987901165589]
We propose a novel approach named Multi-View Adaptive Contrastive Learning for Information Retrieval Fault localization (MACL-IRFL) We first generate data augmentations from report-code interaction view, report-report similarity view and code-code co-citation view separately, and adopt graph neural network to aggregate the information of bug reports or source code files from the three views in the embedding process. Our design of contrastive learning task will force the bug report representations to encode information shared by report-report and report-code views,and the source code file representations shared by code-code and report-code views,
arXiv Detail & Related papers (2024-09-19T07:20:10Z)
BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning [1.9854146581797698]
BLAZE is an approach that employs dynamic chunking and hard example learning. It fine-tunes a GPT-based model using challenging bug cases to enhance cross-project and cross-language bug localization. BLAZE achieves up to an increase of 120% in Top 1 accuracy, 144% in Mean Average Precision (MAP), and 100% in Mean Reciprocal Rank (MRR)
arXiv Detail & Related papers (2024-07-24T20:44:36Z)
Language Modeling with Editable External Knowledge [90.7714362827356]
This paper introduces ERASE, which improves model behavior when new documents are acquired. It incrementally deletes or rewriting other entries in the knowledge base each time a document is added. It improves accuracy relative to conventional retrieval-augmented generation by 7-13% (Mixtral-8x7B) and 6-10% (Llama-3-8B) absolute.
arXiv Detail & Related papers (2024-06-17T17:59:35Z)
Too Few Bug Reports? Exploring Data Augmentation for Improved Changeset-based Bug Localization [7.884766610628946]
We propose novel data augmentation operators that act on different constituent components of bug reports. We also describe a data balancing strategy that aims to create a corpus of augmented bug reports.
arXiv Detail & Related papers (2023-05-25T19:06:01Z)
Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy [164.83371924650294]
We show that strong performance can be achieved by a method we call Iter-RetGen, which synergizes retrieval and generation in an iterative manner. A model output shows what might be needed to finish a task, and thus provides an informative context for retrieving more relevant knowledge. Iter-RetGen processes all retrieved knowledge as a whole and largely preserves the flexibility in generation without structural constraints.
arXiv Detail & Related papers (2023-05-24T16:17:36Z)
Auto-labelling of Bug Report using Natural Language Processing [0.0]
Rule and Query-based solutions recommend a long list of potential similar bug reports with no clear ranking. In this paper, we have proposed a solution using a combination of NLP techniques. It uses a custom data transformer, a deep neural network, and a non-generalizing machine learning method to retrieve existing identical bug reports.
arXiv Detail & Related papers (2022-12-13T02:32:42Z)
Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers. We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z)
Automatic Classification of Bug Reports Based on Multiple Text Information and Reports' Intention [37.67372105858311]
This paper proposes a new automatic classification method for bug reports. The innovation is that when categorizing bug reports, in addition to using the text information of the report, the intention of the report is also considered. Our proposed method achieves better performance and its F-Measure achieves from 87.3% to 95.5%.
arXiv Detail & Related papers (2022-08-02T06:44:51Z)
BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization. We provide a general benchmark with a diversity of real and synthetic Java bugs. We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z)
S3M: Siamese Stack (Trace) Similarity Measure [55.58269472099399]
We present S3M -- the first approach to computing stack trace similarity based on deep learning. It is based on a biLSTM encoder and a fully-connected classifier to compute similarity. Our experiments demonstrate the superiority of our approach over the state-of-the-art on both open-sourced data and a private JetBrains dataset.
arXiv Detail & Related papers (2021-03-18T21:10:41Z)
KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT) All tasks in KILT are grounded in the same snapshot of Wikipedia. We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
arXiv Detail & Related papers (2020-09-04T15:32:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.