Feedback-Driven Automated Whole Bug Report Reproduction for Android Apps
- URL: http://arxiv.org/abs/2407.05165v3
- Date: Wed, 28 Aug 2024 13:52:57 GMT
- Title: Feedback-Driven Automated Whole Bug Report Reproduction for Android Apps
- Authors: Dingbang Wang, Yu Zhao, Sidong Feng, Zhaoxu Zhang, William G. J. Halfond, Chunyang Chen, Xiaoxia Sun, Jiangfan Shi, Tingting Yu,
- Abstract summary: ReBL is a novel feedback-driven approach to reproduce Android bug reports.
It is more flexible and context-aware than the traditional step-by-step entity matching approach.
It has the capability of handling non-crash functional bug reports.
- Score: 23.460238111094608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In software development, bug report reproduction is a challenging task. This paper introduces ReBL, a novel feedback-driven approach that leverages GPT-4, a large-scale language model (LLM), to automatically reproduce Android bug reports. Unlike traditional methods, ReBL bypasses the use of Step to Reproduce (S2R) entities. Instead, it leverages the entire textual bug report and employs innovative prompts to enhance GPT's contextual reasoning. This approach is more flexible and context-aware than the traditional step-by-step entity matching approach, resulting in improved accuracy and effectiveness. In addition to handling crash reports, ReBL has the capability of handling non-crash functional bug reports. Our evaluation of 96 Android bug reports (73 crash and 23 non-crash) demonstrates that ReBL successfully reproduced 90.63% of these reports, averaging only 74.98 seconds per bug report. Additionally, ReBL outperformed three existing tools in both success rate and speed.
Related papers
- DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for Large Language Models (LLMs)
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z) - RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic
Program Repair [75.40584530380589]
We propose a novel Retrieval-Augmented Patch Generation framework (RAP-Gen)
RAP-Gen explicitly leveraging relevant fix patterns retrieved from a list of previous bug-fix pairs.
We evaluate RAP-Gen on three benchmarks in two programming languages, including the TFix benchmark in JavaScript, and Code Refinement and Defects4J benchmarks in Java.
arXiv Detail & Related papers (2023-09-12T08:52:56Z) - A Comparative Study of Text Embedding Models for Semantic Text
Similarity in Bug Reports [0.0]
Retrieving similar bug reports from an existing database can help reduce the time and effort required to resolve bugs.
We explored several embedding models such as TF-IDF (Baseline), FastText, Gensim, BERT, and ADA.
Our study provides insights into the effectiveness of different embedding methods for retrieving similar bug reports and highlights the impact of selecting the appropriate one for this task.
arXiv Detail & Related papers (2023-08-17T21:36:56Z) - Prompting Is All You Need: Automated Android Bug Replay with Large Language Models [28.69675481931385]
We propose AdbGPT, a new lightweight approach to automatically reproduce the bugs from bug reports through prompt engineering.
AdbGPT leverages few-shot learning and chain-of-thought reasoning to elicit human knowledge and logical reasoning from LLMs.
Our evaluations demonstrate the effectiveness and efficiency of our AdbGPT to reproduce 81.3% of bug reports in 253.6 seconds.
arXiv Detail & Related papers (2023-06-03T03:03:52Z) - Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self- Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z) - Auto-labelling of Bug Report using Natural Language Processing [0.0]
Rule and Query-based solutions recommend a long list of potential similar bug reports with no clear ranking.
In this paper, we have proposed a solution using a combination of NLP techniques.
It uses a custom data transformer, a deep neural network, and a non-generalizing machine learning method to retrieve existing identical bug reports.
arXiv Detail & Related papers (2022-12-13T02:32:42Z) - Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z) - Large Language Models are Few-shot Testers: Exploring LLM-based General
Bug Reproduction [14.444294152595429]
The number of tests added in open source repositories due to issues was about 28% of the corresponding project test suite size.
We propose LIBRO, a framework that uses Large Language Models (LLMs), which have been shown to be capable of performing code-related tasks.
Our evaluation of LIBRO shows that, on the widely studied Defects4J benchmark, LIBRO can generate failure reproducing test cases for 33% of all studied cases.
arXiv Detail & Related papers (2022-09-23T10:50:47Z) - Automatic Classification of Bug Reports Based on Multiple Text
Information and Reports' Intention [37.67372105858311]
This paper proposes a new automatic classification method for bug reports.
The innovation is that when categorizing bug reports, in addition to using the text information of the report, the intention of the report is also considered.
Our proposed method achieves better performance and its F-Measure achieves from 87.3% to 95.5%.
arXiv Detail & Related papers (2022-08-02T06:44:51Z) - S3M: Siamese Stack (Trace) Similarity Measure [55.58269472099399]
We present S3M -- the first approach to computing stack trace similarity based on deep learning.
It is based on a biLSTM encoder and a fully-connected classifier to compute similarity.
Our experiments demonstrate the superiority of our approach over the state-of-the-art on both open-sourced data and a private JetBrains dataset.
arXiv Detail & Related papers (2021-03-18T21:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.