Resolving Crash Bugs via Large Language Models: An Empirical Study
- URL: http://arxiv.org/abs/2312.10448v1
- Date: Sat, 16 Dec 2023 13:41:04 GMT
- Title: Resolving Crash Bugs via Large Language Models: An Empirical Study
- Authors: Xueying Du, Mingwei Liu, Juntao Li, Hanlin Wang, Xin Peng, Yiling Lou
- Abstract summary: Crash bugs cause unexpected program behaviors or even termination, requiring high-priority resolution.
ChatGPT, a recent large language model (LLM), has garnered significant attention due to its exceptional performance across various domains.
This work performs the first investigation into ChatGPT's capability to resolve real-world crash bugs, focusing on its effectiveness in both localizing and repairing code-related and environment-related crash bugs.
- Score: 20.32724670868432
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Crash bugs cause unexpected program behaviors or even termination, requiring
high-priority resolution. However, manually resolving crash bugs is challenging
and labor-intensive, and researchers have proposed various techniques for their
automated localization and repair. ChatGPT, a recent large language model
(LLM), has garnered significant attention due to its exceptional performance
across various domains. This work performs the first investigation into
ChatGPT's capability to resolve real-world crash bugs, focusing on its
effectiveness in both localizing and repairing code-related and
environment-related crash bugs. Specifically, we initially assess ChatGPT's
fundamental ability to resolve crash bugs with basic prompts in a single
iteration. We observe that ChatGPT performs better at resolving code-related
crash bugs compared to environment-related ones, and its primary challenge in
resolution lies in inaccurate localization. Additionally, we explore ChatGPT's
potential with various advanced prompts. Furthermore, when its self-planning is
stimulated, ChatGPT methodically investigates each potential crash-causing
environmental factor through proactive inquiry, ultimately identifying the root
cause of the crash. Based on our findings, we propose IntDiagSolver, an
interaction methodology designed to facilitate precise crash bug resolution
through continuous interaction with LLMs. Evaluating IntDiagSolver on multiple
LLMs, including ChatGPT, Claude, and CodeLlama, reveals consistent improvement
in the accuracy of crash bug resolution.
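The interaction scheme behind IntDiagSolver is not spelled out in this abstract, so the following is only a minimal sketch of what such a continuous-interaction loop could look like: the model receives the crash context, may ask about one environmental factor per round, and stops once it commits to a localization and a candidate fix. The function query_llm, the callback answer_question, the prompt wording, and the round budget are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal, hypothetical sketch of a continuous-interaction crash-resolution loop.
# Assumptions (not from the paper): query_llm stands in for any chat-style LLM
# client, the prompts are illustrative, and the round budget of 5 is arbitrary.

MAX_ROUNDS = 5


def query_llm(messages):
    """Hypothetical wrapper around a chat LLM (e.g. ChatGPT, Claude, CodeLlama)."""
    raise NotImplementedError("plug in a concrete model client here")


def resolve_crash(stack_trace, code_snippet, answer_question):
    """Iteratively localize and repair a crash bug through continued interaction.

    answer_question is a callback (a developer or tooling) that supplies the
    environment details the model asks for, e.g. a library version or OS name.
    """
    messages = [
        {"role": "system", "content": (
            "You are resolving a crash bug. If the cause may be environmental, "
            "ask one clarifying question at a time; otherwise report the faulty "
            "location and a candidate fix.")},
        {"role": "user", "content": f"Stack trace:\n{stack_trace}\n\nCode:\n{code_snippet}"},
    ]
    for _ in range(MAX_ROUNDS):
        reply = query_llm(messages)
        messages.append({"role": "assistant", "content": reply})
        if reply.strip().endswith("?"):
            # The model is probing a potential environmental factor; answer it.
            messages.append({"role": "user", "content": answer_question(reply)})
        else:
            # The model committed to a localization and a patch; stop iterating.
            return reply
    return None  # no resolution within the round budget
```

In practice, answer_question could be a developer replying in a chat session or a script that inspects the runtime environment (e.g. reading installed package versions), which is what would let such a loop separate environment-related crashes from code-related ones.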
Related papers
- From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging [5.910272203315325]
We introduce the Multi-Granularity Debugger (MG Debugger), a hierarchical code debugger that isolates, identifies, and resolves bugs at various levels of granularity.
MG Debugger decomposes problematic code into a hierarchical tree structure of subfunctions, with each level representing a particular granularity of error.
It achieves an 18.9% improvement in accuracy over seed generations in HumanEval and a 97.6% repair success rate in HumanEvalFix.
arXiv Detail & Related papers (2024-10-02T03:57:21Z)
- DebugBench: Evaluating Debugging Capability of Large Language Models [80.73121177868357]
DebugBench is a benchmark for evaluating the debugging capability of Large Language Models (LLMs).
It covers four major bug categories and 18 minor types in C++, Java, and Python.
We evaluate two commercial and four open-source models in a zero-shot scenario.
arXiv Detail & Related papers (2024-01-09T15:46:38Z)
- Exploring ChatGPT's Capabilities on Vulnerability Management [56.4403395100589]
We explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples.
One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports.
Our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions.
arXiv Detail & Related papers (2023-11-11T11:01:13Z)
- ChatGPT for Vulnerability Detection, Classification, and Repair: How Far Are We? [24.61869093475626]
Large language models (LLMs) like ChatGPT exhibited remarkable advancement in a range of software engineering tasks.
We compare ChatGPT with state-of-the-art language models designed for software vulnerability purposes.
We found that ChatGPT achieves limited performance, trailing behind other language models in vulnerability contexts by a significant margin.
arXiv Detail & Related papers (2023-10-15T12:01:35Z)
- A Critical Review of Large Language Model on Software Engineering: An Example from ChatGPT and Automated Program Repair [19.123640635549524]
Large Language Models (LLMs) have been gaining increasing attention and demonstrated promising performance across a variety of software engineering tasks.
This paper reviews the bug-fixing capabilities of ChatGPT on a clean APR benchmark with different research objectives.
ChatGPT is able to fix 109 out of 151 buggy programs using the basic prompt within 35 independent rounds, outperforming state-of-the-art LLMs CodeT5 and PLBART by 27.5% and 62.4% in prediction accuracy, respectively.
arXiv Detail & Related papers (2023-10-13T06:11:47Z)
- Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency [127.97467912117652]
Large language models (LLMs) have exhibited remarkable ability in code generation.
However, generating the correct solution in a single attempt still remains a challenge.
We propose the Multi-Perspective Self-Consistency (MPSC) framework incorporating both inter- and intra-consistency.
arXiv Detail & Related papers (2023-09-29T14:23:26Z)
- Extending the Frontier of ChatGPT: Code Generation and Debugging [0.0]
ChatGPT, developed by OpenAI, has ushered in a new era by utilizing artificial intelligence (AI) to tackle diverse problem domains.
This research paper delves into the efficacy of ChatGPT in solving programming problems, examining both the correctness and the efficiency of its solutions in terms of time and memory complexity.
The research reveals a commendable overall success rate of 71.875%, denoting the proportion of problems for which ChatGPT was able to provide correct solutions.
arXiv Detail & Related papers (2023-07-17T06:06:58Z)
- Teaching Large Language Models to Self-Debug [62.424077000154945]
Large language models (LLMs) have achieved impressive performance on code generation.
We propose Self-Debugging, which teaches a large language model to debug its predicted program via few-shot demonstrations.
arXiv Detail & Related papers (2023-04-11T10:43:43Z)
- Using Developer Discussions to Guide Fixing Bugs in Software [51.00904399653609]
We propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for additional information from developers.
We demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.
arXiv Detail & Related papers (2022-11-11T16:37:33Z)
- A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models [81.15974174627785]
We study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space.
Our analysis shows that robustness does not appear to continuously improve as a function of size, but the GPT-3 Davinci models (175B) achieve a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
arXiv Detail & Related papers (2022-10-21T15:12:37Z)
- What to Prioritize? Natural Language Processing for the Development of a Modern Bug Tracking Solution in Hardware Development [0.0]
We present an approach to predict the time to fix, the risk and the complexity of a bug report using different supervised machine learning algorithms.
The evaluation shows that a combination of text embeddings generated through the Universal Sentence Encoder outperforms all other methods.
arXiv Detail & Related papers (2021-09-28T15:55:10Z)