What Makes ChatGPT Effective for Software Issue Resolution? An Empirical Study of Developer-ChatGPT Conversations in GitHub
- URL: http://arxiv.org/abs/2506.22390v1
- Date: Fri, 27 Jun 2025 17:00:48 GMT
- Title: What Makes ChatGPT Effective for Software Issue Resolution? An Empirical Study of Developer-ChatGPT Conversations in GitHub
- Authors: Ramtin Ehsani, Sakshi Pathak, Esteban Parra, Sonia Haiduc, Preetha Chatterjee
- Abstract summary: We analyze 686 developer-ChatGPT conversations shared within GitHub issue threads to identify characteristics that make these conversations effective for issue resolution. ChatGPT is most effective for code generation and tools/libraries/APIs recommendations, but struggles with code explanations. At the issue level, ChatGPT performs best on simpler problems with limited developer activity and faster resolution.
- Score: 4.928297656574645
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conversational large-language models are extensively used for issue resolution tasks. However, not all developer-LLM conversations are useful for effective issue resolution. In this paper, we analyze 686 developer-ChatGPT conversations shared within GitHub issue threads to identify characteristics that make these conversations effective for issue resolution. First, we analyze the conversations and their corresponding issues to distinguish helpful from unhelpful conversations. We begin by categorizing the types of tasks developers seek help with to better understand the scenarios in which ChatGPT is most effective. Next, we examine a wide range of conversational, project, and issue-related metrics to uncover factors associated with helpful conversations. Finally, we identify common deficiencies in unhelpful ChatGPT responses to highlight areas that could inform the design of more effective developer-facing tools. We found that only 62% of the ChatGPT conversations were helpful for successful issue resolution. ChatGPT is most effective for code generation and tools/libraries/APIs recommendations, but struggles with code explanations. Helpful conversations tend to be shorter, more readable, and exhibit stronger semantic and linguistic alignment. Larger, more popular projects and more experienced developers benefit more from ChatGPT. At the issue level, ChatGPT performs best on simpler problems with limited developer activity and faster resolution, typically well-scoped tasks like compilation errors. The most common deficiencies in unhelpful ChatGPT responses include incorrect information and lack of comprehensiveness. Our findings have wide implications including guiding developers on effective interaction strategies for issue resolution, informing the development of tools or frameworks to support optimal prompt design, and providing insights on fine-tuning LLMs for issue resolution tasks.
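The abstract attributes helpfulness to conversation-level properties such as length, readability, and semantic alignment. As a minimal illustrative sketch (not the authors' actual analysis pipeline), the Python snippet below shows one way such metrics could be approximated; the choice of libraries (`textstat`, `sentence-transformers`), the embedding model, and the specific metric definitions (Flesch reading ease, embedding cosine similarity) are assumptions made for illustration only.

```python
# Illustrative sketch: approximating conversation-level metrics of the kind the
# study associates with helpful developer-ChatGPT conversations (length,
# readability, semantic alignment with the issue). Library and metric choices
# here are assumptions, not the paper's actual methodology.
import textstat
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model


def conversation_metrics(issue_text: str, prompts: list[str], responses: list[str]) -> dict:
    conversation = " ".join(prompts + responses)
    # Length: total whitespace-separated tokens across all turns.
    length = len(conversation.split())
    # Readability: Flesch reading ease (higher means easier to read).
    readability = textstat.flesch_reading_ease(conversation)
    # Semantic alignment: cosine similarity between the issue description
    # and the full conversation text, in embedding space.
    issue_emb = model.encode(issue_text, convert_to_tensor=True)
    conv_emb = model.encode(conversation, convert_to_tensor=True)
    alignment = util.cos_sim(issue_emb, conv_emb).item()
    return {"length": length, "readability": readability, "semantic_alignment": alignment}


# Example usage with toy inputs:
metrics = conversation_metrics(
    issue_text="Build fails with an 'undefined reference' linker error on Ubuntu 22.04.",
    prompts=["Why do I get an undefined reference error when linking my C++ project?"],
    responses=["That error usually means a required library is missing from the linker flags..."],
)
print(metrics)
```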
Related papers
- Towards Detecting Prompt Knowledge Gaps for Improved LLM-guided Issue Resolution [3.768737590492549]
We analyze 433 developer-ChatGPT conversations within GitHub issue threads to examine the impact of prompt knowledge gaps and conversation styles on issue resolution. We find that ineffective conversations contain knowledge gaps in 44.6% of prompts, compared to only 12.6% in effective ones.
arXiv Detail & Related papers (2025-01-20T19:41:42Z)
- Why Do Developers Engage with ChatGPT in Issue-Tracker? Investigating Usage and Reliance on ChatGPT-Generated Code [4.605779671279481]
We analyzed 1,152 Developer-ChatGPT conversations across 1,012 issues in GitHub. ChatGPT is primarily utilized for ideation, whereas its usage for validation is minimal. ChatGPT-generated code was used as-is to resolve only 5.83% of the issues.
arXiv Detail & Related papers (2024-12-09T18:47:31Z)
- An Empirical Study on Developers Shared Conversations with ChatGPT in GitHub Pull Requests and Issues [20.121332699827633]
ChatGPT has significantly impacted software development practices.
Despite its widespread adoption, the impact of ChatGPT as an assistant in collaborative coding remains largely unexplored.
We analyze a dataset of 210 and 370 developer-shared conversations with ChatGPT in GitHub pull requests (PRs) and issues, respectively.
arXiv Detail & Related papers (2024-03-15T16:58:37Z)
- Investigating the Utility of ChatGPT in the Issue Tracking System: An Exploratory Study [5.176434782905268]
This study examines the interactions between ChatGPT and developers to analyze their prevalent activities in providing issue resolutions.
Our investigation reveals that developers mainly use ChatGPT for brainstorming solutions but often opt to write their own code instead of using ChatGPT-generated code.
arXiv Detail & Related papers (2024-02-06T06:03:05Z)
- Exploring ChatGPT's Capabilities on Vulnerability Management [56.4403395100589]
We explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples.
One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports.
Our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions.
arXiv Detail & Related papers (2023-11-11T11:01:13Z)
- Primacy Effect of ChatGPT [69.49920102917598]
We study the primacy effect of ChatGPT: the tendency to select labels at earlier positions as the answer.
We hope that our experiments and analyses provide additional insights into building more reliable ChatGPT-based solutions.
arXiv Detail & Related papers (2023-10-20T00:37:28Z)
- DevGPT: Studying Developer-ChatGPT Conversations [12.69439932665687]
This paper introduces DevGPT, a dataset curated to explore how software developers interact with ChatGPT.
The dataset encompasses 29,778 prompts and responses from ChatGPT, including 19,106 code snippets.
arXiv Detail & Related papers (2023-08-31T06:55:40Z)
- Extending the Frontier of ChatGPT: Code Generation and Debugging [0.0]
ChatGPT, developed by OpenAI, has ushered in a new era by utilizing artificial intelligence (AI) to tackle diverse problem domains.
This research paper delves into the efficacy of ChatGPT in solving programming problems, examining both the correctness and the efficiency of its solutions in terms of time and memory complexity.
The research reveals a commendable overall success rate of 71.875%, denoting the proportion of problems for which ChatGPT was able to provide correct solutions.
arXiv Detail & Related papers (2023-07-17T06:06:58Z)
- ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models [125.7209927536255]
We propose ChatCoT, a tool-augmented chain-of-thought reasoning framework for chat-based LLMs.
In ChatCoT, we model the chain-of-thought (CoT) reasoning as multi-turn conversations, to utilize tools in a more natural way through chatting.
Our approach effectively leverages the multi-turn conversation ability of chat-based LLMs and integrates thought-chain following and tool manipulation in a unified way.
arXiv Detail & Related papers (2023-05-23T17:54:33Z)
- A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity [79.12003701981092]
We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks.
We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset.
ChatGPT is 63.41% accurate on average in 10 different reasoning categories under logical reasoning, non-textual reasoning, and commonsense reasoning.
arXiv Detail & Related papers (2023-02-08T12:35:34Z)
- Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z)
- A Categorical Archive of ChatGPT Failures [47.64219291655723]
ChatGPT, developed by OpenAI, has been trained using massive amounts of data and simulates human conversation.
It has garnered significant attention due to its ability to effectively answer a broad range of human inquiries.
However, a comprehensive analysis of ChatGPT's failures is lacking, which is the focus of this study.
arXiv Detail & Related papers (2023-02-06T04:21:59Z)