Why Do Developers Engage with ChatGPT in Issue-Tracker? Investigating Usage and Reliance on ChatGPT-Generated Code
- URL: http://arxiv.org/abs/2412.06757v2
- Date: Tue, 10 Dec 2024 19:24:10 GMT
- Title: Why Do Developers Engage with ChatGPT in Issue-Tracker? Investigating Usage and Reliance on ChatGPT-Generated Code
- Authors: Joy Krishan Das, Saikat Mondal, Chanchal K. Roy
- Abstract summary: We analyzed 1,152 Developer-ChatGPT conversations across 1,012 issues in GitHub.
ChatGPT is primarily utilized for ideation, whereas its usage for validation is minimal.
ChatGPT-generated code was used as-is to resolve only 5.83% of the issues.
- Score: 4.605779671279481
- Abstract: Large language models (LLMs) like ChatGPT have shown the potential to assist developers with coding and debugging tasks. However, their role in collaborative issue resolution is underexplored. In this study, we analyzed 1,152 Developer-ChatGPT conversations across 1,012 issues in GitHub to examine the diverse usage of ChatGPT and reliance on its generated code. Our contributions are fourfold. First, we manually analyzed 289 conversations to understand ChatGPT's usage in GitHub Issues. Our analysis revealed that ChatGPT is primarily utilized for ideation, whereas its usage for validation (e.g., code documentation accuracy) is minimal. Second, we applied BERTopic modeling to identify key areas of engagement across the entire dataset. We found that backend issues (e.g., API management) dominate conversations, while testing is surprisingly less covered. Third, we utilized the CPD clone detection tool to check if the code generated by ChatGPT was used to address issues. Our findings revealed that ChatGPT-generated code was used as-is to resolve only 5.83% of the issues. Fourth, we estimated sentiment using a RoBERTa-based sentiment analysis model to determine developers' satisfaction with different usages and engagement areas. We found positive sentiment (i.e., high satisfaction) about using ChatGPT for refactoring and addressing data analytics (e.g., categorizing table data) issues. In contrast, we observed negative sentiment when using ChatGPT to debug issues and address automation tasks (e.g., GUI interactions). Our findings point to unmet needs and growing dissatisfaction among developers. Researchers and ChatGPT developers should focus on developing task-specific solutions that help resolve diverse issues, improving user satisfaction and problem-solving efficiency in software development.
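As a rough illustration of how the topic-modeling and sentiment-estimation steps of such a pipeline could be wired together, the Python sketch below uses BERTopic for engagement-area discovery and the public cardiffnlp/twitter-roberta-base-sentiment-latest checkpoint as a stand-in for the paper's RoBERTa-based sentiment model. The authors' actual tools, checkpoints, and preprocessing may differ, and the CPD clone-detection step is not reproduced here.

```python
# Illustrative sketch only, not the authors' released pipeline: cluster
# developer-ChatGPT conversations into engagement areas with BERTopic and
# estimate per-conversation sentiment with a public RoBERTa checkpoint.
from bertopic import BERTopic
from transformers import pipeline


def engagement_topics(conversations: list[str]) -> BERTopic:
    """Fit BERTopic over conversation texts to surface engagement areas
    (e.g., backend/API issues vs. testing). Needs a few hundred documents
    to produce stable topics."""
    topic_model = BERTopic(language="english", calculate_probabilities=False)
    topic_model.fit_transform(conversations)
    return topic_model


def conversation_sentiment(conversations: list[str]) -> list[dict]:
    """Label each conversation positive/neutral/negative. The checkpoint
    below is an assumed stand-in for the paper's RoBERTa-based model."""
    classifier = pipeline(
        "sentiment-analysis",
        model="cardiffnlp/twitter-roberta-base-sentiment-latest",
    )
    return classifier(conversations, truncation=True)
```

In the study, the inputs would be the 1,152 mined developer-ChatGPT conversations; CPD (the Copy/Paste Detector shipped with PMD) would then be run separately to check whether ChatGPT-generated snippets appear verbatim in the code that resolved each issue.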
Related papers
- Fight Fire with Fire: How Much Can We Trust ChatGPT on Source Code-Related Tasks? [10.389763758883975]
Recent studies proposed utilizing ChatGPT as both a developer and tester for collaborative software development.
We conduct a comprehensive empirical investigation to evaluate ChatGPT's self-verification capability in code generation, code completion, and program repair.
arXiv Detail & Related papers (2024-05-21T09:47:33Z) - An Empirical Study on Developers Shared Conversations with ChatGPT in GitHub Pull Requests and Issues [20.121332699827633]
ChatGPT has significantly impacted software development practices.
Despite its widespread adoption, the impact of ChatGPT as an assistant in collaborative coding remains largely unexplored.
We analyze a dataset of 210 and 370 developer-shared conversations with ChatGPT in GitHub pull requests (PRs) and issues, respectively.
arXiv Detail & Related papers (2024-03-15T16:58:37Z) - Investigating the Utility of ChatGPT in the Issue Tracking System: An Exploratory Study [5.176434782905268]
This study examines the interactions between ChatGPT and developers to analyze their prevalent activities and how they arrive at a resolution.
Our investigation reveals that developers mainly use ChatGPT for brainstorming solutions but often opt to write their code instead of using ChatGPT-generated code.
arXiv Detail & Related papers (2024-02-06T06:03:05Z) - Exploring ChatGPT's Capabilities on Vulnerability Management [56.4403395100589]
We explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples.
One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports.
Our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions.
arXiv Detail & Related papers (2023-11-11T11:01:13Z) - Primacy Effect of ChatGPT [69.49920102917598]
We study the primacy effect of ChatGPT: the tendency to select labels at earlier positions as the answer.
We hope that our experiments and analyses provide additional insights into building more reliable ChatGPT-based solutions.
arXiv Detail & Related papers (2023-10-20T00:37:28Z) - An empirical study of ChatGPT-3.5 on question answering and code maintenance [14.028497274245227]
A rising concern is whether ChatGPT will replace programmers and kill jobs.
We conducted an empirical study to systematically compare ChatGPT against programmers in question answering and software maintenance.
arXiv Detail & Related papers (2023-10-03T14:48:32Z) - DevGPT: Studying Developer-ChatGPT Conversations [12.69439932665687]
This paper introduces DevGPT, a dataset curated to explore how software developers interact with ChatGPT.
The dataset encompasses 29,778 prompts and responses from ChatGPT, including 19,106 code snippets.
arXiv Detail & Related papers (2023-08-31T06:55:40Z) - ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time [54.18651663847874]
ChatGPT has achieved great success and can be considered to have acquired infrastructural status.
Existing benchmarks face two challenges: (1) disregard for periodic evaluation and (2) lack of fine-grained features.
We construct ChatLog, an ever-updating dataset with large-scale records of diverse long-form ChatGPT responses for 21 NLP benchmarks, collected from March 2023 to the present.
arXiv Detail & Related papers (2023-04-27T11:33:48Z) - Can ChatGPT Understand Too? A Comparative Study on ChatGPT and Fine-tuned BERT [103.57103957631067]
ChatGPT has attracted great attention, as it can generate fluent and high-quality responses to human inquiries.
We evaluate ChatGPT's understanding ability on the popular GLUE benchmark and compare it with 4 representative fine-tuned BERT-style models.
We find that: 1) ChatGPT falls short in handling paraphrase and similarity tasks; 2) ChatGPT outperforms all BERT models on inference tasks by a large margin; 3) ChatGPT achieves performance comparable to BERT on sentiment analysis and question answering tasks.
arXiv Detail & Related papers (2023-02-19T12:29:33Z) - A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity [79.12003701981092]
We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks.
We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset.
ChatGPT is 63.41% accurate on average in 10 different reasoning categories under logical reasoning, non-textual reasoning, and commonsense reasoning.
arXiv Detail & Related papers (2023-02-08T12:35:34Z) - Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z)