Rocks Coding, Not Development--A Human-Centric, Experimental Evaluation
of LLM-Supported SE Tasks
- URL: http://arxiv.org/abs/2402.05650v3
- Date: Wed, 21 Feb 2024 08:16:34 GMT
- Title: Rocks Coding, Not Development--A Human-Centric, Experimental Evaluation
of LLM-Supported SE Tasks
- Authors: Wei Wang, Huilong Ning, Gaowei Zhang, Libo Liu and Yi Wang
- Abstract summary: We examined whether and to what degree working with ChatGPT was helpful in a coding task and a typical software development task.
We found that while ChatGPT performed well in solving simple coding problems, its performance in supporting typical software development tasks was not that good.
Our study thus provides first-hand insights into using ChatGPT to fulfill software engineering tasks with real-world developers.
- Score: 9.455579863269714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, generative AI based on large language models (LLMs) has been gaining
momentum for its impressive, high-quality performance in multiple domains,
particularly after the release of ChatGPT. Many believe that such models have the
potential to perform general-purpose problem-solving in software development
and replace human software developers. Nevertheless, there is a lack of
serious investigation into the capability of these LLM techniques in fulfilling
software development tasks. In a controlled 2 x 2 between-subject experiment
with 109 participants, we examined whether and to what degree working with
ChatGPT was helpful in a coding task and a typical software development task
and how people work with ChatGPT. We found that while ChatGPT performed well in
solving simple coding problems, its performance in supporting typical software
development tasks was not that good. We also observed the interactions between
participants and ChatGPT and found relations between the interactions and
the outcomes. Our study thus provides first-hand insights into using ChatGPT to
fulfill software engineering tasks with real-world developers and motivates the
need for novel interaction mechanisms that help developers effectively work
with large language models to achieve desired outcomes.
Related papers
- Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement [62.94719119451089]
The Lingma SWE-GPT series learns from and simulates real-world code submission activities.
Lingma SWE-GPT 72B resolves 30.20% of GitHub issues, marking a significant improvement in automatic issue resolution.
arXiv Detail & Related papers (2024-11-01T14:27:16Z)
- Developers' Perceptions on the Impact of ChatGPT in Software Development: A Survey [13.257222195239375]
We conducted a survey with 207 software developers to understand the impact of ChatGPT on software quality, productivity, and job satisfaction.
The study delves into developers' expectations regarding future adaptations of ChatGPT, concerns about potential job displacement, and perspectives on regulatory interventions.
arXiv Detail & Related papers (2024-05-20T17:31:16Z)
- Investigating the Utility of ChatGPT in the Issue Tracking System: An
Exploratory Study [5.176434782905268]
This study examines the interaction between ChatGPT and developers to analyze their prevalent activities and provide a resolution.
Our investigation reveals that developers mainly use ChatGPT for brainstorming solutions but often opt to write their own code instead of using ChatGPT-generated code.
arXiv Detail & Related papers (2024-02-06T06:03:05Z)
- Experiential Co-Learning of Software-Developing Agents [83.34027623428096]
Large language models (LLMs) have brought significant changes to various domains, especially in software development.
We introduce Experiential Co-Learning, a novel LLM-agent learning framework.
Experiments demonstrate that the framework enables agents to tackle unseen software-developing tasks more effectively.
arXiv Detail & Related papers (2023-12-28T13:50:42Z)
- ChatGPT as a Software Development Bot: A Project-based Study [5.518217604591736]
This study examines the impact of generative AI tools, specifically ChatGPT, on the software development experiences of undergraduate students.
Results showed that ChatGPT significantly addresses skill gaps in software development education, enhancing efficiency, accuracy, and collaboration.
arXiv Detail & Related papers (2023-10-20T16:48:19Z)
- DevGPT: Studying Developer-ChatGPT Conversations [12.69439932665687]
This paper introduces DevGPT, a dataset curated to explore how software developers interact with ChatGPT.
The dataset encompasses 29,778 prompts and responses from ChatGPT, including 19,106 code snippets.
arXiv Detail & Related papers (2023-08-31T06:55:40Z)
- ChatDev: Communicative Agents for Software Development [84.90400377131962]
ChatDev is a chat-powered software development framework in which specialized agents are guided in what to communicate.
These agents actively contribute to the design, coding, and testing phases through unified language-based communication.
arXiv Detail & Related papers (2023-07-16T02:11:34Z)
- Is ChatGPT the Ultimate Programming Assistant -- How far is it? [11.943927095071105]
ChatGPT has received great attention: it can be used as a bot for discussing source code.
We present an empirical study of ChatGPT's potential as a fully automated programming assistant.
arXiv Detail & Related papers (2023-04-24T09:20:13Z)
- ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large
Language Models in Multilingual Learning [70.57126720079971]
Large language models (LLMs) have emerged as the most important breakthroughs in natural language processing (NLP).
This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources.
Our extensive experimental results show that ChatGPT performs worse than previous models across different NLP tasks and languages.
arXiv Detail & Related papers (2023-04-12T05:08:52Z)
- A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on
Reasoning, Hallucination, and Interactivity [79.12003701981092]
We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks.
We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset.
ChatGPT is 63.41% accurate on average in 10 different reasoning categories under logical reasoning, non-textual reasoning, and commonsense reasoning.
arXiv Detail & Related papers (2023-02-08T12:35:34Z)
- Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.