Related papers: Beyond Code Generation: An Observational Study of ChatGPT Usage in Software Engineering Practice

URL: http://arxiv.org/abs/2404.14901v2
Date: Tue, 21 May 2024 12:53:30 GMT
Title: Beyond Code Generation: An Observational Study of ChatGPT Usage in Software Engineering Practice
Authors: Ranim Khojah, Mazen Mohamad, Philipp Leitner, Francisco Gomes de Oliveira Neto
Abstract summary: We conduct an observational study of 24 professional software engineers who have been using ChatGPT over a period of one week in their jobs. We find that, rather than expecting ChatGPT to generate ready-to-use software artifacts (e.g., code), practitioners more often use ChatGPT to receive guidance on how to solve their tasks or learn about a topic in more abstract terms.
Score: 3.072802875195726
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) are frequently discussed in academia and the general public as support tools for virtually any use case that relies on the production of text, including software engineering. Currently there is much debate, but little empirical evidence, regarding the practical usefulness of LLM-based tools such as ChatGPT for engineers in industry. We conduct an observational study of 24 professional software engineers who have been using ChatGPT over a period of one week in their jobs, and qualitatively analyse their dialogues with the chatbot as well as their overall experience (as captured by an exit survey). We find that, rather than expecting ChatGPT to generate ready-to-use software artifacts (e.g., code), practitioners more often use ChatGPT to receive guidance on how to solve their tasks or learn about a topic in more abstract terms. We also propose a theoretical framework for how (i) purpose of the interaction, (ii) internal factors (e.g., the user's personality), and (iii) external factors (e.g., company policy) together shape the experience (in terms of perceived usefulness and trust). We envision that our framework can be used by future research to further the academic discussion on LLM usage by software engineering practitioners, and to serve as a reference point for the design of future empirical LLM research in this domain.

Related papers

Beyond the Hype: A Cautionary Tale of ChatGPT in the Programming Classroom [0.0]
The paper provides insights for academics who teach programming on how to create more challenging exercises and how to engage responsibly in the use of ChatGPT to promote classroom integrity. We analyzed the various practical programming examples from past IS exercises and compared those with memos created by tutors and lecturers in a university setting.
arXiv Detail & Related papers (2024-06-16T23:52:37Z)
Can ChatGPT Support Developers? An Empirical Evaluation of Large Language Models for Code Generation [2.93322471069531]
We conduct an empirical analysis of conversations in DevGPT, a dataset collected from developers' conversations with ChatGPT. Our findings indicate that the current practice of using LLM-generated code is typically limited to either demonstrating high-level concepts or providing examples in documentation.
arXiv Detail & Related papers (2024-02-18T20:48:09Z)
Language Models as Science Tutors [79.73256703631492]
We introduce TutorEval and TutorChat to measure real-life usability of LMs as scientific assistants. We show that fine-tuning base models with existing dialogue datasets leads to poor performance on TutorEval. We use TutorChat to fine-tune Llemma models with 7B and 34B parameters. These LM tutors specialized in math have a 32K-token context window, and they excel at TutorEval while performing strongly on GSM8K and MATH.
arXiv Detail & Related papers (2024-02-16T22:24:13Z)
Rocks Coding, Not Development--A Human-Centric, Experimental Evaluation of LLM-Supported SE Tasks [9.455579863269714]
We examined whether and to what degree working with ChatGPT was helpful in the coding task and typical software development task. We found that while ChatGPT performed well in solving simple coding problems, its performance in supporting typical software development tasks was not that good. Our study thus provides first-hand insights into using ChatGPT to fulfill software engineering tasks with real-world developers.
arXiv Detail & Related papers (2024-02-08T13:07:31Z)
LM-Polygraph: Uncertainty Estimation for Language Models [71.21409522341482]
Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of large language models (LLMs). We introduce LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python. It introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores.
arXiv Detail & Related papers (2023-11-13T15:08:59Z)
The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions [114.67699010359637]
We analyze a large-scale collection of real user queries to GPT. We find that tasks such as "design" and "planning" are prevalent in user interactions but are largely neglected or different from traditional NLP benchmarks.
arXiv Detail & Related papers (2023-10-19T02:12:17Z)
Analysis of ChatGPT on Source Code [1.3381749415517021]
This paper explores the use of Large Language Models (LLMs) and in particular ChatGPT in programming, source code analysis, and code generation. LLMs and ChatGPT are built using machine learning and artificial intelligence techniques, and they offer several benefits to developers and programmers.
arXiv Detail & Related papers (2023-06-01T12:12:59Z)
Is ChatGPT the Ultimate Programming Assistant -- How far is it? [11.943927095071105]
ChatGPT has received great attention: it can be used as a bot for discussing source code. We present an empirical study of ChatGPT's potential as a fully automated programming assistant.
arXiv Detail & Related papers (2023-04-24T09:20:13Z)
LLM-based Interaction for Content Generation: A Case Study on the Perception of Employees in an IT department [85.1523466539595]
This paper presents a questionnaire survey to identify the intention to use generative tools by employees of an IT company. Our results indicate a rather average acceptability of generative tools, although the more useful the tool is perceived to be, the higher the intention seems to be. Our analyses suggest that the frequency of use of generative tools is likely to be a key factor in understanding how employees perceive these tools in the context of their work.
arXiv Detail & Related papers (2023-04-18T15:35:43Z)
ChatGPT Beyond English: Towards a Comprehensive Evaluation of Large Language Models in Multilingual Learning [70.57126720079971]
Large language models (LLMs) have emerged as the most important breakthroughs in natural language processing (NLP). This paper evaluates ChatGPT on 7 different tasks, covering 37 diverse languages with high, medium, low, and extremely low resources. Compared to the performance of previous models, our extensive experimental results demonstrate a worse performance of ChatGPT for different NLP tasks and languages.
arXiv Detail & Related papers (2023-04-12T05:08:52Z)
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity [79.12003701981092]
We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks. We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset. ChatGPT is 63.41% accurate on average in 10 different reasoning categories under logical reasoning, non-textual reasoning, and commonsense reasoning.
arXiv Detail & Related papers (2023-02-08T12:35:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.