Unveiling ChatGPT's Usage in Open Source Projects: A Mining-based Study
- URL: http://arxiv.org/abs/2402.16480v1
- Date: Mon, 26 Feb 2024 10:58:51 GMT
- Title: Unveiling ChatGPT's Usage in Open Source Projects: A Mining-based Study
- Authors: Rosalia Tufano, Antonio Mastropaolo, Federica Pepe, Ozren Dabić,
Massimiliano Di Penta, Gabriele Bavota
- Abstract summary: Large Language Models (LLMs) have gained significant attention in the software engineering community.
We mine 1,501 commits, pull requests, and issues from open-source projects by matching regular expressions likely to indicate that ChatGPT was used to accomplish the task.
This resulted in a taxonomy of 45 tasks that developers automate via ChatGPT.
- Score: 17.952085678503362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have gained significant attention in the
software engineering community. Developers can now exploit these models through
industrial-grade tools that provide a convenient interface to LLMs, such as
OpenAI's ChatGPT. While the potential of LLMs to assist developers across
several tasks has been documented in the literature, there is a lack of
empirical evidence mapping their actual usage in software projects. In this
work, we aim to fill this gap. First, we mine 1,501 commits, pull requests
(PRs), and issues from open-source projects by matching regular expressions
likely to indicate that ChatGPT was used to accomplish the task. Then, we
manually analyze these instances, discarding false positives (i.e., instances
in which ChatGPT was mentioned but not actually used) and categorizing the
automated task in the 467 true positives (165 commits, 159 PRs, 143 issues).
This resulted in a taxonomy of 45 tasks that developers automate via ChatGPT.
The taxonomy, accompanied by representative examples, provides (i) developers
with valuable insights on how to exploit LLMs in their workflow and (ii)
researchers with a clear overview of tasks that, according to developers, could
benefit from automated solutions.
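As a rough illustration of the mining step described in the abstract, the following Python sketch matches ChatGPT-related patterns against commit, PR, or issue text; the patterns and the toy corpus are assumptions for illustration, not the paper's actual regular expressions or dataset.

```python
import re

# Hypothetical patterns hinting that ChatGPT was used to accomplish a task;
# the paper's actual regular expressions are not reproduced here.
CHATGPT_PATTERNS = [
    re.compile(r"\bgenerated (by|with|using) chatgpt\b", re.IGNORECASE),
    re.compile(r"\bchatgpt (wrote|suggested|produced)\b", re.IGNORECASE),
    re.compile(r"\basked chatgpt\b", re.IGNORECASE),
]

def is_candidate(text: str) -> bool:
    """Return True if the commit/PR/issue text likely reports ChatGPT usage."""
    return any(p.search(text) for p in CHATGPT_PATTERNS)

# Toy corpus standing in for mined commits, PRs, and issues.
documents = [
    "Refactor parser; helper function generated by ChatGPT.",
    "Clarify README: examples were NOT generated with ChatGPT.",  # regex hit
]

candidates = [d for d in documents if is_candidate(d)]
# As in the paper, candidates would then be manually reviewed to discard
# false positives (the second document mentions ChatGPT without using it)
# and to categorize the task automated in each true positive.
print(candidates)
```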
Related papers
- LLMs are Imperfect, Then What? An Empirical Study on LLM Failures in Software Engineering [38.20696656193963]
We conducted an observational study with 22 participants using ChatGPT as a coding assistant in a non-trivial software engineering task.
We identified the cases where ChatGPT failed, their root causes, and the corresponding mitigation solutions used by users.
arXiv Detail & Related papers (2024-11-15T03:29:41Z) - WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks [85.95607119635102]
Large language models (LLMs) can mimic human-like intelligence.
WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents.
arXiv Detail & Related papers (2024-07-07T07:15:49Z) - BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions [72.56339136017759]
We introduce BigCodeBench, a benchmark that challenges Large Language Models (LLMs) to invoke multiple function calls as tools from 139 libraries and 7 domains for 1,140 fine-grained tasks.
Our evaluation shows that LLMs are not yet capable of following complex instructions to use function calls precisely, with scores up to 60%, significantly lower than the human performance of 97%.
We propose a natural-language-oriented variant of BigCodeBench, BigCodeBench-Instruct, that automatically transforms the original docstrings into short instructions containing only the essential information.
arXiv Detail & Related papers (2024-06-22T15:52:04Z) - Are you still on track!? Catching LLM Task Drift with Activations [55.75645403965326]
Task drift allows attackers to exfiltrate data or influence the LLM's output for other users.
We show that a simple linear classifier can detect drift with near-perfect ROC AUC on an out-of-distribution test set.
We observe that this approach generalizes surprisingly well to unseen task domains, such as prompt injections, jailbreaks, and malicious instructions.
arXiv Detail & Related papers (2024-06-02T16:53:21Z) - Beyond Code Generation: An Observational Study of ChatGPT Usage in Software Engineering Practice [3.072802875195726]
- Beyond Code Generation: An Observational Study of ChatGPT Usage in Software Engineering Practice [3.072802875195726]
We conduct an observational study of 24 professional software engineers who used ChatGPT in their jobs over the course of one week.
We find that, rather than expecting ChatGPT to generate ready-to-use software artifacts (e.g., code), practitioners more often use ChatGPT to receive guidance on how to solve their tasks or learn about a topic in more abstract terms.
arXiv Detail & Related papers (2024-04-23T10:34:16Z) - Rocks Coding, Not Development--A Human-Centric, Experimental Evaluation
of LLM-Supported SE Tasks [9.455579863269714]
We examined whether, and to what degree, working with ChatGPT is helpful in coding tasks and in typical software development tasks.
We found that while ChatGPT performed well on simple coding problems, its support for typical software development tasks was notably weaker.
Our study thus provides first-hand insights into using ChatGPT to fulfill software engineering tasks with real-world developers.
arXiv Detail & Related papers (2024-02-08T13:07:31Z) - ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks.
To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z) - The Shifted and The Overlooked: A Task-oriented Investigation of
User-GPT Interactions [114.67699010359637]
We analyze a large-scale collection of real user queries to GPT.
We find that tasks such as "design" and "planning" are prevalent in user interactions but are largely neglected by, or differ from, traditional NLP benchmarks.
arXiv Detail & Related papers (2023-10-19T02:12:17Z) - A Critical Review of Large Language Model on Software Engineering: An Example from ChatGPT and Automated Program Repair [19.123640635549524]
Large Language Models (LLMs) have been gaining increasing attention and demonstrated promising performance across a variety of software engineering tasks.
This paper reviews the bug-fixing capabilities of ChatGPT on a clean APR benchmark with different research objectives.
ChatGPT is able to fix 109 out of 151 buggy programs using the basic prompt within 35 independent rounds, outperforming the state-of-the-art LLMs CodeT5 and PLBART by 27.5% and 62.4% in prediction accuracy, respectively.
arXiv Detail & Related papers (2023-10-13T06:11:47Z) - Is ChatGPT the Ultimate Programming Assistant -- How far is it? [11.943927095071105]
- Is ChatGPT the Ultimate Programming Assistant -- How far is it? [11.943927095071105]
ChatGPT has received great attention; among other uses, it can serve as a bot for discussing source code.
We present an empirical study of ChatGPT's potential as a fully automated programming assistant.
arXiv Detail & Related papers (2023-04-24T09:20:13Z) - Check Your Facts and Try Again: Improving Large Language Models with
External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes LLM-Augmenter, a system that augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)