Is Stack Overflow Obsolete? An Empirical Study of the Characteristics of ChatGPT Answers to Stack Overflow Questions
- URL: http://arxiv.org/abs/2308.02312v4
- Date: Wed, 7 Feb 2024 22:28:28 GMT
- Title: Is Stack Overflow Obsolete? An Empirical Study of the Characteristics of ChatGPT Answers to Stack Overflow Questions
- Authors: Samia Kabir, David N. Udo-Imeh, Bonan Kou, Tianyi Zhang
- Abstract summary: We conducted the first in-depth analysis of ChatGPT answers to programming questions on Stack Overflow.
We examined the correctness, consistency, comprehensiveness, and conciseness of ChatGPT answers.
Our analysis shows that 52% of ChatGPT answers contain incorrect information and 77% are verbose.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Q&A platforms have been crucial for the online help-seeking behavior of
programmers. However, the recent popularity of ChatGPT is altering this trend.
Despite this popularity, no comprehensive study has been conducted to evaluate
the characteristics of ChatGPT's answers to programming questions. To bridge
the gap, we conducted the first in-depth analysis of ChatGPT answers to 517
programming questions on Stack Overflow and examined the correctness,
consistency, comprehensiveness, and conciseness of ChatGPT answers.
Furthermore, we conducted a large-scale linguistic analysis, as well as a user
study, to understand the characteristics of ChatGPT answers from linguistic and
human aspects. Our analysis shows that 52% of ChatGPT answers contain incorrect
information and 77% are verbose. Nonetheless, our user study participants still
preferred ChatGPT answers 35% of the time due to their comprehensiveness and
well-articulated language style. However, they also overlooked the
misinformation in the ChatGPT answers 39% of the time. This implies the need to
counter misinformation in ChatGPT answers to programming questions and raise
awareness of the risks associated with seemingly correct answers.
Related papers
- An exploratory analysis of Community-based Question-Answering Platforms and GPT-3-driven Generative AI: Is it the end of online community-based learning?
ChatGPT offers software engineers an interactive alternative to community question-answering platforms like Stack Overflow.
We analyze 2564 Python and JavaScript questions from Stack Overflow that were asked between January 2022 and December 2022.
Our analysis indicates that ChatGPT's responses are 66% shorter and share 35% more words with the questions, showing a 25% increase in positive sentiment compared to human responses.
(2024-09-26T02:17:30Z)
- A Study on the Vulnerability of Test Questions against ChatGPT-based Cheating
ChatGPT can answer text prompts fairly accurately, even performing very well on postgraduate-level questions.
Many educators have found that their take-home or remote tests and exams are vulnerable to ChatGPT-based cheating.
(2024-02-21T23:51:06Z)
- Exploring ChatGPT's Capabilities on Vulnerability Management
We explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples.
One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports.
Our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions.
(2023-11-11T11:01:13Z)
- Primacy Effect of ChatGPT
We study the primacy effect of ChatGPT: the tendency of selecting the labels at earlier positions as the answer.
We hope that our experiments and analyses provide additional insights into building more reliable ChatGPT-based solutions.
(2023-10-20T00:37:28Z)
- An empirical study of ChatGPT-3.5 on question answering and code maintenance
A rising concern is whether ChatGPT will replace programmers and kill jobs.
We conducted an empirical study to systematically compare ChatGPT against programmers in question-answering and software-maintaining.
(2023-10-03T14:48:32Z)
- Are We Ready to Embrace Generative AI for Software Q&A?
Stack Overflow, the world's largest software Q&A (SQA) website, is facing a significant traffic drop due to the emergence of generative AI techniques.
Stack Overflow banned ChatGPT only 6 days after its release.
To verify this, we conduct a comparative evaluation of human-written and ChatGPT-generated answers.
(2023-07-19T05:54:43Z)
- Evaluating Privacy Questions From Stack Overflow: Can ChatGPT Compete?
ChatGPT has been used as an alternative to generate code or produce responses to developers' questions.
Our results show that most privacy-related questions are related to choice/consent, aggregation, and identification.
(2023-06-19T21:33:04Z)
- ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time
ChatGPT has achieved great success and can be considered to have acquired an infrastructural status.
Existing benchmarks face two challenges: (1) disregard for periodic evaluation and (2) lack of fine-grained features.
We construct ChatLog, an ever-updating dataset with large-scale records of diverse long-form ChatGPT responses for 21 NLP benchmarks from March 2023 to the present.
(2023-04-27T11:33:48Z)
- ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models
Large language models (LLMs) have made significant progress in NLP.
We specifically focus on ChatGPT, a widely used and easily accessible LLM.
We conduct a series of experiments on 11 datasets to evaluate ChatGPT's commonsense abilities.
(2023-03-29T03:05:43Z)
- Is ChatGPT a General-Purpose Natural Language Processing Task Solver?
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the NLP community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
(2023-02-08T09:44:51Z)
- A Categorical Archive of ChatGPT Failures
ChatGPT, developed by OpenAI, has been trained using massive amounts of data and simulates human conversation.
It has garnered significant attention due to its ability to effectively answer a broad range of human inquiries.
However, a comprehensive analysis of ChatGPT's failures is lacking, which is the focus of this study.
(2023-02-06T04:21:59Z)