ChatGPT vs LLaMA: Impact, Reliability, and Challenges in Stack Overflow
Discussions
- URL: http://arxiv.org/abs/2402.08801v1
- Date: Tue, 13 Feb 2024 21:15:33 GMT
- Title: ChatGPT vs LLaMA: Impact, Reliability, and Challenges in Stack Overflow
Discussions
- Authors: Leuson Da Silva and Jordan Samhi and Foutse Khomh
- Abstract summary: ChatGPT has shaken up Stack Overflow, the premier platform for developers' queries on programming and software development.
Two months after ChatGPT's release, Meta released its answer with its own Large Language Model (LLM) called LLaMA: the race was on.
- Score: 13.7001994656622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since its release in November 2022, ChatGPT has shaken up Stack Overflow, the
premier platform for developers' queries on programming and software
development. Demonstrating an ability to generate instant, human-like responses
to technical questions, ChatGPT has ignited debates within the developer
community about the evolving role of human-driven platforms in the age of
generative AI. Two months after ChatGPT's release, Meta released its answer
with its own Large Language Model (LLM) called LLaMA: the race was on. We
conducted an empirical study analyzing questions from Stack Overflow and using
these LLMs to address them. This way, we aim to (i) measure the evolution of
user engagement with Stack Overflow over time; (ii) quantify the reliability of
LLMs' answers and their potential to replace Stack Overflow in the long term;
(iii) identify and understand why LLMs fail; and (iv) compare the LLMs with
each other. Our empirical results are unequivocal: ChatGPT and LLaMA challenge
human expertise, yet do not outperform it in some domains, while a significant
decline in user posting activity has been observed. Furthermore, we discuss the
impact of our findings on the usage and development of new LLMs.
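The engagement measurement in aim (i) amounts to tracking posting volume over time. A minimal sketch of that idea, assuming a local export of Stack Overflow question creation timestamps (the field format below is illustrative, not the study's actual pipeline):

```python
from collections import Counter
from datetime import datetime

def monthly_question_counts(timestamps):
    """Count questions per calendar month from ISO-8601 creation timestamps.

    `timestamps` is assumed to come from a local Stack Overflow data dump;
    a drop in successive monthly counts would indicate declining engagement.
    """
    months = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)
        months[(dt.year, dt.month)] += 1
    # Return months in chronological order for plotting or trend tests.
    return dict(sorted(months.items()))

counts = monthly_question_counts([
    "2022-11-05T10:00:00", "2022-11-20T08:30:00", "2022-12-01T12:00:00",
])
print(counts)  # {(2022, 11): 2, (2022, 12): 1}
```

In practice the same counts could be obtained from the official Stack Exchange data dump or API; the sketch only shows the aggregation step.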
Related papers
- An exploratory analysis of Community-based Question-Answering Platforms and GPT-3-driven Generative AI: Is it the end of online community-based learning? [0.6749750044497732]
ChatGPT offers software engineers an interactive alternative to community question-answering platforms like Stack Overflow.
We analyze 2564 Python and JavaScript questions from Stack Overflow that were asked between January 2022 and December 2022.
Our analysis indicates that ChatGPT's responses are 66% shorter and share 35% more words with the questions, showing a 25% increase in positive sentiment compared to human responses.
arXiv Detail & Related papers (2024-09-26T02:17:30Z) - An Empirical Study on Challenges for LLM Developers [28.69628251749012]
We crawl and analyze 29,057 relevant questions from a popular OpenAI developer forum.
After manually analyzing 2,364 sampled questions, we construct a taxonomy of challenges faced by LLM developers.
arXiv Detail & Related papers (2024-08-06T05:46:28Z) - StackRAG Agent: Improving Developer Answers with Retrieval-Augmented Generation [2.225268436173329]
StackRAG is a retrieval-augmented, multiagent generation tool based on Large Language Models.
It combines the two worlds by aggregating knowledge from Stack Overflow (SO) to enhance the reliability of the generated answers.
Initial evaluations show that the generated answers are correct, accurate, relevant, and useful.
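The retrieval-augmented pattern this abstract describes can be illustrated with a minimal, self-contained sketch. The keyword scorer and canned knowledge base below are hypothetical stand-ins for StackRAG's actual Stack Overflow retrieval and LLM generation:

```python
def retrieve(query, knowledge_base, k=2):
    """Rank knowledge-base snippets by naive keyword overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, snippets):
    """Assemble an augmented prompt: retrieved context first, then the question."""
    context = "\n".join(f"- {s}" for s in snippets)
    return f"Context from Stack Overflow:\n{context}\n\nQuestion: {query}"

# Toy knowledge base standing in for aggregated Stack Overflow answers.
kb = [
    "Use list comprehensions to filter a Python list concisely.",
    "Java streams support filter and map operations.",
    "CSS flexbox aligns items along a main axis.",
]
query = "How do I filter a list in Python?"
prompt = build_prompt(query, retrieve(query, kb))
print(prompt)
```

A real system would replace the keyword overlap with dense embeddings and send the assembled prompt to an LLM; the sketch only shows the retrieve-then-augment structure.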
arXiv Detail & Related papers (2024-06-19T21:07:35Z) - When LLMs Meet Cunning Texts: A Fallacy Understanding Benchmark for Large Language Models [59.84769254832941]
We propose a FaLlacy Understanding Benchmark (FLUB) containing cunning texts that are easy for humans to understand but difficult for models to grasp.
Specifically, the cunning texts that FLUB focuses on consist mainly of tricky, humorous, and misleading texts collected from the real-world internet.
Based on FLUB, we investigate the performance of multiple representative and advanced LLMs.
arXiv Detail & Related papers (2024-02-16T22:12:53Z) - Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks.
LLMs' ability of general-purpose language understanding and generation is acquired by training models with billions of parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z) - ChatGPT's One-year Anniversary: Are Open-Source Large Language Models
Catching up? [71.12709925152784]
ChatGPT has brought a seismic shift in the entire landscape of AI.
It showed that a model could answer human questions and follow instructions across a broad range of tasks.
While closed-source LLMs generally outperform their open-source counterparts, the progress on the latter has been rapid.
This has crucial implications not only on research but also on business.
arXiv Detail & Related papers (2023-11-28T17:44:51Z) - LM-Polygraph: Uncertainty Estimation for Language Models [71.21409522341482]
Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of large language models (LLMs).
We introduce LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python.
It introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores.
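A common baseline among the UE methods such frameworks implement is sequence confidence from per-token log-probabilities. The generic sketch below illustrates that idea only; it is not LM-Polygraph's actual API, and the example log-probs are invented:

```python
import math

def mean_logprob_confidence(token_logprobs):
    """Baseline uncertainty signal: average per-token log-probability of a
    generated answer, mapped into (0, 1] via exp. Higher means the model
    assigned its own output more probability, i.e. was more confident."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probs for two generated answers (e.g. returned
# alongside tokens by an LLM API).
confident = [-0.1, -0.2, -0.1]
hesitant = [-1.5, -2.0, -1.8]
print(mean_logprob_confidence(confident) > mean_logprob_confidence(hesitant))  # True
```

Such a score is what a chat interface could surface next to each answer, as the demo application described above does with its confidence scores.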
arXiv Detail & Related papers (2023-11-13T15:08:59Z) - Investigating Answerability of LLMs for Long-Form Question Answering [35.41413072729483]
We focus on long-form question answering (LFQA) because it has several practical and impactful applications.
We propose a question-generation method from abstractive summaries and show that generating follow-up questions from summaries of long documents can create a challenging setting.
arXiv Detail & Related papers (2023-09-15T07:22:56Z) - From Mundane to Meaningful: AI's Influence on Work Dynamics -- evidence
from ChatGPT and Stack Overflow [0.0]
We explore how ChatGPT changed a fundamental aspect of coding: problem-solving.
We exploit ChatGPT's sudden release on November 30, 2022, and its effect on usage of the largest online community for coders: Stack Overflow.
arXiv Detail & Related papers (2023-08-22T09:30:02Z) - Check Your Facts and Try Again: Improving Large Language Models with
External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes an LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z) - A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on
Reasoning, Hallucination, and Interactivity [79.12003701981092]
We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks.
We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset.
ChatGPT is 63.41% accurate on average in 10 different reasoning categories under logical reasoning, non-textual reasoning, and commonsense reasoning.
arXiv Detail & Related papers (2023-02-08T12:35:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.