Which is a better programming assistant? A comparative study between
ChatGPT and Stack Overflow
- URL: http://arxiv.org/abs/2308.13851v1
- Date: Sat, 26 Aug 2023 11:25:18 GMT
- Title: Which is a better programming assistant? A comparative study between
ChatGPT and Stack Overflow
- Authors: Jinrun Liu, Xinyu Tang, Linlin Li, Panpan Chen, Yepang Liu
- Abstract summary: We compare the performance of Stack Overflow and ChatGPT in enhancing programmer productivity.
Concerning code quality, ChatGPT outperforms Stack Overflow significantly in helping complete algorithmic and library-related tasks.
We identify the reasons behind the two platforms' divergent performances in programming assistance.
- Score: 10.861651344753591
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Programmers often seek help from Q&A websites to resolve issues they
encounter during programming. Stack Overflow has been a widely used platform
for this purpose for over a decade. Recently, revolutionary AI-powered
platforms like ChatGPT have quickly gained popularity among programmers for
their efficient and personalized programming assistance via natural language
interactions. Both platforms can offer valuable assistance to programmers, but
it's unclear which is more effective at enhancing programmer productivity. In
our paper, we conducted an exploratory user study to compare the performance of
Stack Overflow and ChatGPT in enhancing programmer productivity. Two groups of
students with similar programming abilities were instructed to use the two
platforms to solve three different types of programming tasks: algorithmic
challenges, library usage, and debugging. During the experiments, we measured
and compared the quality of code produced and the time taken to complete tasks
for the two groups. The results show that, concerning code quality, ChatGPT
outperforms Stack Overflow significantly in helping complete algorithmic and
library-related tasks, while Stack Overflow is better for debugging tasks.
Regarding task completion speed, the ChatGPT group is noticeably faster than the
Stack Overflow group on the algorithmic challenge, while the two groups perform
similarly on the other two tasks. Additionally, we conducted a
post-experiment survey with the participants to understand how the platforms
have helped them complete the programming tasks. We analyzed the questionnaires
to summarize ChatGPT and Stack Overflow's strengths and weaknesses pointed out
by the participants. By comparing these, we identified the reasons behind the
two platforms' divergent performances in programming assistance.
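The abstract does not specify how the between-group differences in code quality and completion time were tested statistically. Purely as an illustration of how such a two-group comparison could be carried out, the sketch below runs a Mann-Whitney U test over completion times; all values, group sizes, and variable names are hypothetical and not taken from the study.

    # Hypothetical sketch: comparing task-completion times of two small groups
    # (ChatGPT vs. Stack Overflow users) with a non-parametric test.
    # All numbers below are invented for illustration only.
    from scipy.stats import mannwhitneyu

    # Completion times in minutes for one task type (fabricated values).
    chatgpt_times = [18, 22, 25, 19, 30, 21]
    stackoverflow_times = [35, 41, 28, 39, 44, 33]

    # Two-sided Mann-Whitney U test: do the two groups' completion-time
    # distributions differ?
    stat, p_value = mannwhitneyu(chatgpt_times, stackoverflow_times,
                                 alternative="two-sided")
    print(f"U = {stat:.1f}, p = {p_value:.4f}")

A small p-value (e.g., below 0.05) would suggest a genuine difference between the groups; the direction of the effect can be read from the group medians. The same pattern could be applied to the code-quality scores.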
Related papers
- Do AI models help produce verified bug fixes? [62.985237003585674]
Large Language Models are used to produce corrections to software bugs. This paper investigates how programmers use Large Language Models to complement their own skills. The results are a first step towards a proper role for AI and LLMs in providing guaranteed-correct fixes to program bugs.
arXiv Detail & Related papers (2025-07-21T17:30:16Z) - SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving [90.32201622392137]
We present SwingArena, a competitive evaluation framework for Large Language Models (LLMs). Unlike traditional static benchmarks, SwingArena models the collaborative process of software development by pairing LLMs as submitters, who generate patches, and reviewers, who create test cases and verify the patches through continuous integration (CI) pipelines.
arXiv Detail & Related papers (2025-05-29T18:28:02Z) - CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming [56.17331530444765]
CPRet is a retrieval-oriented benchmark suite for competitive programming. It covers four retrieval tasks: two code-centric (i.e., Text-to-Code and Code-to-Code) and two newly proposed problem-centric tasks (i.e., Problem-to-Duplicate and Simplified-to-Full). Our contribution includes both high-quality training data and temporally separated test sets for reliable evaluation.
arXiv Detail & Related papers (2025-05-19T10:07:51Z) - Providing Information About Implemented Algorithms Improves Program Comprehension: A Controlled Experiment [46.198289193451146]
Annotating source code with algorithm labels significantly improves program comprehension.
A majority of participants perceived the labels as helpful, especially for recognizing the code's intent.
Reasons for self-implementing algorithms included library inadequacies, performance needs, and avoiding dependencies or licensing costs.
arXiv Detail & Related papers (2025-04-27T13:08:30Z) - Hints Help Finding and Fixing Bugs Differently in Python and Text-based Program Representations [28.829745991874816]
We find that the program representation has a significant influence on the users' accuracy at finding and fixing bugs.
Different hints help differently depending on the program representation and the user's understanding of the algorithmic task.
These findings have implications for designing next-generation programming tools that provide personalized support to users.
arXiv Detail & Related papers (2024-12-17T02:11:53Z) - FullStack Bench: Evaluating LLMs as Full Stack Coders [108.63536080569877]
FullStack Bench focuses on full-stack programming, which encompasses a wide range of application domains.
To assess multilingual programming capabilities, in FullStack Bench, we design real-world instructions and corresponding unit test cases from 16 widely-used programming languages.
arXiv Detail & Related papers (2024-11-30T16:58:42Z) - Amplifying human performance in combinatorial competitive programming [41.59043428241635]
We focus on competitive programming, where the target is to find as-good-as-possible solutions to intractable problems.
We deploy our approach on previous iterations of Hash Code, a global team programming competition inspired by NP-hard software engineering problems at Google.
Our solutions significantly improve the attained scores from their baseline, successfully breaking into the top percentile on all previous Hash Code online qualification rounds.
arXiv Detail & Related papers (2024-11-29T14:40:36Z) - Benchmarking ChatGPT, Codeium, and GitHub Copilot: A Comparative Study of AI-Driven Programming and Debugging Assistants [0.0]
Large language models (LLMs) have become essential for tasks like code generation, bug fixing, and optimization.
This paper presents a comparative study of ChatGPT, Codeium, and GitHub Copilot, evaluating their performance on LeetCode problems.
arXiv Detail & Related papers (2024-09-30T03:53:40Z) - Benchmarking Predictive Coding Networks -- Made Simple [48.652114040426625]
We tackle the problems of efficiency and scalability for predictive coding networks (PCNs) in machine learning.
We propose a library, called PCX, that focuses on performance and simplicity, and use it to implement a large set of standard benchmarks.
We perform extensive tests on such benchmarks using both existing algorithms for PCNs, as well as adaptations of other methods popular in the bio-plausible deep learning community.
arXiv Detail & Related papers (2024-07-01T10:33:44Z) - Automatic Bi-modal Question Title Generation for Stack Overflow with
Prompt Learning [10.76882347665857]
An initial study aimed to automatically generate the titles by only analyzing the code snippets in the question body.
We propose an approach SOTitle+ by considering bi-modal information (i.e., the code snippets and the problem descriptions) in the question body.
Our corpus includes 179,119 high-quality question posts for six popular programming languages.
arXiv Detail & Related papers (2024-03-06T12:58:25Z) - Unmasking the giant: A comprehensive evaluation of ChatGPT's proficiency in coding algorithms and data structures [0.6990493129893112]
We evaluate ChatGPT's ability to generate correct solutions to the problems fed to it, its code quality, and nature of run-time errors thrown by its code.
We look into patterns in the test cases passed in order to gain some insights into how wrong ChatGPT code is in these kinds of situations.
arXiv Detail & Related papers (2023-07-10T08:20:34Z) - Is ChatGPT the Ultimate Programming Assistant -- How far is it? [11.943927095071105]
ChatGPT has received great attention: it can be used as a bot for discussing source code.
We present an empirical study of ChatGPT's potential as a fully automated programming assistant.
arXiv Detail & Related papers (2023-04-24T09:20:13Z) - Exploring the Use of ChatGPT as a Tool for Learning and Assessment in
Undergraduate Computer Science Curriculum: Opportunities and Challenges [0.3553493344868413]
This paper addresses the prospects and obstacles associated with utilizing ChatGPT as a tool for learning and assessment in undergraduate Computer Science curriculum.
Group B students were given access to ChatGPT and were encouraged to use it to help solve the programming challenges.
Results show that students using ChatGPT had an advantage in terms of earned scores, however there were inconsistencies and inaccuracies in the submitted code.
arXiv Detail & Related papers (2023-04-16T21:04:52Z) - A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z) - Is ChatGPT a General-Purpose Natural Language Processing Task Solver? [113.22611481694825]
Large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot.
Recently, the debut of ChatGPT has drawn a great deal of attention from the natural language processing (NLP) community.
It is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot.
arXiv Detail & Related papers (2023-02-08T09:44:51Z) - Planning for Sample Efficient Imitation Learning [52.44953015011569]
Current imitation algorithms struggle to achieve high performance and high in-environment sample efficiency simultaneously.
We propose EfficientImitate, a planning-based imitation learning method that can achieve high in-environment sample efficiency and performance simultaneously.
Experimental results show that EI achieves state-of-the-art results in performance and sample efficiency.
arXiv Detail & Related papers (2022-10-18T05:19:26Z) - Competition-Level Code Generation with AlphaCode [74.87216298566942]
We introduce AlphaCode, a system for code generation that can create novel solutions to problems that require deeper reasoning.
In simulated evaluations on recent programming competitions on the Codeforces platform, AlphaCode achieved on average a ranking of top 54.3%.
arXiv Detail & Related papers (2022-02-08T23:16:31Z) - ProtoTransformer: A Meta-Learning Approach to Providing Student Feedback [54.142719510638614]
In this paper, we frame the problem of providing feedback as few-shot classification.
A meta-learner adapts to give feedback to student code on a new programming question from just a few examples by instructors.
Our approach was successfully deployed to deliver feedback to 16,000 student exam-solutions in a programming course offered by a tier 1 university.
arXiv Detail & Related papers (2021-07-23T22:41:28Z) - Retrieve, Program, Repeat: Complex Knowledge Base Question Answering via
Alternate Meta-learning [56.771557756836906]
We present a novel method that automatically learns a retrieval model alternately with the programmer from weak supervision.
Our system leads to state-of-the-art performance on a large-scale task for complex question answering over knowledge bases.
arXiv Detail & Related papers (2020-10-29T18:28:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.