"Always Nice and Confident, Sometimes Wrong": Developer's Experiences Engaging Large Language Models (LLMs) Versus Human-Powered Q&A Platforms for Coding Support
- URL: http://arxiv.org/abs/2309.13684v3
- Date: Thu, 20 Feb 2025 20:54:03 GMT
- Title: "Always Nice and Confident, Sometimes Wrong": Developer's Experiences Engaging Large Language Models (LLMs) Versus Human-Powered Q&A Platforms for Coding Support
- Authors: Jiachen Li, Elizabeth Mynatt, Varun Mishra, Jonathan Bell,
- Abstract summary: With the rise of generative AI, developers have started to adopt AI chatbots, such as ChatGPT, in their software development process. We investigate and compare how developers integrate this assistance into their real-world coding experiences by conducting a thematic analysis of 1700+ Reddit posts. Our findings suggest that ChatGPT offers fast, clear, comprehensive responses and fosters a more respectful environment than SO.
- Score: 10.028644951955886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software engineers have historically relied on human-powered Q&A platforms like Stack Overflow (SO) as coding aids. With the rise of generative AI, developers have started to adopt AI chatbots, such as ChatGPT, in their software development process. Recognizing the potential parallels between human-powered Q&A platforms and AI-powered question-based chatbots, we investigate and compare how developers integrate this assistance into their real-world coding experiences by conducting a thematic analysis of 1700+ Reddit posts. Through a comparative study of SO and ChatGPT, we identified each platform's strengths, use cases, and barriers. Our findings suggest that ChatGPT offers fast, clear, comprehensive responses and fosters a more respectful environment than SO. However, concerns about ChatGPT's reliability stem from its overly confident tone and the absence of validation mechanisms like SO's voting system. Based on these findings, we synthesized the design implications for future GenAI code assistants and recommend a workflow leveraging each platform's unique features to improve developer experiences.
Related papers
- SafeChat: A Framework for Building Trustworthy Collaborative Assistants and a Case Study of its Usefulness [4.896226014796392]
We introduce SafeChat, a general architecture for building safe and trustworthy chatbots.
Key features of SafeChat include: (a) safety, with a domain-agnostic design where responses are grounded and traceable to approved sources (provenance); (b) usability, with automatic extractive summarization of long responses, traceable to their sources; and (c) fast, scalable development, including a CSV-driven workflow, automated testing, and integration with various devices.
arXiv Detail & Related papers (2025-04-08T19:16:43Z)
- Great Power Brings Great Responsibility: Personalizing Conversational AI for Diverse Problem-Solvers [10.472707414720341]
Large Language Models (LLMs) have emerged as potential resources for answering questions and providing guidance.
LLMs may carry biases in presenting information, which can be especially impactful for newcomers whose problem-solving styles may not be broadly represented.
This vision paper outlines the potential of adapting AI responses to various problem-solving styles to avoid privileging a particular subgroup.
arXiv Detail & Related papers (2025-02-11T18:46:01Z)
- An exploratory analysis of Community-based Question-Answering Platforms and GPT-3-driven Generative AI: Is it the end of online community-based learning? [0.6749750044497732]
ChatGPT offers software engineers an interactive alternative to community question-answering platforms like Stack Overflow.
We analyze 2564 Python and JavaScript questions from StackOverflow that were asked between January 2022 and December 2022.
Our analysis indicates that ChatGPT's responses are 66% shorter and share 35% more words with the questions, showing a 25% increase in positive sentiment compared to human responses.
arXiv Detail & Related papers (2024-09-26T02:17:30Z)
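To make the response-level measures in the exploratory StackOverflow/ChatGPT analysis above concrete, here is a minimal sketch of how per-answer length reduction and question-word overlap could be computed. It is not the paper's pipeline: the helper functions, tokenization, and example strings are all hypothetical, and a sentiment score would additionally require an external library such as NLTK's VADER.

```python
# Illustrative sketch only -- not the paper's analysis code.
# Inputs are plain-text strings; the question/answer examples are made up.
import re

def tokenize(text: str) -> set[str]:
    """Lowercased word tokens with punctuation stripped."""
    return set(re.findall(r"[a-z']+", text.lower()))

def length_reduction(ai_answer: str, human_answer: str) -> float:
    """Fraction by which the AI answer is shorter than the human answer."""
    return 1 - len(ai_answer) / max(len(human_answer), 1)

def question_overlap(answer: str, question: str) -> float:
    """Share of the question's words that reappear in the answer."""
    question_words, answer_words = tokenize(question), tokenize(answer)
    return len(question_words & answer_words) / max(len(question_words), 1)

question = "How do I reverse a list in Python?"
human_answer = ("There are a few options. Slicing with lst[::-1] returns a new, "
                "reversed copy, while lst.reverse() reverses the list in place; "
                "pick whichever fits your use case.")
ai_answer = "You can reverse a list in Python with lst[::-1] or lst.reverse()."

print(f"length reduction: {length_reduction(ai_answer, human_answer):.0%}")
print(f"question-word overlap: {question_overlap(ai_answer, question):.0%}")
```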
- OpenHands: An Open Platform for AI Software Developers as Generalist Agents [109.8507367518992]
We introduce OpenHands, a platform for the development of AI agents that interact with the world in similar ways to a human developer.
We describe how the platform allows for the implementation of new agents, safe interaction with sandboxed environments for code execution, and incorporation of evaluation benchmarks.
arXiv Detail & Related papers (2024-07-23T17:50:43Z)
- Impact of the Availability of ChatGPT on Software Development: A Synthetic Difference in Differences Estimation using GitHub Data [49.1574468325115]
ChatGPT is an AI tool that enhances software production efficiency.
We estimate ChatGPT's effects on the number of git pushes, repositories, and unique developers per 100,000 people.
These results suggest that AI tools like ChatGPT can substantially boost developer productivity, though further analysis is needed to address potential downsides such as low quality code and privacy concerns.
arXiv Detail & Related papers (2024-06-16T19:11:15Z)
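For intuition about the estimator behind the GitHub study above, here is a minimal sketch of a plain two-period difference-in-differences calculation on made-up per-100,000-person activity counts. The paper uses the more involved synthetic difference-in-differences method, which additionally reweights control units and pre-treatment periods; the function and numbers below are purely illustrative.

```python
# Hypothetical numbers and a plain (non-synthetic) difference-in-differences --
# a simplified stand-in for the synthetic DiD estimator used in the paper.
# Outcome: git pushes per 100,000 people, before vs. after ChatGPT's release.

def diff_in_diff(treated_pre: float, treated_post: float,
                 control_pre: float, control_post: float) -> float:
    """Effect = change in the treated unit minus change in the control unit."""
    return (treated_post - treated_pre) - (control_post - control_pre)

effect = diff_in_diff(treated_pre=1200.0, treated_post=1550.0,
                      control_pre=1100.0, control_post=1250.0)
print(f"estimated effect: {effect:+.0f} pushes per 100,000 people")
# -> estimated effect: +200 pushes per 100,000 people
```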
- Rocks Coding, Not Development--A Human-Centric, Experimental Evaluation of LLM-Supported SE Tasks [9.455579863269714]
We examined whether, and to what degree, working with ChatGPT was helpful in coding tasks and typical software development tasks.
We found that while ChatGPT performed well in solving simple coding problems, its performance in supporting typical software development tasks was considerably weaker.
Our study thus provides first-hand insights into using ChatGPT to fulfill software engineering tasks with real-world developers.
arXiv Detail & Related papers (2024-02-08T13:07:31Z)
- Can You Follow Me? Testing Situational Understanding in ChatGPT [17.52769657390388]
"situational understanding" (SU) is a critical ability for human-like AI agents.
We propose a novel synthetic environment for SU testing in chat-oriented models.
We find that despite the fundamental simplicity of the task, the model's performance reflects an inability to retain correct environment states.
arXiv Detail & Related papers (2023-10-24T19:22:01Z)
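To illustrate what "retaining correct environment states" means in the situational-understanding paper above, here is a toy flip-and-query sketch; it is a hypothetical stand-in, not the paper's actual synthetic environment.

```python
# Toy illustration, not the paper's benchmark: a minimal "situational
# understanding" task. The environment is a row of boxes that start closed;
# each instruction flips one box, and a chat model queried afterwards should
# report the same final state as this ground-truth simulator.

def run_episode(num_boxes: int, flips: list[int]) -> dict[int, str]:
    """Apply a sequence of flip instructions and return the true final state."""
    state = {box: "closed" for box in range(num_boxes)}
    for box in flips:
        state[box] = "open" if state[box] == "closed" else "closed"
    return state

# Box 1 is flipped twice, so it ends up closed again.
truth = run_episode(num_boxes=3, flips=[1, 2, 1])
print(truth)  # {0: 'closed', 1: 'closed', 2: 'open'}
# The same instructions, phrased in natural language, would be sent to the
# model, and its answer to "Which boxes are open?" scored against `truth`.
```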
- Evaluating Chatbots to Promote Users' Trust -- Practices and Open Problems [11.427175278545517]
This paper reviews current practices for testing chatbots.
It identifies gaps as open problems in pursuit of user trust.
It outlines a path forward to mitigate issues of trust related to service or product performance, user satisfaction and long-term unintended consequences for society.
arXiv Detail & Related papers (2023-09-09T22:40:30Z)
- ChatDev: Communicative Agents for Software Development [84.90400377131962]
ChatDev is a chat-powered software development framework in which specialized agents are guided in what to communicate.
These agents actively contribute to the design, coding, and testing phases through unified language-based communication.
arXiv Detail & Related papers (2023-07-16T02:11:34Z)
- Adding guardrails to advanced chatbots [5.203329540700177]
The launch of ChatGPT in November 2022 has ushered in a new era of AI.
There are already concerns that humans may be replaced by chatbots for a variety of jobs.
Chatbots can also exhibit biases, which may cause significant harm and/or inequity toward different subpopulations.
arXiv Detail & Related papers (2023-06-13T02:23:04Z)
- Comparing Software Developers with ChatGPT: An Empirical Investigation [0.0]
This paper conducts an empirical investigation, contrasting the performance of software engineers and AI systems, like ChatGPT, across different evaluation metrics.
The paper posits that a comprehensive comparison of software engineers and AI-based solutions, considering various evaluation criteria, is pivotal in fostering human-machine collaboration.
arXiv Detail & Related papers (2023-05-19T17:25:54Z)
- To ChatGPT, or not to ChatGPT: That is the question! [78.407861566006]
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection.
We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains.
Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z)
- A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need? [112.12974778019304]
Generative AI (AIGC, a.k.a. AI-generated content) has made headlines everywhere because of its ability to analyze and create text, images, and beyond.
In the era of AI transitioning from pure analysis to creation, it is worth noting that ChatGPT, with its most recent language model GPT-4, is just one tool among numerous AIGC applications.
This work focuses on the technological development of various AIGC tasks based on their output type, including text, images, videos, 3D content, etc.
arXiv Detail & Related papers (2023-03-21T10:09:47Z)
- A Categorical Archive of ChatGPT Failures [47.64219291655723]
ChatGPT, developed by OpenAI, has been trained using massive amounts of data and simulates human conversation.
It has garnered significant attention due to its ability to effectively answer a broad range of human inquiries.
However, a comprehensive analysis of ChatGPT's failures is lacking, which is the focus of this study.
arXiv Detail & Related papers (2023-02-06T04:21:59Z)
- AI Based Chatbot: An Approach of Utilizing On Customer Service Assistance [0.0]
The project aims to develop a system that can handle complex questions and output logically sound answers.
The ultimate goal is to give high-quality results (answers) based on user input (questions).
arXiv Detail & Related papers (2022-06-18T00:59:10Z)
- Put Chatbot into Its Interlocutor's Shoes: New Framework to Learn Chatbot Responding with Intention [55.77218465471519]
This paper proposes an innovative framework to train chatbots to possess human-like intentions.
Our framework includes a guiding robot and an interlocutor model that plays the role of a human.
We examined our framework using three experimental setups and evaluated the guiding robot with four different metrics to demonstrate its flexibility and performance advantages.
arXiv Detail & Related papers (2021-03-30T15:24:37Z)
- CASS: Towards Building a Social-Support Chatbot for Online Health Community [67.45813419121603]
The CASS architecture is based on advanced neural network algorithms.
It can handle new inputs from users and generate a variety of responses to them.
Through a follow-up field experiment, CASS proved useful in supporting individual members who seek emotional support.
arXiv Detail & Related papers (2021-01-04T05:52:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.