PatchTrack: A Comprehensive Analysis of ChatGPT's Influence on Pull Request Outcomes
- URL: http://arxiv.org/abs/2505.07700v2
- Date: Tue, 26 Aug 2025 19:59:22 GMT
- Title: PatchTrack: A Comprehensive Analysis of ChatGPT's Influence on Pull Request Outcomes
- Authors: Daniel Ogenrwot, John Businge
- Abstract summary: We analyze pull requests from 255 GitHub repositories containing self-admitted ChatGPT usage. We introduce PatchTrack, a tool that classifies whether ChatGPT patches were applied, not applied, or not suggested. A qualitative analysis of 89 pull requests with integrated patches revealed recurring patterns of structural integration, selective extraction, and iterative refinement.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid adoption of large language models (LLMs) like ChatGPT has introduced new dynamics in software development, particularly within pull request workflows. While prior research has examined the quality of AI-generated code, little is known about how developers actually use these suggestions in real-world collaboration. We analyze 338 pull requests from 255 GitHub repositories containing self-admitted ChatGPT usage, including 645 AI-generated snippets and 3,486 developer-authored patches. We introduce PatchTrack, a tool that classifies whether ChatGPT patches were applied, not applied, or not suggested, enabling fine-grained analysis of AI-assisted decisions. Full adoption of ChatGPT code is rare: the median integration rate was 25%. A qualitative analysis of 89 pull requests with integrated patches revealed recurring patterns of structural integration, selective extraction, and iterative refinement, showing that developers typically treat ChatGPT's output as a starting point rather than a final implementation. Even when code was not directly adopted, ChatGPT influenced workflows through conceptual guidance, documentation, and debugging strategies. Integration decisions were shaped by scope, architectural fit, contributor role, and review norms. This study offers empirical insight into how generative AI is used in collaborative software development, showing that its impact extends beyond patch generation to broader decision-making. Our findings inform the design of AI-assisted tools, clarify patch adoption behavior, and support more transparent and effective use of LLMs in practice.
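The classification idea behind PatchTrack can be illustrated with a minimal sketch. This is a hypothetical simplification, not the authors' implementation: the function name classify_suggestion, the line-matching heuristic, and the 50% threshold are assumptions made for illustration; only the three labels (applied, not applied, not suggested) and the notion of an integration rate come from the abstract.

```python
# Minimal illustrative sketch (not PatchTrack's actual implementation): label a
# ChatGPT suggestion for one pull request as applied, not applied, or not
# suggested, based on how many of its proposed lines appear in the PR's final diff.
# The 50% threshold is an assumption for illustration, not taken from the paper.

def classify_suggestion(suggested_lines, pr_added_lines, threshold=0.5):
    """Return (label, integration_rate) for a single ChatGPT suggestion."""
    if not suggested_lines:
        return "not_suggested", 0.0
    pr_set = {line.strip() for line in pr_added_lines if line.strip()}
    matched = sum(1 for line in suggested_lines if line.strip() in pr_set)
    rate = matched / len(suggested_lines)
    return ("applied" if rate >= threshold else "not_applied"), rate


# Example: two of four suggested lines made it into the merged PR -> 50%.
suggestion = ["import json", "data = json.load(fh)", "validate(data)", "return data"]
pr_diff_added = ["import json", "data = json.load(fh)", "return parse(data)"]
print(classify_suggestion(suggestion, pr_diff_added))  # ('applied', 0.5)
```

Aggregating such per-suggestion rates over many pull requests is one way to arrive at a figure like the median 25% integration rate reported in the abstract.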
Related papers
- CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection [60.52240468810558]
We introduce CoCoNUTS, a content-oriented benchmark built upon a fine-grained dataset of AI-generated peer reviews. We also develop CoCoDet, an AI review detector via a multi-task learning framework, to achieve more accurate and robust detection of AI involvement in review content.
arXiv Detail & Related papers (2025-08-28T06:03:11Z) - SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving [90.32201622392137]
We present SwingArena, a competitive evaluation framework for Large Language Models (LLMs). Unlike traditional static benchmarks, SwingArena models the collaborative process of software development by pairing LLMs as submitters, who generate patches, and reviewers, who create test cases and verify the patches through continuous integration (CI) pipelines.
arXiv Detail & Related papers (2025-05-29T18:28:02Z) - Unveiling the Role of ChatGPT in Software Development: Insights from Developer-ChatGPT Interactions on GitHub [13.658091361380333]
Generative AI tools like ChatGPT are gaining widespread adoption among developers. ChatGPT's potential has been extensively discussed, but there is limited empirical evidence exploring its real-world usage by developers. This study bridges the gap by conducting a large-scale empirical analysis of ChatGPT-assisted development activities.
arXiv Detail & Related papers (2025-05-06T18:16:08Z) - Why Do Developers Engage with ChatGPT in Issue-Tracker? Investigating Usage and Reliance on ChatGPT-Generated Code [4.605779671279481]
We analyzed 1,152 Developer-ChatGPT conversations across 1,012 issues in GitHub. ChatGPT is primarily utilized for ideation, whereas its usage for validation is minimal. ChatGPT-generated code was used as-is to resolve only 5.83% of the issues.
arXiv Detail & Related papers (2024-12-09T18:47:31Z) - Impact of the Availability of ChatGPT on Software Development: A Synthetic Difference in Differences Estimation using GitHub Data [49.1574468325115]
ChatGPT is an AI tool that enhances software production efficiency.
We estimate ChatGPT's effects on the number of git pushes, repositories, and unique developers per 100,000 people.
These results suggest that AI tools like ChatGPT can substantially boost developer productivity, though further analysis is needed to address potential downsides such as low-quality code and privacy concerns.
arXiv Detail & Related papers (2024-06-16T19:11:15Z) - Investigating the Utility of ChatGPT in the Issue Tracking System: An Exploratory Study [5.176434782905268]
This study examines the interactions between ChatGPT and developers to analyze their prevalent activities and how resolutions are reached.
Our investigation reveals that developers mainly use ChatGPT for brainstorming solutions but often opt to write their code instead of using ChatGPT-generated code.
arXiv Detail & Related papers (2024-02-06T06:03:05Z) - Exploring ChatGPT's Capabilities on Vulnerability Management [56.4403395100589]
We explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples.
One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports.
Our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions.
arXiv Detail & Related papers (2023-11-11T11:01:13Z) - DevGPT: Studying Developer-ChatGPT Conversations [12.69439932665687]
This paper introduces DevGPT, a dataset curated to explore how software developers interact with ChatGPT.
The dataset encompasses 29,778 prompts and responses from ChatGPT, including 19,106 code snippets.
arXiv Detail & Related papers (2023-08-31T06:55:40Z) - Extending the Frontier of ChatGPT: Code Generation and Debugging [0.0]
ChatGPT, developed by OpenAI, has ushered in a new era by utilizing artificial intelligence (AI) to tackle diverse problem domains.
This research paper delves into the efficacy of ChatGPT in solving programming problems, examining both the correctness and the efficiency of its solutions in terms of time and memory complexity.
The research reveals a commendable overall success rate of 71.875%, denoting the proportion of problems for which ChatGPT was able to provide correct solutions.
arXiv Detail & Related papers (2023-07-17T06:06:58Z) - One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era [95.2284704286191]
GPT-4 (a.k.a. ChatGPT plus) is one small step for generative AI (GAI) but one giant leap for artificial general intelligence (AGI).
Since its official release in November 2022, ChatGPT has quickly attracted numerous users with extensive media coverage.
This work is the first to survey ChatGPT with a comprehensive review of its underlying technology, applications, and challenges.
arXiv Detail & Related papers (2023-04-04T06:22:09Z) - To ChatGPT, or not to ChatGPT: That is the question! [78.407861566006]
This study provides a comprehensive and contemporary assessment of the most recent techniques in ChatGPT detection.
We have curated a benchmark dataset consisting of prompts from ChatGPT and humans, including diverse questions from medical, open Q&A, and finance domains.
Our evaluation results demonstrate that none of the existing methods can effectively detect ChatGPT-generated content.
arXiv Detail & Related papers (2023-04-04T03:04:28Z) - Does Synthetic Data Generation of LLMs Help Clinical Text Mining? [51.205078179427645]
We investigate the potential of OpenAI's ChatGPT to aid in clinical text mining.
We propose a new training paradigm that involves generating a vast quantity of high-quality synthetic data.
Our method has resulted in significant improvements in the performance of downstream tasks.
arXiv Detail & Related papers (2023-03-08T03:56:31Z) - A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity [79.12003701981092]
We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks.
We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset.
ChatGPT is 63.41% accurate on average in 10 different reasoning categories under logical reasoning, non-textual reasoning, and commonsense reasoning.
arXiv Detail & Related papers (2023-02-08T12:35:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.