Human and Machine: How Software Engineers Perceive and Engage with AI-Assisted Code Reviews Compared to Their Peers
- URL: http://arxiv.org/abs/2501.02092v1
- Date: Fri, 03 Jan 2025 20:42:51 GMT
- Title: Human and Machine: How Software Engineers Perceive and Engage with AI-Assisted Code Reviews Compared to Their Peers
- Authors: Adam Alami, Neil A. Ernst
- Abstract summary: We investigate how software engineers perceive and engage with Large Language Model (LLM)-assisted code reviews. We found that engagement in code review is multi-dimensional, spanning cognitive, emotional, and behavioral dimensions. Our findings contribute to a deeper understanding of how AI tools are impacting SE socio-technical processes.
- Score: 4.734450431444635
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The integration of artificial intelligence (AI) continues to increase and evolve, including in software engineering (SE). This integration involves processes traditionally entrusted to humans, such as coding. However, the impact on socio-technical processes like code review remains underexplored. In this interview-based study (20 interviewees), we investigate how software engineers perceive and engage with Large Language Model (LLM)-assisted code reviews compared to human peer-led reviews. Because code review is an inherently human-centric process, we aim to understand how software engineers navigate the introduction of AI into collaborative workflows. We found that engagement in code review is multi-dimensional, spanning cognitive, emotional, and behavioral dimensions. The introduction of LLM-assisted review affects some of these attributes. For example, there is less need for emotional regulation and coping mechanisms when dealing with an LLM than with peers. However, the cognitive load is sometimes higher when dealing with LLM-generated feedback because of its excessive detail. Software engineers use a similar sense-making process to evaluate and adopt feedback suggestions from their peers and from the LLM, but adoption of LLM feedback is constrained by trust and by the lack of context in the review. Our findings contribute to a deeper understanding of how AI tools are impacting SE socio-technical processes and provide insights into the future of AI-human collaboration in SE practices.
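The abstract treats LLM-assisted code review as a familiar setup. For readers who want the mechanics made concrete, the sketch below shows one common way such a review step can be scripted; it is not the configuration the authors studied, and the prompt wording, model name, and choice of the OpenAI client are assumptions made purely for illustration.

```python
# Minimal sketch (illustrative, not the paper's setup): ask an LLM to review
# the currently staged git diff. Model name, prompt, and provider are assumptions.
import subprocess
from openai import OpenAI

REVIEW_INSTRUCTIONS = (
    "You are reviewing a colleague's change. Point out correctness issues, "
    "style problems, and missing tests. Keep the feedback concise so the "
    "author is not overloaded with detail."
)

def staged_diff() -> str:
    """Collect the diff a human reviewer would normally look at."""
    result = subprocess.run(
        ["git", "diff", "--cached"], capture_output=True, text=True, check=True
    )
    return result.stdout

def review_with_llm(diff: str, model: str = "gpt-4o-mini") -> str:
    """Send the diff to a chat-completion endpoint and return the review text."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": REVIEW_INSTRUCTIONS},
            {"role": "user", "content": diff},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    diff = staged_diff()
    print(review_with_llm(diff) if diff else "Nothing staged to review.")
```

The instruction to keep the feedback concise is a deliberate choice in this sketch: it mirrors the paper's observation that overly detailed LLM feedback can raise reviewers' cognitive load.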
Related papers
- Processes Matter: How ML/GAI Approaches Could Support Open Qualitative Coding of Online Discourse Datasets [39.96179530555875]
We compare open coding results by five recently published ML/GAI approaches and four human coders.
Line-by-line AI approaches effectively identify content-based codes, while humans excel in interpreting conversational dynamics.
Instead of replacing humans in open coding, researchers should integrate AI into, and align it with, their analytical processes.
arXiv Detail & Related papers (2025-04-02T13:43:54Z)
- Conversational AI as a Coding Assistant: Understanding Programmers' Interactions with and Expectations from Large Language Models for Coding [5.064404027153094]
Conversational AI interfaces powered by large language models (LLMs) are increasingly used as coding assistants.
This study investigates programmers' usage patterns, perceptions, and interaction strategies when engaging with LLM-driven coding assistants.
arXiv Detail & Related papers (2025-03-14T15:06:07Z)
- Analysis of Student-LLM Interaction in a Software Engineering Project [1.2233362977312945]
We analyze 126 undergraduate students' interaction with an AI assistant during a 13-week semester to understand the benefits of AI for software engineering learning.
Our findings suggest that students prefer ChatGPT over CoPilot.
Conversation-based interaction helps improve the quality of the generated code compared to auto-generated code.
arXiv Detail & Related papers (2025-02-03T11:44:00Z)
- LLMs are Imperfect, Then What? An Empirical Study on LLM Failures in Software Engineering [38.20696656193963]
We conducted an observational study with 22 participants using ChatGPT as a coding assistant in a non-trivial software engineering task.
We identified the cases where ChatGPT failed, their root causes, and the corresponding mitigation solutions used by users.
arXiv Detail & Related papers (2024-11-15T03:29:41Z)
- Can We Trust AI Agents? An Experimental Study Towards Trustworthy LLM-Based Multi-Agent Systems for AI Ethics [10.084913433923566]
This study examines how trustworthiness-enhancing techniques affect ethical AI output generation.
We design the prototype LLM-BMAS, where agents engage in structured discussions on real-world ethical AI issues.
Discussions reveal terms like bias detection, transparency, accountability, user consent, compliance, fairness evaluation, and EU AI Act compliance.
arXiv Detail & Related papers (2024-10-25T20:17:59Z)
- CIBench: Evaluating Your LLMs with a Code Interpreter Plugin [68.95137938214862]
We propose an interactive evaluation framework, named CIBench, to comprehensively assess LLMs' ability to utilize code interpreters for data science tasks.
The evaluation dataset is constructed using an LLM-human cooperative approach and simulates an authentic workflow by leveraging consecutive and interactive IPython sessions.
We conduct extensive experiments to analyze the ability of 24 LLMs on CIBench and provide valuable insights for future LLMs in code interpreter utilization.
arXiv Detail & Related papers (2024-07-15T07:43:55Z)
- Human-Modeling in Sequential Decision-Making: An Analysis through the Lens of Human-Aware AI [20.21053807133341]
We try to provide an account of what constitutes a human-aware AI system.
We see that human-aware AI is a design-oriented paradigm, one that focuses on the need for modeling the humans it may interact with.
arXiv Detail & Related papers (2024-05-13T14:17:52Z)
- How Far Have We Gone in Binary Code Understanding Using Large Language Models [51.527805834378974]
We propose a benchmark to evaluate the effectiveness of Large Language Models (LLMs) in binary code understanding.
Our evaluations reveal that existing LLMs can understand binary code to a certain extent, thereby improving the efficiency of binary code analysis.
arXiv Detail & Related papers (2024-04-15T14:44:08Z)
- Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench [83.41621219298489]
We evaluate Large Language Models' (LLMs) anthropomorphic capabilities using the emotion appraisal theory from psychology.
We collect a dataset containing over 400 situations that have proven effective in eliciting the eight emotions central to our study.
We conduct a human evaluation involving more than 1,200 subjects worldwide.
arXiv Detail & Related papers (2023-08-07T15:18:30Z)
- How Can Recommender Systems Benefit from Large Language Models: A Survey [82.06729592294322]
Large language models (LLM) have shown impressive general intelligence and human-like capabilities.
We conduct a comprehensive survey on this research direction from the perspective of the whole pipeline in real-world recommender systems.
arXiv Detail & Related papers (2023-06-09T11:31:50Z)
- Evaluating Language Models for Mathematics through Interactions [116.67206980096513]
We introduce CheckMate, a prototype platform for humans to interact with and evaluate large language models (LLMs).
We conduct a study with CheckMate to evaluate three language models (InstructGPT, ChatGPT, and GPT-4) as assistants in proving undergraduate-level mathematics.
We derive a taxonomy of human behaviours and uncover that despite a generally positive correlation, there are notable instances of divergence between correctness and perceived helpfulness.
arXiv Detail & Related papers (2023-06-02T17:12:25Z)
- Comparing Software Developers with ChatGPT: An Empirical Investigation [0.0]
This paper conducts an empirical investigation, contrasting the performance of software engineers and AI systems, like ChatGPT, across different evaluation metrics.
The paper posits that a comprehensive comparison of software engineers and AI-based solutions, considering various evaluation criteria, is pivotal in fostering human-machine collaboration.
arXiv Detail & Related papers (2023-05-19T17:25:54Z)
- OpenAGI: When LLM Meets Domain Experts [51.86179657467822]
Human Intelligence (HI) excels at combining basic skills to solve complex tasks.
This capability is vital for Artificial Intelligence (AI) and should be embedded in comprehensive AI Agents.
We introduce OpenAGI, an open-source platform designed for solving multi-step, real-world tasks.
arXiv Detail & Related papers (2023-04-10T03:55:35Z)
- Machine Psychology [54.287802134327485]
We argue that a fruitful direction for research is engaging large language models in behavioral experiments inspired by psychology.
We highlight theoretical perspectives, experimental paradigms, and computational analysis techniques that this approach brings to the table.
It paves the way for a "machine psychology" for generative artificial intelligence (AI) that goes beyond performance benchmarks.
arXiv Detail & Related papers (2023-03-24T13:24:41Z)
- Enabling Automated Machine Learning for Model-Driven AI Engineering [60.09869520679979]
We propose a novel approach to enable Model-Driven Software Engineering and Model-Driven AI Engineering.
In particular, we support Automated ML, thus assisting software engineers without deep AI knowledge in developing AI-intensive systems.
arXiv Detail & Related papers (2022-03-06T10:12:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.