Engagement in Code Review: Emotional, Behavioral, and Cognitive Dimensions in Peer vs. LLM Interactions
- URL: http://arxiv.org/abs/2512.05309v1
- Date: Thu, 04 Dec 2025 23:09:24 GMT
- Title: Engagement in Code Review: Emotional, Behavioral, and Cognitive Dimensions in Peer vs. LLM Interactions
- Authors: Adam Alami, Nathan Cassee, Thiago Rocha Silva, Elda Paja, Neil A. Ernst
- Abstract summary: We study how software engineers engage in Large Language Model (LLM)-assisted code reviews compared to human peer-led reviews. We identify the self-regulation strategies that engineers use to manage their emotions in response to negative feedback. We show that LLM-assisted review redirects engagement from emotion management to cognitive load management.
- Score: 4.311425473934521
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code review is a socio-technical practice, yet how software engineers engage in Large Language Model (LLM)-assisted code reviews compared to human peer-led reviews is less understood. We report a two-phase qualitative study with 20 software engineers to understand this. In Phase I, participants exchanged peer reviews and were interviewed about their affective responses and engagement decisions. In Phase II, we introduced a new prompt matching engineers' preferences and probed how its characteristics shaped their reactions. We develop an integrative account linking emotional self-regulation to behavioral engagement and resolution. We identify self-regulation strategies that engineers use to regulate their emotions in response to negative feedback: reframing, dialogic regulation, avoidance, and defensiveness. Engagement proceeds through social calibration: engineers align their responses and behaviors to the relational climate and team norms. In peer-led review, trajectories to resolution vary by locus (solo/dyad/team) and an internal sense-making process. With LLM-assisted review, emotional costs and the need for self-regulation seem lower. When LLM feedback aligned with engineers' cognitive expectations, participants reported reduced processing effort and a potentially higher tendency to adopt. We show that LLM-assisted review redirects engagement from emotion management to cognitive load management. We contribute an integrative model of engagement that links emotional self-regulation to behavioral engagement and resolution, showing how affective and cognitive processes influence feedback adoption in peer-led and LLM-assisted code reviews. We conclude that AI is best positioned as a supportive partner to reduce cognitive and emotional load while preserving human accountability and the social meaning of peer review and similar socio-technical activities.
Related papers
- Conversation for Non-verifiable Learning: Self-Evolving LLMs through Meta-Evaluation [56.84819098277464]
CoNL is a framework that unifies generation, evaluation, and meta-evaluation through multi-agent self-play. CoNL achieves consistent improvements over self-rewarding baselines while maintaining stable training.
arXiv Detail & Related papers (2026-01-29T09:41:14Z) - Think Socially via Cognitive Reasoning [94.60442643943696]
We introduce Cognitive Reasoning, a paradigm modeled on human social cognition. CogFlow is a complete framework that instills this capability in LLMs.
arXiv Detail & Related papers (2025-09-26T16:27:29Z) - SocialEval: Evaluating Social Intelligence of Large Language Models [70.90981021629021]
Social Intelligence (SI) equips humans with interpersonal abilities to behave wisely in navigating social interactions to achieve social goals. This presents an operational evaluation paradigm: outcome-oriented goal achievement evaluation and process-oriented interpersonal ability evaluation. We propose SocialEval, a script-based bilingual SI benchmark, integrating outcome- and process-oriented evaluation by manually crafting narrative scripts.
arXiv Detail & Related papers (2025-06-01T08:36:51Z) - Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models [75.85319609088354]
Sentient Agent as a Judge (SAGE) is an evaluation framework for large language models. SAGE instantiates a Sentient Agent that simulates human-like emotional changes and inner thoughts during interaction. SAGE provides a principled, scalable, and interpretable tool for tracking progress toward genuinely empathetic and socially adept language agents.
arXiv Detail & Related papers (2025-05-01T19:06:10Z) - Intent-Aware Self-Correction for Mitigating Social Biases in Large Language Models [38.1620443730172]
Self-Correction based on feedback improves the output quality of Large Language Models (LLMs). In this study, we demonstrate that clarifying intentions is essential for effectively reducing biases in LLMs through Self-Correction.
arXiv Detail & Related papers (2025-03-08T02:20:43Z) - Accountability in Code Review: The Role of Intrinsic Drivers and the Impact of LLMs [6.841710924733614]
Key intrinsic drivers of accountability for code quality are personal standards, professional integrity, pride in code quality, and maintaining one's reputation. The introduction of AI into software engineering must preserve social integrity and collective accountability mechanisms.
arXiv Detail & Related papers (2025-02-21T21:52:29Z) - Human and Machine: How Software Engineers Perceive and Engage with AI-Assisted Code Reviews Compared to Their Peers [4.734450431444635]
We investigate how software engineers perceive and engage with Large Language Model (LLM)-assisted code reviews. We found that engagement in code review is multi-dimensional, spanning cognitive, emotional, and behavioral dimensions. Our findings contribute to a deeper understanding of how AI tools are impacting SE socio-technical processes.
arXiv Detail & Related papers (2025-01-03T20:42:51Z) - Understanding the Dark Side of LLMs' Intrinsic Self-Correction [58.12627172032851]
Intrinsic self-correction was proposed to improve LLMs' responses via feedback prompts based solely on their inherent capability. Recent works show that LLMs' intrinsic self-correction fails without oracle labels as feedback prompts. We identify that intrinsic self-correction can cause LLMs to waver on both intermediate and final answers and can lead to prompt bias on simple factual questions.
arXiv Detail & Related papers (2024-12-19T15:39:31Z) - Recognizing Emotion Regulation Strategies from Human Behavior with Large Language Models [44.015651538470856]
Human emotions are often not expressed directly, but regulated according to internal processes and social display rules.
No method to automatically classify different emotion regulation strategies in a cross-user scenario exists.
We make use of the recently introduced Deep corpus for modeling the social display of the emotion shame.
A fine-tuned Llama2-7B model is able to classify the utilized emotion regulation strategy with high accuracy.
arXiv Detail & Related papers (2024-08-08T12:47:10Z) - Rel-A.I.: An Interaction-Centered Approach To Measuring Human-LM Reliance [73.19687314438133]
We study how reliance is affected by contextual features of an interaction.
We find that contextual characteristics significantly affect human reliance behavior.
Our results show that calibration and language quality alone are insufficient in evaluating the risks of human-LM interactions.
arXiv Detail & Related papers (2024-07-10T18:00:05Z) - Large Language Models are Capable of Offering Cognitive Reappraisal, if Guided [38.11184388388781]
Large language models (LLMs) have offered new opportunities for emotional support.
This work takes a first step by engaging with cognitive reappraisals.
We conduct a first-of-its-kind expert evaluation of an LLM's zero-shot ability to generate cognitive reappraisal responses.
arXiv Detail & Related papers (2024-04-01T17:56:30Z) - Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench [83.41621219298489]
We evaluate Large Language Models' (LLMs) anthropomorphic capabilities using the emotion appraisal theory from psychology.
We collect a dataset containing over 400 situations that have proven effective in eliciting the eight emotions central to our study.
We conduct a human evaluation involving more than 1,200 subjects worldwide.
arXiv Detail & Related papers (2023-08-07T15:18:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.