Show Your Title! A Scoping Review on Verbalization in Software Engineering with LLM-Assisted Screening
- URL: http://arxiv.org/abs/2510.12294v1
- Date: Tue, 14 Oct 2025 08:56:16 GMT
- Title: Show Your Title! A Scoping Review on Verbalization in Software Engineering with LLM-Assisted Screening
- Authors: Gergő Balogh, Dávid Kószó, Homayoun Safarpour Motealegh Mahalegi, László Tóth, Bence Szakács, Áron Búcsú,
- Abstract summary: This paper presents a scoping review of research at the intersection of software engineering (SE) and psychology (PSY). To make large-scale interdisciplinary reviews feasible, we employed a large language model (LLM)-assisted screening pipeline using GPT. We validated GPT's outputs against human reviewers and found high consistency, with a 13% disagreement rate. Prominent themes were mainly tied to the craft of SE, while more human-centered topics were underrepresented.
- Score: 1.217622452761334
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding how software developers think, make decisions, and behave remains a key challenge in software engineering (SE). Verbalization techniques (methods that capture spoken or written thought processes) offer a lightweight and accessible way to study these cognitive aspects. This paper presents a scoping review of research at the intersection of SE and psychology (PSY), focusing on the use of verbal data. To make large-scale interdisciplinary reviews feasible, we employed a large language model (LLM)-assisted screening pipeline using GPT to assess the relevance of over 9,000 papers based solely on titles. We addressed two questions: what themes emerge from verbalization-related work in SE, and how effective are LLMs in supporting interdisciplinary review processes? We validated GPT's outputs against human reviewers and found high consistency, with a 13% disagreement rate. Prominent themes were mainly tied to the craft of SE, while more human-centered topics were underrepresented. The data also suggests that SE frequently draws on PSY methods, whereas the reverse is rare.
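The abstract reports a 13% disagreement rate between GPT and human reviewers but does not say how it was computed. A minimal sketch, assuming the rate is simply the fraction of titles on which the two relevance labels differ; the function name `disagreement_rate` and the toy labels are hypothetical and not taken from the paper:

```python
from typing import Sequence


def disagreement_rate(llm_labels: Sequence[bool], human_labels: Sequence[bool]) -> float:
    """Return the fraction of items on which the two label sequences differ.

    Assumption: one binary relevance label per title from each source.
    """
    if len(llm_labels) != len(human_labels):
        raise ValueError("label sequences must have the same length")
    if not llm_labels:
        raise ValueError("at least one labeled item is required")
    mismatches = sum(a != b for a, b in zip(llm_labels, human_labels))
    return mismatches / len(llm_labels)


# Hypothetical toy labels (True = relevant); not data from the paper.
llm = [True, False, True, True, False, False, True, False]
human = [True, False, False, True, False, False, True, False]

rate = disagreement_rate(llm, human)
print(f"disagreement rate: {rate:.1%}")  # prints "disagreement rate: 12.5%"
```

Raw disagreement does not correct for agreement expected by chance, so a chance-corrected statistic such as Cohen's kappa is often reported alongside it; whether the authors did so is not stated in the abstract.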
Related papers
- Word Clouds as Common Voices: LLM-Assisted Visualization of Participant-Weighted Themes in Qualitative Interviews [13.971616443394474]
We introduce ThemeClouds, an open-source visualization tool that generates thematic, participant-weighted word clouds from dialogue transcripts.
The system prompts an LLM to identify concept-level themes across a corpus and then counts how many unique participants mention each topic.
Using interviews from a user study comparing five recording-device configurations, our approach surfaces more actionable device concerns than frequency clouds and topic-modeling baselines.
arXiv Detail & Related papers (2025-08-11T00:27:52Z)
- Psycholinguistic Analyses in Software Engineering Text: A Systematic Literature Review [9.229310642804036]
Linguistic Inquiry and Word Count (LIWC) offers clearer, interpretable insights into cognitive and emotional processes exhibited in text.
Despite its wide use in software engineering research, no comprehensive review of LIWC's use has been conducted.
We conducted a systematic review of six prominent databases, identifying 43 SE-related papers using LIWC.
arXiv Detail & Related papers (2025-03-08T00:23:13Z)
- Streamlining the review process: AI-generated annotations in research manuscripts [0.5735035463793009]
This study explores the potential of integrating Large Language Models (LLMs) into the peer-review process to enhance efficiency without compromising effectiveness.
We focus on manuscript annotations, particularly excerpt highlights, as a potential area for AI-human collaboration.
This paper introduces AnnotateGPT, a platform that utilizes GPT-4 for manuscript review, aiming to improve reviewers' comprehension and focus.
arXiv Detail & Related papers (2024-11-29T23:26:34Z)
- The Dual-Edged Sword of Technical Debt: Benefits and Issues Analyzed Through Developer Discussions [8.304493605883744]
Technical debt (TD) has long been one of the key factors influencing the maintainability of software products.
This work collectively investigates practitioners' opinions on the various perspectives of TD from a large collection of articles.
arXiv Detail & Related papers (2024-07-30T17:54:36Z)
- Categorical Syllogisms Revisited: A Review of the Logical Reasoning Abilities of LLMs for Analyzing Categorical Syllogism [62.571419297164645]
This paper provides a systematic overview of prior works on the logical reasoning ability of large language models for analyzing categorical syllogisms.
We first investigate all the possible variations for the categorical syllogisms from a purely logical perspective.
We then examine the underlying configurations (i.e., mood and figure) tested by the existing datasets.
arXiv Detail & Related papers (2024-06-26T21:17:20Z)
- LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing [106.45895712717612]
Large language models (LLMs) have shown remarkable versatility in various generative tasks.
This study focuses on how LLMs can assist NLP researchers.
To our knowledge, this is the first work to provide such a comprehensive analysis.
arXiv Detail & Related papers (2024-06-24T01:30:22Z)
- What Can Natural Language Processing Do for Peer Review? [173.8912784451817]
In modern science, peer review is widely used, yet it is hard, time-consuming, and prone to error.
Since the artifacts involved in peer review are largely text-based, Natural Language Processing has great potential to improve reviewing.
We detail each step of the process from manuscript submission to camera-ready revision, and discuss the associated challenges and opportunities for NLP assistance.
arXiv Detail & Related papers (2024-05-10T16:06:43Z)
- Using Generative Text Models to Create Qualitative Codebooks for Student Evaluations of Teaching [0.0]
Student evaluations of teaching (SETs) are important sources of feedback for educators.
A collection of SETs can also be useful to administrators as signals for courses or entire programs.
We discuss a novel method for analyzing SETs using natural language processing (NLP) and large language models (LLMs).
arXiv Detail & Related papers (2024-03-18T17:21:35Z)
- Thread of Thought Unraveling Chaotic Contexts [133.24935874034782]
"Thread of Thought" (ThoT) strategy draws inspiration from human cognitive processes.
In experiments, ThoT significantly improves reasoning performance compared to other prompting techniques.
arXiv Detail & Related papers (2023-11-15T06:54:44Z)
- How to Handle Different Types of Out-of-Distribution Scenarios in Computational Argumentation? A Comprehensive and Fine-Grained Field Study [59.13867562744973]
This work systematically assesses LMs' capabilities for out-of-distribution (OOD) scenarios.
We find that the efficacy of such learning paradigms varies with the type of OOD.
Specifically, while in-context learning (ICL) excels for domain shifts, prompt-based fine-tuning performs better for topic shifts.
arXiv Detail & Related papers (2023-09-15T11:15:47Z)
- PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation [90.26723865198348]
We present PolyphonicFormer, a vision transformer to unify all the sub-tasks under the DVPS task.
Our method explores the relationship between depth estimation and panoptic segmentation via query-based learning.
Our method ranks 1st on the ICCV-2021 BMTT Challenge video + depth track.
arXiv Detail & Related papers (2021-12-05T14:31:47Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.