An Investigation on How AI-Generated Responses Affect Software Engineering Surveys
- URL: http://arxiv.org/abs/2512.17455v1
- Date: Fri, 19 Dec 2025 11:17:05 GMT
- Title: An Investigation on How AI-Generated Responses Affect Software Engineering Surveys
- Authors: Ronnie de Souza Santos, Italo Santos, Maria Teresa Baldassarre, Cleyton Magalhaes, Mairieli Wessel
- Abstract summary: This study explores how large language models (LLMs) are being misused in software engineering surveys. We analyzed data from two survey deployments conducted in 2025 through the Prolific platform. We identify data authenticity as an emerging dimension of validity in software engineering surveys.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Survey research is a fundamental empirical method in software engineering, enabling the systematic collection of data on professional practices, perceptions, and experiences. However, recent advances in large language models (LLMs) have introduced new risks to survey integrity, as participants can use generative tools to fabricate or manipulate their responses. This study explores how LLMs are being misused in software engineering surveys and investigates the methodological implications of such behavior for data authenticity, validity, and research integrity. We collected data from two survey deployments conducted in 2025 through the Prolific platform and analyzed the content of participants' answers to identify irregular or falsified responses. A subset of responses suspected of being AI generated was examined through qualitative pattern inspection, narrative characterization, and automated detection using the Scribbr AI Detector. The analysis revealed recurring structural patterns in 49 survey responses indicating synthetic authorship, including repetitive sequencing, uniform phrasing, and superficial personalization. These false narratives mimicked coherent reasoning while concealing fabricated content, undermining construct, internal, and external validity. Our study identifies data authenticity as an emerging dimension of validity in software engineering surveys. We emphasize that reliable evidence now requires combining automated and interpretive verification procedures, transparent reporting, and community standards to detect and prevent AI generated responses, thereby protecting the credibility of surveys in software engineering.
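The abstract describes flagging responses that show "repetitive sequencing" and "uniform phrasing" across participants. A minimal sketch of that idea is shown below, using a simple token-overlap (Jaccard) heuristic as a stand-in for the paper's combination of manual pattern inspection and the Scribbr AI Detector; the function names, threshold, and example responses are illustrative assumptions, not the authors' actual procedure.

```python
from itertools import combinations

def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the word sets of two responses."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def flag_uniform_responses(responses, threshold=0.6):
    """Return indices of responses that are suspiciously similar to at
    least one other response (possible templated or AI-generated text)."""
    flagged = set()
    for (i, a), (j, b) in combinations(enumerate(responses), 2):
        if token_jaccard(a, b) >= threshold:
            flagged.update((i, j))
    return sorted(flagged)

responses = [
    "As a developer, I value code reviews because they improve quality and foster collaboration.",
    "As a developer, I value code reviews because they improve quality and foster better collaboration.",
    "Honestly it depends on the sprint; reviews often slip when deadlines are tight.",
]
print(flag_uniform_responses(responses))  # the two near-identical responses are flagged
```

A lexical heuristic like this only catches near-duplicate phrasing; as the abstract argues, it would need to be combined with interpretive verification, since paraphrased AI output can evade simple overlap measures.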
Related papers
- Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval [60.25608870901428]
Trustworthiness is a core research challenge for agentic AI systems built on Large Language Models (LLMs). We propose the task of fact-checking without retrieval, focusing on the verification of arbitrary natural language claims, independent of their source robustness.
arXiv Detail & Related papers (2026-03-05T18:42:51Z) - The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research [56.80927148740585]
We address the challenges of scalability and rigor by flipping the dynamic and developing AI agents as research evaluators. We use mechanistic interpretability research as a testbed, build standardized research output, and develop MechEvalAgent. Our work demonstrates the potential of AI agents to transform research evaluation and pave the way for rigorous scientific practices.
arXiv Detail & Related papers (2026-02-05T19:00:02Z) - DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation [56.886936435727854]
DeepResearchEval is an automated framework for deep research task construction and agentic evaluation. For task construction, we propose a persona-driven pipeline generating realistic, complex research tasks anchored in diverse user profiles. For evaluation, we propose an agentic pipeline with two components: an Adaptive Point-wise Quality Evaluation that dynamically derives task-specific evaluation dimensions, criteria, and weights conditioned on each generated task, and an Active Fact-Checking component that autonomously extracts and verifies report statements via web search, even when citations are missing.
arXiv Detail & Related papers (2026-01-14T18:38:31Z) - From Noise to Insights: Enhancing Supply Chain Decision Support through AI-Based Survey Integrity Analytics [0.0]
This study proposes a lightweight AI-based framework for filtering unreliable survey inputs using a supervised machine learning approach. After preprocessing and label encoding, both Random Forest and baseline models were trained to distinguish genuine from fake responses. The best-performing model achieved a 92.0% accuracy rate, demonstrating improved detection compared to the pilot study.
arXiv Detail & Related papers (2026-01-14T05:23:50Z) - Large Language Models for Unit Test Generation: Achievements, Challenges, and the Road Ahead [15.43943391801509]
Unit testing is an essential yet laborious technique for verifying software. Large Language Models (LLMs) address this limitation by leveraging their data-driven knowledge of code semantics and programming patterns. This framework analyzes the literature regarding core generative strategies and a set of enhancement techniques.
arXiv Detail & Related papers (2025-11-26T13:30:11Z) - AutoMalDesc: Large-Scale Script Analysis for Cyber Threat Research [81.04845910798387]
Generating natural language explanations for threat detections remains an open problem in cybersecurity research. We present AutoMalDesc, an automated static analysis summarization framework that operates independently at scale. We publish our complete dataset of more than 100K script samples, including annotated seed (0.9K) datasets, along with our methodology and evaluation framework.
arXiv Detail & Related papers (2025-11-17T13:05:25Z) - Identity Theft in AI Conference Peer Review [50.18240135317708]
We discuss newly uncovered cases of identity theft in the scientific peer-review process within artificial intelligence (AI) research. We detail how dishonest researchers exploit the peer-review system by creating fraudulent reviewer profiles to manipulate paper evaluations.
arXiv Detail & Related papers (2025-08-06T02:36:52Z) - A validity-guided workflow for robust large language model research in psychology [0.0]
Large language models (LLMs) are rapidly being integrated into psychological research as research tools, evaluation targets, human simulators, and cognitive models. These "measurement phantoms" (statistical artifacts masquerading as psychological phenomena) threaten the validity of a growing body of research. Guided by the dual-validity framework that integrates psychometrics with causal inference, we present a six-stage workflow that scales validity requirements to research ambition.
arXiv Detail & Related papers (2025-07-06T18:06:12Z) - InfoDeepSeek: Benchmarking Agentic Information Seeking for Retrieval-Augmented Generation [63.55258191625131]
InfoDeepSeek is a new benchmark for assessing agentic information seeking in real-world, dynamic web environments. We propose a systematic methodology for constructing challenging queries satisfying the criteria of determinacy, difficulty, and diversity. We develop the first evaluation framework tailored to dynamic agentic information seeking, including fine-grained metrics about the accuracy, utility, and compactness of information-seeking outcomes.
arXiv Detail & Related papers (2025-05-21T14:44:40Z) - Methodological Foundations for AI-Driven Survey Question Generation [41.94295877935867]
This paper presents a methodological framework for using generative AI in educational survey research. We explore how Large Language Models can generate adaptive, context-aware survey questions. We examine ethical issues such as bias, privacy, and transparency.
arXiv Detail & Related papers (2025-05-02T09:50:34Z) - AutoSurvey: Large Language Models Can Automatically Write Surveys [77.0458309675818]
This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys.
Traditional survey paper creation faces challenges due to the vast volume and complexity of information.
Our contributions include a comprehensive solution to the survey problem, a reliable evaluation method, and experimental validation demonstrating AutoSurvey's effectiveness.
arXiv Detail & Related papers (2024-06-10T12:56:06Z) - SoK: Machine Learning for Misinformation Detection [0.8057006406834466]
We examine the disconnect between scholarship and practice in applying machine learning to trust and safety problems. We survey literature on automated detection of misinformation across a corpus of 248 well-cited papers in the field. We conclude that the current state of the art in fully automated detection has limited efficacy in detecting human-generated misinformation.
arXiv Detail & Related papers (2023-08-23T15:52:20Z) - Deepfake Detection: A Comprehensive Survey from the Reliability Perspective [20.873480187150804]
The proliferation of Deepfake synthetic materials circulating on the internet has had a profound social impact on politicians, celebrities, and individuals worldwide.
We identify three reliability-oriented research challenges in the current Deepfake detection domain: transferability, interpretability, and robustness.
arXiv Detail & Related papers (2022-11-20T06:31:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.