Prompts as Software Engineering Artifacts: A Research Agenda and Preliminary Findings
- URL: http://arxiv.org/abs/2509.17548v1
- Date: Mon, 22 Sep 2025 09:08:29 GMT
- Title: Prompts as Software Engineering Artifacts: A Research Agenda and Preliminary Findings
- Authors: Hugo Villamizar, Jannik Fischbach, Alexander Korn, Andreas Vogelsang, Daniel Mendez
- Abstract summary: This research programme characterizes current prompt practices, challenges, and influencing factors in software engineering. We conducted an exploratory survey with 74 software professionals from six countries to investigate current prompt practices and challenges. The findings reveal that prompt usage in SE is largely ad-hoc: prompts are often refined through trial-and-error, rarely reused, and shaped more by individual heuristics than standardized practices.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Developers now routinely interact with large language models (LLMs) to support a range of software engineering (SE) tasks. This prominent role positions prompts as potential SE artifacts that, like other artifacts, may require systematic development, documentation, and maintenance. However, little is known about how prompts are actually used and managed in LLM-integrated workflows, what challenges practitioners face, and whether the benefits of systematic prompt management outweigh the associated effort. To address this gap, we propose a research programme that (a) characterizes current prompt practices, challenges, and influencing factors in SE; (b) analyzes prompts as software artifacts, examining their evolution, traceability, reuse, and the trade-offs of systematic management; and (c) develops and empirically evaluates evidence-based guidelines for managing prompts in LLM-integrated workflows. As a first step, we conducted an exploratory survey with 74 software professionals from six countries to investigate current prompt practices and challenges. The findings reveal that prompt usage in SE is largely ad-hoc: prompts are often refined through trial-and-error, rarely reused, and shaped more by individual heuristics than standardized practices. These insights not only highlight the need for more systematic approaches to prompt management but also provide the empirical foundation for the subsequent stages of our research programme.
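The abstract's central proposal is to treat prompts like other software artifacts: versioned, documented, traceable, and reusable. As a minimal sketch of what that could look like in practice (all names and fields are hypothetical illustrations, not from the paper):

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical sketch: a prompt managed as a versioned SE artifact,
# carrying metadata for documentation, traceability, and reuse.
@dataclass(frozen=True)
class PromptArtifact:
    name: str
    version: str      # bumped on each trial-and-error refinement
    template: str     # prompt text with {placeholders} for reuse
    owner: str        # maintainer, for traceability
    created: date
    tags: tuple = ()  # e.g. ("code-review", "summarization")

    def render(self, **params) -> str:
        """Fill the template's placeholders with task-specific values."""
        return self.template.format(**params)

# Example: a reusable prompt for summarizing code diffs.
summarize = PromptArtifact(
    name="summarize-diff",
    version="1.2.0",
    template="Summarize the following code diff in one sentence:\n{diff}",
    owner="dev-tools-team",
    created=date(2025, 9, 22),
    tags=("code-review",),
)

print(summarize.render(diff="- x = 1\n+ x = 2"))
```

Storing such records in version control would make prompt evolution reviewable in the same way as code changes, which is one of the trade-offs the programme sets out to evaluate empirically.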
Related papers
- On the Use of a Large Language Model to Support the Conduction of a Systematic Mapping Study: A Brief Report from a Practitioner's View [2.0199251985015434]
Large Language Models (LLMs) can handle large volumes of textual data and support methods for evidence synthesis. This paper presents an experience report on conducting a systematic mapping study with the support of LLMs.
arXiv Detail & Related papers (2026-02-09T15:57:30Z) - Reporting LLM Prompting in Automated Software Engineering: A Guideline Based on Current Practices and Expectations [39.62249759297524]
Large Language Models are increasingly used to automate Software Engineering tasks. These models are guided through natural language prompts, making prompt engineering a critical factor in system performance and behavior. Despite their growing role in SE research, prompt-related decisions are rarely documented in a systematic or transparent manner.
arXiv Detail & Related papers (2026-01-05T10:01:20Z) - Understanding the Role of Large Language Models in Software Engineering: Evidence from an Industry Survey [0.6660458629649825]
This paper reports an empirical study of Large Language Models (LLMs) adoption in software engineering, based on a survey of 46 industry professionals. Results reveal positive perceptions of LLMs, particularly regarding faster resolution of technical questions, improved documentation support, and enhanced source code standardization. Respondents also expressed concerns about cognitive dependence, security risks, and the potential erosion of technical autonomy.
arXiv Detail & Related papers (2025-12-19T20:57:19Z) - Let the Barbarians In: How AI Can Accelerate Systems Performance Research [80.43506848683633]
We term this iterative cycle of generation, evaluation, and refinement AI-Driven Research for Systems. We demonstrate that ADRS-generated solutions can match or even outperform human state-of-the-art designs.
arXiv Detail & Related papers (2025-12-16T18:51:23Z) - Deep Research: A Systematic Survey [118.82795024422722]
Deep Research (DR) aims to combine the reasoning capabilities of large language models with external tools, such as search engines. This survey presents a comprehensive and systematic overview of deep research systems.
arXiv Detail & Related papers (2025-11-24T15:28:28Z) - Software Testing with Large Language Models: An Interview Study with Practitioners [2.198430261120653]
The use of large language models in software testing is growing fast as they support numerous tasks. However, their adoption often relies on informal experimentation rather than structured guidance. This study investigates how software testing professionals use LLMs in practice to propose a preliminary, practitioner-informed guideline.
arXiv Detail & Related papers (2025-10-20T05:06:56Z) - A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System [56.40989626804489]
This survey provides the first holistic analysis of Large Language Model-powered software engineering. We review over 150 recent papers and propose a taxonomy along two key dimensions: (1) Solutions, categorized into prompt-based, fine-tuning-based, and agent-based paradigms, and (2) Benchmarks, including tasks such as code generation, translation, and repair.
arXiv Detail & Related papers (2025-10-10T06:56:50Z) - Large Language Models (LLMs) for Requirements Engineering (RE): A Systematic Literature Review [2.0061679654181392]
The study categorizes the literature according to several dimensions, including publication trends, RE activities, prompting strategies, and evaluation methods. Most of the studies focus on using LLMs for requirements elicitation and validation, rather than defect detection and classification. Other artifacts are increasingly considered, including issues from issue tracking systems, regulations, and technical manuals.
arXiv Detail & Related papers (2025-09-14T21:45:01Z) - Agile Management for Machine Learning: A Systematic Mapping Study [1.0396117988046165]
Machine learning (ML)-enabled systems are present in our society, driving significant digital transformations. The dynamic nature of ML development, characterized by experimental cycles and rapid changes in data, poses challenges to traditional project management. This study aims to outline the state of the art in agile management for ML-enabled systems.
arXiv Detail & Related papers (2025-06-25T18:47:08Z) - Evaluating Large Language Models for Real-World Engineering Tasks [75.97299249823972]
This paper introduces a curated database comprising over 100 questions derived from authentic, production-oriented engineering scenarios. Using this dataset, we evaluate four state-of-the-art Large Language Models (LLMs). Our results show that LLMs demonstrate strengths in basic temporal and structural reasoning but struggle significantly with abstract reasoning, formal modeling, and context-sensitive engineering logic.
arXiv Detail & Related papers (2025-05-12T14:05:23Z) - Promptware Engineering: Software Engineering for LLM Prompt Development [22.788377588087894]
Large Language Models (LLMs) are increasingly integrated into software applications, with prompts serving as the primary 'programming' interface. As a result, a new software paradigm, promptware, has emerged, using natural language prompts to interact with LLMs. Unlike traditional software, which relies on formal programming languages and deterministic runtime environments, promptware is based on ambiguous, unstructured, and context-dependent natural language.
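Because promptware output is non-deterministic and its "source" is ambiguous natural language, testing it typically means checking properties of a response rather than exact equality. A minimal sketch of such a property-based prompt test, using a stubbed model call (the stub and all names are hypothetical illustrations, not an API from the paper):

```python
import re

def stub_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; returns a canned JSON-like reply
    # so the example is self-contained and deterministic.
    return '{"sentiment": "positive", "confidence": 0.9}'

def check_response(response: str) -> bool:
    """Property check: the reply must contain a recognized sentiment label,
    regardless of the exact wording the model chooses around it."""
    return bool(re.search(r'"sentiment":\s*"(positive|negative|neutral)"', response))

reply = stub_llm('Classify the sentiment of: "Great build times!"')
print(check_response(reply))
```

Against a real model, the same property check would be run over many sampled responses, which is one way traditional unit-testing practice could be adapted to the promptware paradigm.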
arXiv Detail & Related papers (2025-03-04T08:43:16Z) - How to Measure Performance in Agile Software Development? A Mixed-Method Study [2.477589198476322]
The study aims to identify challenges that arise when using agile software development performance metrics in practice.
Results show that while performance metrics are widely used in practice, agile software development teams face challenges due to a lack of transparency and standardization, as well as insufficient accuracy.
arXiv Detail & Related papers (2024-07-08T19:53:01Z) - Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases.
We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning.
Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z) - Experiential Co-Learning of Software-Developing Agents [83.34027623428096]
Large language models (LLMs) have brought significant changes to various domains, especially in software development.
We introduce Experiential Co-Learning, a novel LLM-agent learning framework.
Experiments demonstrate that the framework enables agents to tackle unseen software-developing tasks more effectively.
arXiv Detail & Related papers (2023-12-28T13:50:42Z) - Just Tell Me: Prompt Engineering in Business Process Management [63.08166397142146]
GPT-3 and other language models (LMs) can effectively address various natural language processing (NLP) tasks.
We argue that prompt engineering can help bring the capabilities of LMs to BPM research.
arXiv Detail & Related papers (2023-04-14T14:55:19Z) - Software engineering for artificial intelligence and machine learning software: A systematic literature review [6.681725960709127]
This study aims to investigate how software engineering has been applied in the development of AI/ML systems.
Main challenges faced by professionals are in areas of testing, AI software quality, and data management.
arXiv Detail & Related papers (2020-11-07T11:06:28Z) - Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey [53.73359052511171]
Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks in which the agent has only limited environmental feedback.
We present a framework for curriculum learning (CL) in RL, and use it to survey and classify existing CL methods in terms of their assumptions, capabilities, and goals.
arXiv Detail & Related papers (2020-03-10T20:41:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.