Related papers: Prompting in the Wild: An Empirical Study of Prompt Evolution in Software Repositories

Prompting in the Wild: An Empirical Study of Prompt Evolution in Software Repositories

URL: http://arxiv.org/abs/2412.17298v2
Date: Sun, 12 Jan 2025 20:13:35 GMT
Title: Prompting in the Wild: An Empirical Study of Prompt Evolution in Software Repositories
Authors: Mahan Tafreshipour, Aaron Imani, Eric Huang, Eduardo Almeida, Thomas Zimmermann, Iftekhar Ahmed,
Abstract summary: This study presents the first empirical analysis of prompt evolution in LLM-integrated software development.<n>We analyzed 1,262 prompt changes across 243 GitHub repositories to investigate the patterns and frequencies of prompt changes.<n>Our findings show that developers primarily evolve prompts through additions and modifications, with most changes occurring during feature development.
Score: 11.06441376653589
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The adoption of Large Language Models (LLMs) is reshaping software development as developers integrate these LLMs into their applications. In such applications, prompts serve as the primary means of interacting with LLMs. Despite the widespread use of LLM-integrated applications, there is limited understanding of how developers manage and evolve prompts. This study presents the first empirical analysis of prompt evolution in LLM-integrated software development. We analyzed 1,262 prompt changes across 243 GitHub repositories to investigate the patterns and frequencies of prompt changes, their relationship with code changes, documentation practices, and their impact on system behavior. Our findings show that developers primarily evolve prompts through additions and modifications, with most changes occurring during feature development. We identified key challenges in prompt engineering: only 21.9% of prompt changes are documented in commit messages, changes can introduce logical inconsistencies, and misalignment often occurs between prompt changes and LLM responses. These insights emphasize the need for specialized testing frameworks, automated validation tools, and improved documentation practices to enhance the reliability of LLM-integrated applications.

Related papers

Prompting LLMs for Code Editing: Struggles and Remedies [39.02507244469977]
Large Language Models (LLMs) are rapidly transforming software engineering, with coding assistants embedded in an IDE becoming increasingly prevalent. This paper addresses part of this gap through a multi-phased investigation of developer interactions with an LLM-powered code editing and transformation feature, Transform Code, in an IDE widely used at Google. We analyze telemetry logs of the feature usage, revealing that frequent re-prompting can be an indicator of developer struggles with using Transform Code. We propose and evaluate a tool, AutoPrompter, for automatically improving prompts by inferring missing information from the surrounding code context, leading to a 27% improvement in
arXiv Detail & Related papers (2025-04-28T18:59:28Z)
LLMs' Reshaping of People, Processes, Products, and Society in Software Development: A Comprehensive Exploration with Early Adopters [3.4069804433026314]
Large language models (LLMs) like OpenAI ChatGPT, Google Gemini, and GitHub Copilot are rapidly gaining traction in the software industry. Our study provides a nuanced understanding of how LLMs are shaping the landscape of software development.
arXiv Detail & Related papers (2025-03-06T22:27:05Z)
Promptware Engineering: Software Engineering for LLM Prompt Development [22.788377588087894]
Large Language Models (LLMs) are increasingly integrated into software applications, with prompts serving as the primary 'programming' interface. As a result, a new software paradigm, promptware, has emerged, using natural language prompts to interact with LLMs. Unlike traditional software, which relies on formal programming languages and deterministic runtime environments, promptware is based on ambiguous, unstructured, and context-dependent natural language.
arXiv Detail & Related papers (2025-03-04T08:43:16Z)
LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues [62.12404317786005]
EvoCoder is a continuous learning framework for issue code reproduction. Our results show a 20% improvement in issue reproduction rates over existing SOTA methods.
arXiv Detail & Related papers (2024-11-21T08:49:23Z)
RETAIN: Interactive Tool for Regression Testing Guided LLM Migration [8.378294455013284]
RETAIN (REgression Testing guided LLM migrAtIoN) is a tool designed explicitly for regression testing in LLM Migrations. Our automatic evaluation and empirical user studies demonstrate that RETAIN, when compared to manual evaluation, enabled participants to identify twice as many errors, facilitated experimentation with 75% more prompts, and achieves 12% higher metric scores in a given time frame.
arXiv Detail & Related papers (2024-09-05T22:22:57Z)
An Empirical Study on Challenges for LLM Application Developers [28.69628251749012]
We crawl and analyze 29,057 relevant questions from a popular OpenAI developer forum. After manually analyzing 2,364 sampled questions, we construct a taxonomy of challenges faced by LLM developers.
arXiv Detail & Related papers (2024-08-06T05:46:28Z)
What Did I Do Wrong? Quantifying LLMs' Sensitivity and Consistency to Prompt Engineering [8.019873464066308]
We introduce two metrics for classification tasks, namely sensitivity and consistency. sensitivity measures changes of predictions across rephrasings of the prompt. Instead, consistency measures how predictions vary across rephrasings for elements of the same class.
arXiv Detail & Related papers (2024-06-18T06:59:24Z)
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models [49.387195629660994]
Large Language Models (LLMs) for code are rapidly evolving, with code editing emerging as a critical capability. We introduce CodeEditorBench, an evaluation framework designed to rigorously assess the performance of LLMs in code editing tasks. We curate diverse coding challenges and scenarios from five sources, covering various programming languages, complexity levels, and editing tasks.
arXiv Detail & Related papers (2024-04-04T15:49:49Z)
Prompting Large Language Models to Tackle the Full Software Development Lifecycle: A Case Study [72.24266814625685]
We explore the performance of large language models (LLMs) across the entire software development lifecycle with DevEval.<n>DevEval features four programming languages, multiple domains, high-quality data collection, and carefully designed and verified metrics for each task.<n> Empirical studies show that current LLMs, including GPT-4, fail to solve the challenges presented within DevEval.
arXiv Detail & Related papers (2024-03-13T15:13:44Z)
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents [81.60906807941188]
Large language models (LLMs) are trained on a combination of natural language and formal language (code) Code translates high-level goals into executable steps, featuring standard syntax, logical consistency, abstraction, and modularity.
arXiv Detail & Related papers (2024-01-01T16:51:20Z)
ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code [76.84199699772903]
ML-Bench is a benchmark rooted in real-world programming applications that leverage existing code repositories to perform tasks. To evaluate both Large Language Models (LLMs) and AI agents, two setups are employed: ML-LLM-Bench for assessing LLMs' text-to-code conversion within a predefined deployment environment, and ML-Agent-Bench for testing autonomous agents in an end-to-end task execution within a Linux sandbox environment.
arXiv Detail & Related papers (2023-11-16T12:03:21Z)
Testing LLMs on Code Generation with Varying Levels of Prompt Specificity [0.0]
Large language models (LLMs) have demonstrated unparalleled prowess in mimicking human-like text generation and processing. The potential to transform natural language prompts into executable code promises a major shift in software development practices.
arXiv Detail & Related papers (2023-11-10T23:41:41Z)
Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks. This paper proposes a LLM-Augmenter system, which augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.