Which Prompting Technique Should I Use? An Empirical Investigation of Prompting Techniques for Software Engineering Tasks
- URL: http://arxiv.org/abs/2506.05614v1
- Date: Thu, 05 Jun 2025 21:58:44 GMT
- Authors: E. G. Santana Jr, Gabriel Benjamin, Melissa Araujo, Harrison Santos, David Freitas, Eduardo Almeida, Paulo Anselmo da M. S. Neto, Jiawei Li, Jina Chun, Iftekhar Ahmed
- Abstract summary: We present a systematic evaluation of 14 established prompting techniques across 10 software engineering (SE) tasks using four Large Language Models (LLMs). As identified in the prior literature, the selected prompting techniques span six core dimensions (Zero-Shot, Few-Shot, Thought Generation, Ensembling, Self-Criticism, and Decomposition). Our results show which prompting techniques are most effective for SE tasks requiring complex logic and intensive reasoning versus those that rely more on contextual understanding and example-driven scenarios.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A growing variety of prompt engineering techniques has been proposed for Large Language Models (LLMs), yet systematic evaluation of each technique on individual software engineering (SE) tasks remains underexplored. In this study, we present a systematic evaluation of 14 established prompting techniques across 10 SE tasks using four LLMs. As identified in the prior literature, the selected prompting techniques span six core dimensions (Zero-Shot, Few-Shot, Thought Generation, Ensembling, Self-Criticism, and Decomposition). They are evaluated on tasks such as code generation, bug fixing, and code-oriented question answering. Our results show which prompting techniques are most effective for SE tasks that require complex logic and intensive reasoning versus those that rely more on contextual understanding and example-driven scenarios. We also analyze correlations between the linguistic characteristics of prompts and the factors that contribute to the effectiveness of prompting techniques in enhancing performance on SE tasks. Additionally, we report the time and token consumption of each prompting technique when applied to a specific task and model, offering guidance for practitioners in selecting the optimal prompting technique for their use cases.
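The abstract does not reproduce the paper's prompt templates, so the sketch below pairs each of the six dimensions with a generic, commonly cited formulation (Chain-of-Thought for Thought Generation, Self-Consistency for Ensembling, Self-Refine for Self-Criticism, a Least-to-Most style instruction for Decomposition). `call_llm`, the template fields, and all strings are hypothetical illustrations, not the authors' artifacts.

```python
from collections import Counter

# Illustrative prompt templates and control loops for the six dimensions
# the study covers. These are generic formulations, NOT the paper's own
# templates; `call_llm` is a hypothetical chat-completion hook.

TASK = "Fix the off-by-one bug in this function:\n{code}"

ZERO_SHOT = TASK  # no examples, no added reasoning scaffold

FEW_SHOT = (  # prepend one or more worked input/output pairs
    "Buggy function:\n{example_bug}\nFixed version:\n{example_fix}\n\n" + TASK
)

# Thought Generation, e.g. Chain-of-Thought prompting
CHAIN_OF_THOUGHT = TASK + "\nLet's think step by step before writing the fix."

# Decomposition, e.g. a Least-to-Most style instruction
DECOMPOSITION = (
    TASK + "\nFirst list the sub-problems, then solve them one at a time, "
    "then give the final fix."
)

def self_refine(call_llm, prompt):
    """Self-Criticism: draft an answer, critique it, revise with the critique."""
    draft = call_llm(prompt)
    critique = call_llm(f"Critique this fix for correctness:\n{draft}")
    return call_llm(f"Revise the fix given this critique:\n{critique}\n\n{draft}")

def self_consistency(call_llm, prompt, n=5):
    """Ensembling: sample several answers and keep the most common one."""
    answers = [call_llm(prompt, temperature=0.8) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```

Swapping these templates over a fixed task and model, while logging per-call latency and token counts, mirrors the kind of time- and token-consumption comparison the study reports.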
Related papers
- Grammar-Guided Evolutionary Search for Discrete Prompt Optimisation [63.97051732013936]
We propose an evolutionary search approach to automated discrete prompt optimisation consisting of two phases. In the first phase, grammar-guided genetic programming is invoked to synthesise prompt-creating programmes. In the second phase, local search is applied to explore the neighbourhoods of best-performing programmes (see the sketch after this list).
arXiv Detail & Related papers (2025-07-14T14:34:15Z)
- Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective [65.12150411762273]
We show that pruning random demonstrations into seemingly incoherent "gibberish" can remarkably improve performance across diverse tasks. We propose a self-discover prompt optimization framework, PromptQuine, that automatically searches for the pruning strategy by itself using only low-data regimes.
arXiv Detail & Related papers (2025-06-22T07:53:07Z)
- The Future of MLLM Prompting is Adaptive: A Comprehensive Experimental Evaluation of Prompt Engineering Methods for Robust Multimodal Performance [0.393259574660092]
Multimodal Large Language Models (MLLMs) are set to transform how machines process and generate human-like responses. We present a comprehensive experimental evaluation of seven prompt engineering methods applied to 13 open-source MLLMs over 24 tasks.
arXiv Detail & Related papers (2025-04-14T12:31:39Z)
- The Impact of Prompt Programming on Function-Level Code Generation [3.072802875195726]
Large Language Models (LLMs) are increasingly used by software engineers for code generation. This study introduces CodePromptEval, a dataset of 7072 prompts designed to evaluate five prompt techniques. Our findings show that while certain prompt techniques significantly influence the generated code, combining multiple techniques does not necessarily improve the outcome.
arXiv Detail & Related papers (2024-12-29T18:34:10Z)
- Do Advanced Language Models Eliminate the Need for Prompt Engineering in Software Engineering? [18.726229967976316]
This paper reevaluates various prompt engineering techniques within the context of advanced Large Language Models (LLMs).
Our findings reveal that prompt engineering techniques developed for earlier LLMs may provide diminished benefits or even hinder performance when applied to advanced models.
arXiv Detail & Related papers (2024-11-04T13:56:37Z)
- QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries. We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks. Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z)
- The Prompt Report: A Systematic Survey of Prompt Engineering Techniques [42.618971816813385]
Generative Artificial Intelligence systems are increasingly being deployed across diverse industries and research domains. Prompt engineering suffers from conflicting terminology and a fragmented ontological understanding. We establish a structured understanding of prompt engineering by assembling a taxonomy of prompting techniques and analyzing their applications.
arXiv Detail & Related papers (2024-06-06T18:10:11Z)
- Efficient Prompting Methods for Large Language Models: A Survey [50.82812214830023]
Efficient Prompting Methods have attracted a wide range of attention. We discuss Automatic Prompt Engineering for different prompt components and Prompt Compression in continuous and discrete spaces.
arXiv Detail & Related papers (2024-04-01T12:19:08Z)
- Towards Generalist Prompting for Large Language Models by Mental Models [105.03747314550591]
Large language models (LLMs) have demonstrated impressive performance on many tasks.
To achieve optimal performance, specially designed prompting methods are still needed.
We introduce the concept of generalist prompting, which operates on the design principle of achieving optimal or near-optimal performance.
arXiv Detail & Related papers (2024-02-28T11:29:09Z)
- An Empirical Categorization of Prompting Techniques for Large Language Models: A Practitioner's Guide [0.34530027457862006]
In this survey, we examine some of the most well-known prompting techniques from both academic and practical viewpoints.
We present an overview of each category, aiming to clarify their unique contributions and showcase their practical applications.
arXiv Detail & Related papers (2024-02-18T23:03:56Z)
- TEMPERA: Test-Time Prompting via Reinforcement Learning [57.48657629588436]
We propose Test-time Prompt Editing using Reinforcement learning (TEMPERA).
In contrast to prior prompt generation methods, TEMPERA can efficiently leverage prior knowledge.
Our method achieves an average 5.33x improvement in sample efficiency compared to traditional fine-tuning methods.
arXiv Detail & Related papers (2022-11-21T22:38:20Z)
- A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges [76.20963684020145]
Uncertainty quantification (UQ) plays a pivotal role in the reduction of uncertainties during both optimization and decision-making processes.
Bayesian approximation and ensemble learning techniques are the two most widely used UQ methods in the literature.
This study reviews recent advances in UQ methods used in deep learning and investigates the application of these methods in reinforcement learning.
arXiv Detail & Related papers (2020-11-12T06:41:05Z)
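The two-phase search summarised in the first related entry above (grammar-guided synthesis followed by local search) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the authors' implementation: the real method evolves prompt-creating programmes with genetic programming, whereas phase 1 here is simplified to random sampling over a toy grammar, and `score` is a hypothetical task-fitness callback (e.g. accuracy on a small dev set).

```python
import random

# Toy grammar: each slot expands to exactly one of its options. The real
# approach uses a richer grammar over prompt-creating programmes.
GRAMMAR = {
    "persona": ["You are a careful software engineer.", ""],
    "reasoning": ["Think step by step.", "Answer directly."],
    "format": ["Return only code.", "Explain, then give code."],
}
SLOTS = list(GRAMMAR)

def neighbours(parts):
    """Phase-2 local search moves: vary one grammar slot at a time."""
    for i, key in enumerate(SLOTS):
        for alt in GRAMMAR[key]:
            if alt != parts[i]:
                yield parts[:i] + [alt] + parts[i + 1:]

def search(score, samples=20):
    """score(parts) -> float is a hypothetical task-fitness callback."""
    best = [random.choice(GRAMMAR[k]) for k in SLOTS]
    for _ in range(samples):                 # phase 1: global synthesis
        cand = [random.choice(GRAMMAR[k]) for k in SLOTS]
        if score(cand) > score(best):
            best = cand
    improved = True
    while improved:                          # phase 2: hill-climb the best
        improved = False
        for cand in neighbours(best):
            if score(cand) > score(best):
                best, improved = cand, True
                break
    return " ".join(p for p in best if p)
```

In practice, `score` would run the candidate prompt against an LLM on a handful of held-out task instances, which is where nearly all of the search cost goes.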