Guidelines to Prompt Large Language Models for Code Generation: An Empirical Characterization
- URL: http://arxiv.org/abs/2601.13118v1
- Date: Mon, 19 Jan 2026 15:01:42 GMT
- Title: Guidelines to Prompt Large Language Models for Code Generation: An Empirical Characterization
- Authors: Alessandro Midolo, Alessandro Giagnorio, Fiorella Zampetti, Rosalia Tufano, Gabriele Bavota, Massimiliano Di Penta
- Abstract summary: We derive and evaluate development-specific prompt optimization guidelines. We use an iterative, test-driven approach to automatically refine code generation prompts. We conduct an assessment with 50 practitioners, who report their usage of the elicited prompt improvement patterns.
- Score: 82.29178197694819
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are nowadays extensively used for various software engineering tasks, primarily code generation. Previous research has shown that suitable prompt engineering can help developers improve their code generation prompts. However, no specific guidelines exist so far to drive developers towards writing suitable prompts for code generation. In this work, we derive and evaluate development-specific prompt optimization guidelines. First, we use an iterative, test-driven approach to automatically refine code generation prompts, and we analyze the outcome of this process to identify prompt improvement items that lead to test passes. We use these items to elicit 10 guidelines for prompt improvement, related to better specifying I/O and pre-/post-conditions, providing examples and various types of detail, or clarifying ambiguities. We conduct an assessment with 50 practitioners, who report their usage of the elicited prompt improvement patterns as well as their perceived usefulness, which does not always match their actual usage before knowing our guidelines. Our results have implications not only for practitioners and educators, but also for those aiming to create better LLM-aided software development tools.
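To make the iterative, test-driven refinement loop from the abstract concrete, here is a minimal sketch. It illustrates the general idea only, not the authors' implementation: `generate_code`, `refine_prompt`, and `run_tests` are hypothetical stand-ins for an LLM call, a prompt-rewriting LLM call, and a unit-test harness.

```python
# Minimal sketch of an iterative, test-driven prompt refinement loop.
# generate_code() and refine_prompt() stand in for LLM calls, run_tests()
# for a unit-test harness; all three are hypothetical.

def refine_until_tests_pass(prompt, generate_code, run_tests, refine_prompt,
                            max_iters=5):
    """Refine a code generation prompt until the generated code passes."""
    for _ in range(max_iters):
        code = generate_code(prompt)        # ask the LLM for a solution
        passed, failures = run_tests(code)  # execute the test suite
        if passed:
            return prompt, code             # tests pass: keep this prompt
        # Feed failing tests back and ask for an improved prompt, e.g. one
        # that better specifies I/O, pre-/post-conditions, or examples.
        prompt = refine_prompt(prompt, failures)
    return prompt, None                     # no passing solution found
```

Diffing the initial and final prompts across many such runs is, per the abstract, what surfaces the recurring improvement items behind the 10 guidelines.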
Related papers
- IFEvalCode: Controlled Code Generation [69.28317223249358]
The paper introduces forward and backward constraints generation to improve the instruction-following capabilities of Code LLMs. The authors present IFEvalCode, a multilingual benchmark comprising 1.6K test samples across seven programming languages.
arXiv Detail & Related papers (2025-07-30T08:08:48Z)
- Prompting LLMs for Code Editing: Struggles and Remedies [39.02507244469977]
Large Language Models (LLMs) are rapidly transforming software engineering, with coding assistants embedded in an IDE becoming increasingly prevalent. This paper addresses part of this gap through a multi-phased investigation of developer interactions with an LLM-powered code editing and transformation feature, Transform Code, in an IDE widely used at Google. We analyze telemetry logs of the feature usage, revealing that frequent re-prompting can be an indicator of developer struggles with using Transform Code. We propose and evaluate a tool, AutoPrompter, for automatically improving prompts by inferring missing information from the surrounding code context, leading to a 27% improvement in … (a toy sketch of this context-inference idea follows this entry).
arXiv Detail & Related papers (2025-04-28T18:59:28Z)
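As a toy illustration of that context-inference idea (not the actual AutoPrompter, whose internals the summary does not describe), a terse edit prompt could be rewritten by attaching facts gathered from the surrounding code; every field and string below is invented for illustration.

```python
# Toy context-augmented prompt improvement, loosely inspired by the
# AutoPrompter idea above. All fields and wording are invented; this is not
# the real tool's API or prompt.

def improve_edit_prompt(user_prompt: str, enclosing_function: str,
                        imports: list[str], language: str) -> str:
    """Rewrite a terse edit prompt by attaching surrounding code context."""
    context_block = "\n".join([
        f"Language: {language}",
        "Imports in scope: " + ", ".join(imports),
        "Enclosing function:",
        enclosing_function,
    ])
    return (
        "You are editing the code below. Use only APIs visible in context.\n"
        f"{context_block}\n\n"
        f"Requested edit: {user_prompt}\n"
        "Return only the modified function."
    )

print(improve_edit_prompt("make it faster",
                          "def smallest(xs):\n    return sorted(xs)[0]",
                          ["heapq"], "Python"))
```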
- CodeIF: Benchmarking the Instruction-Following Capabilities of Large Language Models for Code Generation [20.013757490442064]
We introduce CodeIF, the first benchmark designed to assess the abilities of Large Language Models (LLMs) to adhere to task-oriented instructions. CodeIF encompasses a broad range of tasks, including function synthesis, algorithmic instructions, and code explanation. We conduct extensive experiments with LLMs, analyzing their strengths and limitations in meeting the demands of these tasks.
arXiv Detail & Related papers (2025-02-26T14:19:49Z)
- Pragmatic Reasoning improves LLM Code Generation [34.01862911727194]
Large Language Models (LLMs) have demonstrated impressive potential in translating natural language (NL) instructions into program code. To address this challenge, researchers have proposed approaches that produce multiple candidate programs and then rerank them to identify the best solution. We propose CodeRSA, a novel code candidate reranking mechanism built upon the Rational Speech Act (RSA) framework (a drastically simplified reranking sketch follows this entry).
arXiv Detail & Related papers (2025-02-20T12:44:26Z)
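In the RSA spirit, a pragmatic reranker prefers the candidate under which the original instruction is most probable. The sketch below is a drastic simplification of that idea, not CodeRSA itself; `loglik` is a hypothetical LLM-based estimate of log P(instruction | code).

```python
# Drastically simplified RSA-style reranking: pick the candidate under which
# the original instruction is most probable. loglik() is a hypothetical
# LLM-based scorer; the real CodeRSA mechanism is more involved.

from typing import Callable

def rerank(instruction: str, candidates: list[str],
           loglik: Callable[[str, str], float]) -> str:
    """Return the candidate maximizing log P(instruction | candidate)."""
    return max(candidates, key=lambda code: loglik(instruction, code))
```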
- Language Models for Code Optimization: Survey, Challenges and Future Directions [7.928856221466083]
Language models (LMs) built upon deep neural networks (DNNs) have recently demonstrated breakthrough effectiveness in software engineering tasks. This study aims to provide actionable insights and references for both researchers and practitioners in this rapidly evolving field.
arXiv Detail & Related papers (2025-01-02T14:20:36Z)
- Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework.
Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage (a toy call-chain tracer is sketched after this entry).
arXiv Detail & Related papers (2024-10-02T09:11:10Z)
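Recovering a dynamic calling chain from a running unit test can be approximated with Python's built-in tracing hook. This is a generic illustration of the technique, under the assumption that tracing call events suffices; it is not Codev-Agent's actual implementation.

```python
# Toy dynamic call-chain tracer built on sys.settrace. Illustrates the
# general technique only; not Codev-Agent's actual implementation.

import sys

def trace_calls(test_fn):
    """Run test_fn and return the sequence of function names it calls."""
    chain = []

    def tracer(frame, event, arg):
        if event == "call":                    # record every function entry
            chain.append(frame.f_code.co_name)
        return tracer

    sys.settrace(tracer)
    try:
        test_fn()
    finally:
        sys.settrace(None)                     # always detach the tracer
    return chain

def helper():
    return 42

def test_example():
    assert helper() == 42

print(trace_calls(test_example))  # ['test_example', 'helper']
```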
- A Survey on Evaluating Large Language Models in Code Generation Tasks [30.256255254277914]
This paper provides a comprehensive review of the current methods and metrics used to evaluate the performance of Large Language Models (LLMs) in code generation tasks. With the rapid growth in demand for automated software development, LLMs have demonstrated significant potential in the field of code generation.
arXiv Detail & Related papers (2024-08-29T12:56:06Z)
- Exploring Large Language Model for Graph Data Understanding in Online Job Recommendations [63.19448893196642]
We present a novel framework that harnesses the rich contextual information and semantic representations provided by large language models to analyze behavior graphs.
By leveraging this capability, our framework enables personalized and accurate job recommendations for individual users.
arXiv Detail & Related papers (2023-07-10T11:29:41Z)
- Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach [83.62750225073341]
We consider recommendation as instruction following by large language models (LLMs).
We first design a general instruction format for describing the preference, intention, task form and context of a user in natural language.
Then we manually design 39 instruction templates and automatically generate a large amount of user-personalized instruction data (a minimal template-filling sketch follows this entry).
arXiv Detail & Related papers (2023-05-11T17:39:07Z)
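A minimal template-filling sketch of that instruction-generation idea: instantiate a few natural-language templates with one user's profile fields. The template wording and fields are invented for illustration; the paper's 39 actual templates are not reproduced here.

```python
# Minimal illustration of generating user-personalized instruction data from
# templates. Template wording and user fields are invented for illustration.

TEMPLATES = [
    "The user prefers {preference}. Task: {task}. Recommend one item.",
    "Given the intention '{intention}' in context '{context}', {task}.",
]

def generate_instructions(user: dict, task: str) -> list[str]:
    """Instantiate every template with one user's profile fields."""
    return [t.format(task=task, **user) for t in TEMPLATES]

user = {"preference": "sci-fi movies", "intention": "weekend watch",
        "context": "mobile app"}
print(generate_instructions(user, "pick one movie"))
```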
- TEMPERA: Test-Time Prompting via Reinforcement Learning [57.48657629588436]
We propose Test-time Prompt Editing using Reinforcement learning (TEMPERA).
In contrast to prior prompt generation methods, TEMPERA can efficiently leverage prior knowledge.
Our method achieves a 5.33x average improvement in sample efficiency compared to traditional fine-tuning methods (a toy greedy stand-in for the edit search is sketched below).
arXiv Detail & Related papers (2022-11-21T22:38:20Z)
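TEMPERA learns an RL policy over discrete prompt edits at test time. As a much simpler stand-in, the sketch below greedily applies whichever edit most improves a task score; the edit set and the `score` function are hypothetical, and greedy hill-climbing replaces the paper's learned policy.

```python
# Greedy stand-in for test-time prompt editing. TEMPERA learns an RL policy
# over discrete edits; here we simply hill-climb with a hypothetical score().

from typing import Callable

EDITS = [
    lambda p: p + "\nThink step by step.",
    lambda p: p + "\nReturn only the final answer.",
    lambda p: "You are an expert assistant.\n" + p,
]

def greedy_edit(prompt: str, score: Callable[[str], float],
                steps: int = 3) -> str:
    """At each step, apply the single edit that most improves the score."""
    best, best_score = prompt, score(prompt)
    for _ in range(steps):
        top = max((edit(best) for edit in EDITS), key=score)
        if score(top) <= best_score:
            break                      # no edit helps; stop early
        best, best_score = top, score(top)
    return best
```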