Related papers: A Taxonomy of Prompt Defects in LLM Systems

A Taxonomy of Prompt Defects in LLM Systems

URL: http://arxiv.org/abs/2509.14404v1
Date: Wed, 17 Sep 2025 20:11:22 GMT
Title: A Taxonomy of Prompt Defects in LLM Systems
Authors: Haoye Tian, Chong Wang, BoYang Yang, Lyuye Zhang, Yang Liu,
Abstract summary: Large Language Models (LLMs) have become key components of modern software.<n>Small mistakes can cascade into unreliable, insecure, or inefficient behavior.<n>This paper presents the first systematic survey and taxonomy of prompt defects.
Score: 13.177777446130712
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) have become key components of modern software, with prompts acting as their de-facto programming interface. However, prompt design remains largely empirical and small mistakes can cascade into unreliable, insecure, or inefficient behavior. This paper presents the first systematic survey and taxonomy of prompt defects, recurring ways that prompts fail to elicit their intended behavior from LLMs. We organize defects along six dimensions: (1) Specification and Intent, (2) Input and Content, (3) Structure and Formatting, (4) Context and Memory, (5) Performance and Efficiency, and (6) Maintainability and Engineering. Each dimension is refined into fine-grained subtypes, illustrated with concrete examples and root cause analysis. Grounded in software engineering principles, we show how these defects surface in real development workflows and examine their downstream effects. For every subtype, we distill mitigation strategies that span emerging prompt engineering patterns, automated guardrails, testing harnesses, and evaluation frameworks. We then summarize these strategies in a master taxonomy that links defect, impact, and remedy. We conclude with open research challenges and a call for rigorous engineering-oriented methodologies to ensure that LLM-driven systems are dependable by design.

Related papers

GenAI for Systems: Recurring Challenges and Design Principles from Software to Silicon [62.2138479061386]
Generative AI is reshaping how computing systems are designed, optimized, and built, yet research remains fragmented across software, architecture, and chip design communities.<n>This paper takes a cross-stack perspective, examining how generative models are being applied from code generation and distributed runtimes through hardware design space exploration to RTL synthesis, physical layout, and verification.
arXiv Detail & Related papers (2026-02-16T22:45:33Z)
Reporting LLM Prompting in Automated Software Engineering: A Guideline Based on Current Practices and Expectations [39.62249759297524]
Large Language Models are increasingly used to automate Software Engineering tasks.<n>These models are guided through natural language prompts, making prompt engineering a critical factor in system performance and behavior.<n>Despite their growing role in SE research, prompt-related decisions are rarely documented in a systematic or transparent manner.
arXiv Detail & Related papers (2026-01-05T10:01:20Z)
Are We SOLID Yet? An Empirical Study on Prompting LLMs to Detect Design Principle Violations [0.17464576727343348]
We present a benchmark of four leading LLMs-CodeLlama, DeepSeekCoder, QwenCoder, and GPT-4o Mini.<n>We test four distinct prompt strategies inspired by established zero-shot, few-shot, and chain-of-thought techniques.<n>We show that prompt strategy has a dramatic impact, but no single strategy is universally best.
arXiv Detail & Related papers (2025-09-03T07:48:38Z)
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning [53.85659415230589]
This paper systematically reviews widely adoptedReinforcement learning techniques.<n>We present clear guidelines for selecting RL techniques tailored to specific setups.<n>We also reveal that a minimalist combination of two techniques can unlock the learning capability of critic-free policies.
arXiv Detail & Related papers (2025-08-11T17:39:45Z)
Evaluating Large Language Models for Real-World Engineering Tasks [75.97299249823972]
This paper introduces a curated database comprising over 100 questions derived from authentic, production-oriented engineering scenarios.<n>Using this dataset, we evaluate four state-of-the-art Large Language Models (LLMs)<n>Our results show that LLMs demonstrate strengths in basic temporal and structural reasoning but struggle significantly with abstract reasoning, formal modeling, and context-sensitive engineering logic.
arXiv Detail & Related papers (2025-05-12T14:05:23Z)
A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems [93.8285345915925]
Reasoning is a fundamental cognitive process that enables logical inference, problem-solving, and decision-making.<n>With the rapid advancement of large language models (LLMs), reasoning has emerged as a key capability that distinguishes advanced AI systems.<n>We categorize existing methods along two dimensions: (1) Regimes, which define the stage at which reasoning is achieved; and (2) Architectures, which determine the components involved in the reasoning process.
arXiv Detail & Related papers (2025-04-12T01:27:49Z)
LLMs in Software Security: A Survey of Vulnerability Detection Techniques and Insights [12.424610893030353]
Large Language Models (LLMs) are emerging as transformative tools for software vulnerability detection.<n>This paper provides a detailed survey of LLMs in vulnerability detection.<n>We address challenges such as cross-language vulnerability detection, multimodal data integration, and repository-level analysis.
arXiv Detail & Related papers (2025-02-10T21:33:38Z)
MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.<n>We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.<n>Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
An Empirical Study of Automated Vulnerability Localization with Large Language Models [21.84971967029474]
Large Language Models (LLMs) have shown potential in various domains, yet their effectiveness in vulnerability localization remains underexplored. Our investigation encompasses 10+ leading LLMs suitable for code analysis, including ChatGPT and various open-source models. We explore the efficacy of these LLMs using 4 distinct paradigms: zero-shot learning, one-shot learning, discriminative fine-tuning, and generative fine-tuning.
arXiv Detail & Related papers (2024-03-30T08:42:10Z)
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks. A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output. This paper presents a comprehensive review of this emerging class of techniques.
arXiv Detail & Related papers (2023-08-06T18:38:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.