PromptDebt: A Comprehensive Study of Technical Debt Across LLM Projects
- URL: http://arxiv.org/abs/2509.20497v1
- Date: Wed, 24 Sep 2025 19:20:09 GMT
- Title: PromptDebt: A Comprehensive Study of Technical Debt Across LLM Projects
- Authors: Ahmed Aljohani, Hyunsook Do
- Abstract summary: Large Language Models (LLMs) are increasingly embedded in software via APIs like OpenAI, offering powerful AI features without heavy infrastructure. Yet these integrations bring their own form of self-admitted technical debt (SATD). In this paper, we present the first large-scale empirical study of LLM-specific SATD: its origins, prevalence, and mitigation strategies.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly embedded in software via APIs like OpenAI, offering powerful AI features without heavy infrastructure. Yet these integrations bring their own form of self-admitted technical debt (SATD). In this paper, we present the first large-scale empirical study of LLM-specific SATD: its origins, prevalence, and mitigation strategies. By analyzing 93,142 Python files across major LLM APIs, we found that 54.49% of SATD instances stem from OpenAI integrations and 12.35% from LangChain use. Prompt design emerged as the primary source of LLM-specific SATD, with 6.61% of debt related to prompt configuration and optimization issues, followed by hyperparameter tuning and LLM-framework integration. We further explored which prompt techniques attract the most debt, revealing that instruction-based prompts (38.60%) and few-shot prompts (18.13%) are particularly vulnerable due to their dependence on instruction clarity and example quality. Finally, we release a comprehensive SATD dataset to support reproducibility and offer practical guidance for managing technical debt in LLM-powered systems.
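The mining step the abstract describes — scanning Python files for developer comments that admit debt, and attributing them to LLM API usage — can be sketched as below. The marker keywords and the import-based heuristic are illustrative assumptions, not the paper's actual keyword set or classification pipeline.

```python
import re

# Assumed SATD markers -- the paper's real keyword list may differ.
SATD_MARKERS = re.compile(r"#.*\b(TODO|FIXME|HACK|XXX|WORKAROUND)\b", re.IGNORECASE)
# Assumed heuristic: a file "uses an LLM API" if it imports openai or langchain.
LLM_IMPORTS = re.compile(r"^\s*(import|from)\s+(openai|langchain)\b", re.MULTILINE)

def find_satd_comments(source: str) -> list[tuple[int, str]]:
    """Return (line_number, comment_text) pairs for SATD-style comments."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        m = SATD_MARKERS.search(line)
        if m:
            hits.append((lineno, line[m.start():].strip()))
    return hits

def is_llm_file(source: str) -> bool:
    """Heuristic check: does the file import an LLM API client?"""
    return bool(LLM_IMPORTS.search(source))

# Toy input, not taken from the study's dataset.
sample = '''\
import openai

def ask(prompt):
    # TODO: tune temperature; current value is a guess
    return openai.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
'''

if is_llm_file(sample):
    for lineno, comment in find_satd_comments(sample):
        print(lineno, comment)
```

A real pipeline would additionally classify each flagged comment by debt type (prompt design, hyperparameter tuning, framework integration), which the keyword match above does not attempt.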
Related papers
- Self-Admitted Technical Debt in LLM Software: An Empirical Comparison with ML and Non-ML Software [0.8156494881838944]
Self-admitted technical debt (SATD) refers to comments flagged by developers that explicitly acknowledge suboptimal code or incomplete functionality. We conduct the first empirical study of SATD in the Large Language Model era.
arXiv Detail & Related papers (2026-01-09T19:25:48Z) - Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers [74.17516978246152]
Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques. We propose EXSEARCH, an agentic search framework, where the LLM learns to retrieve useful information as the reasoning unfolds. Experiments on four knowledge-intensive benchmarks show that EXSEARCH substantially outperforms baselines.
arXiv Detail & Related papers (2025-05-26T15:27:55Z) - The Promise and Limits of LLMs in Constructing Proofs and Hints for Logic Problems in Intelligent Tutoring Systems [4.146233417549798]
Large language models (LLMs) offer promising capabilities for dynamic feedback generation. LLMs risk producing hallucinations or pedagogically unsound explanations. DeepSeek-V3 achieved superior performance with 84.4% accuracy on stepwise proof construction.
arXiv Detail & Related papers (2025-05-07T18:48:23Z) - MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation [86.7047714187813]
MMLU-ProX is a benchmark covering 29 languages, built on an English benchmark. Each language version consists of 11,829 identical questions, enabling direct cross-linguistic comparisons. To meet efficient evaluation needs, we provide a lite version containing 658 questions per language.
arXiv Detail & Related papers (2025-03-13T15:59:20Z) - MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale [66.73529246309033]
Multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks. Existing instruction-tuning datasets only provide phrase-level answers without any intermediate rationales. We introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales.
arXiv Detail & Related papers (2024-12-06T18:14:24Z) - Unsupervised Text Representation Learning via Instruction-Tuning for Zero-Shot Dense Retrieval [19.422003299376]
We introduce a novel unsupervised text representation learning technique via instruction-tuning.
We demonstrate the corpus representation can be augmented by the representations of relevant synthetic queries.
We significantly improve the average zero-shot retrieval performance on all metrics.
arXiv Detail & Related papers (2024-09-24T23:03:13Z) - A Multi-Agent Approach to Fault Localization via Graph-Based Retrieval and Reflexion [8.22737389683156]
Traditional fault localization techniques require extensive training datasets and high computational resources. Recent advances in Large Language Models (LLMs) offer new opportunities by enhancing code understanding and reasoning. We propose LLM4FL, a multi-agent fault localization framework that utilizes three specialized LLM agents. Evaluated on the Defects4J benchmark, which includes 675 faults from 14 Java projects, LLM4FL achieves an 18.55% improvement in Top-1 accuracy over AutoFL and 4.82% over SoapFL.
arXiv Detail & Related papers (2024-09-20T16:47:34Z) - zkLLM: Zero Knowledge Proofs for Large Language Models [6.993329554241878]
zkLLM is a specialized zero-knowledge proof tailored for large language models (LLMs).
It is designed to uphold the privacy of the model parameters, ensuring no inadvertent information leakage.
arXiv Detail & Related papers (2024-04-24T18:04:50Z) - An Empirical Study of Self-Admitted Technical Debt in Machine Learning Software [17.999512016809945]
Self-admitted technical debt (SATD) can have a significant impact on the quality of machine learning-based software.
This paper aims to investigate SATD in ML code by analyzing 318 open-source ML projects across five domains, along with 318 non-ML projects.
arXiv Detail & Related papers (2023-11-20T18:56:36Z) - Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models [122.19845578690466]
Step-Back Prompting enables LLMs to do abstractions to derive high-level concepts and first principles from instances containing specific details.
Using the concepts and principles to guide reasoning, LLMs significantly improve their abilities in following a correct reasoning path towards the solution.
arXiv Detail & Related papers (2023-10-09T19:48:55Z) - How Effective are Large Language Models in Generating Software Specifications? [14.170320751508502]
Large Language Models (LLMs) have been successfully applied to numerous Software Engineering (SE) tasks. We conduct the first empirical study to evaluate the capabilities of LLMs for generating software specifications from software comments or documentation.
arXiv Detail & Related papers (2023-06-06T00:28:39Z) - SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs).
We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer.
We evaluate SatLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
arXiv Detail & Related papers (2023-05-16T17:55:51Z) - Augmented Large Language Models with Parametric Knowledge Guiding [72.71468058502228]
Large Language Models (LLMs) have significantly advanced natural language processing (NLP) with their impressive language understanding and generation capabilities.
Their performance may be suboptimal for domain-specific tasks that require specialized knowledge due to limited exposure to the related data.
We propose the novel Parametric Knowledge Guiding (PKG) framework, which equips LLMs with a knowledge-guiding module to access relevant knowledge.
arXiv Detail & Related papers (2023-05-08T15:05:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.