PromptDebt: A Comprehensive Study of Technical Debt Across LLM Projects
- URL: http://arxiv.org/abs/2509.20497v1
- Date: Wed, 24 Sep 2025 19:20:09 GMT
- Title: PromptDebt: A Comprehensive Study of Technical Debt Across LLM Projects
- Authors: Ahmed Aljohani, Hyunsook Do
- Abstract summary: Large Language Models (LLMs) are increasingly embedded in software via APIs like OpenAI, offering powerful AI features without heavy infrastructure. Yet these integrations bring their own form of self-admitted technical debt (SATD). In this paper, we present the first large-scale empirical study of LLM-specific SATD: its origins, prevalence, and mitigation strategies.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly embedded in software via APIs like OpenAI, offering powerful AI features without heavy infrastructure. Yet these integrations bring their own form of self-admitted technical debt (SATD). In this paper, we present the first large-scale empirical study of LLM-specific SATD: its origins, prevalence, and mitigation strategies. By analyzing 93,142 Python files across major LLM APIs, we found that 54.49% of SATD instances stem from OpenAI integrations and 12.35% from LangChain use. Prompt design emerged as the primary source of LLM-specific SATD, with 6.61% of debt related to prompt configuration and optimization issues, followed by hyperparameter tuning and LLM-framework integration. We further explored which prompt techniques attract the most debt, revealing that instruction-based prompts (38.60%) and few-shot prompts (18.13%) are particularly vulnerable due to their dependence on instruction clarity and example quality. Finally, we release a comprehensive SATD dataset to support reproducibility and offer practical guidance for managing technical debt in LLM-powered systems.
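The mining step the abstract describes — scanning Python files for developer comments that admit debt, and attributing them to LLM API usage — can be sketched as below. The marker keywords and the import-based heuristic are illustrative assumptions, not the paper's actual keyword set or classification pipeline.

```python
import re

# Assumed SATD markers -- the paper's real keyword list may differ.
SATD_MARKERS = re.compile(r"#.*\b(TODO|FIXME|HACK|XXX|WORKAROUND)\b", re.IGNORECASE)
# Assumed heuristic: a file "uses an LLM API" if it imports openai or langchain.
LLM_IMPORTS = re.compile(r"^\s*(import|from)\s+(openai|langchain)\b", re.MULTILINE)

def find_satd_comments(source: str) -> list[tuple[int, str]]:
    """Return (line_number, comment_text) pairs for SATD-style comments."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        m = SATD_MARKERS.search(line)
        if m:
            hits.append((lineno, line[m.start():].strip()))
    return hits

def is_llm_file(source: str) -> bool:
    """Heuristic check: does the file import an LLM API client?"""
    return bool(LLM_IMPORTS.search(source))

# Toy input, not taken from the study's dataset.
sample = '''\
import openai

def ask(prompt):
    # TODO: tune temperature; current value is a guess
    return openai.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
'''

if is_llm_file(sample):
    for lineno, comment in find_satd_comments(sample):
        print(lineno, comment)
```

A real pipeline would additionally classify each flagged comment by debt type (prompt design, hyperparameter tuning, framework integration), which the keyword match above does not attempt.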
Related papers
- Self-Admitted Technical Debt in LLM Software: An Empirical Comparison with ML and Non-ML Software [0.8156494881838944]
Self-admitted technical debt (SATD) refers to comments flagged by developers that explicitly acknowledge suboptimal code or incomplete functionality. We conduct the first empirical study of SATD in the Large Language Model era.
arXiv Detail & Related papers (2026-01-09T19:25:48Z) - Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers [74.17516978246152]
Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques. We propose EXSEARCH, an agentic search framework, where the LLM learns to retrieve useful information as the reasoning unfolds. Experiments on four knowledge-intensive benchmarks show that EXSEARCH substantially outperforms baselines.
arXiv Detail & Related papers (2025-05-26T15:27:55Z) - The Promise and Limits of LLMs in Constructing Proofs and Hints for Logic Problems in Intelligent Tutoring Systems [4.146233417549798]
Large language models (LLMs) offer promising capabilities for dynamic feedback generation. LLMs risk producing hallucinations or pedagogically unsound explanations. DeepSeek-V3 achieved superior performance with 84.4% accuracy on stepwise proof construction.
arXiv Detail & Related papers (2025-05-07T18:48:23Z) - MMLU-ProX: A Multilingual Benchmark for Advanced Large Language Model Evaluation [86.7047714187813]
MMLU-ProX is a benchmark covering 29 languages, built on an English benchmark. Each language version consists of 11,829 identical questions, enabling direct cross-linguistic comparisons. To meet efficient evaluation needs, we provide a lite version containing 658 questions per language.
arXiv Detail & Related papers (2025-03-13T15:59:20Z) - MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale [66.73529246309033]
Multimodal large language models (MLLMs) have shown significant potential in a broad range of multimodal tasks. Existing instruction-tuning datasets only provide phrase-level answers without any intermediate rationales. We introduce a scalable and cost-effective method to construct a large-scale multimodal instruction-tuning dataset with rich intermediate rationales.
arXiv Detail & Related papers (2024-12-06T18:14:24Z) - Unsupervised Text Representation Learning via Instruction-Tuning for Zero-Shot Dense Retrieval [19.422003299376]
We introduce a novel unsupervised text representation learning technique via instruction-tuning.
We demonstrate the corpus representation can be augmented by the representations of relevant synthetic queries.
We significantly improve the average zero-shot retrieval performance on all metrics.
arXiv Detail & Related papers (2024-09-24T23:03:13Z) - A Multi-Agent Approach to Fault Localization via Graph-Based Retrieval and Reflexion [8.22737389683156]
Traditional fault localization techniques require extensive training datasets and high computational resources. Recent advances in Large Language Models (LLMs) offer new opportunities by enhancing code understanding and reasoning. We propose LLM4FL, a multi-agent fault localization framework that utilizes three specialized LLM agents. Evaluated on the Defects4J benchmark, which includes 675 faults from 14 Java projects, LLM4FL achieves an 18.55% improvement in Top-1 accuracy over AutoFL and 4.82% over SoapFL.
arXiv Detail & Related papers (2024-09-20T16:47:34Z) - zkLLM: Zero Knowledge Proofs for Large Language Models [6.993329554241878]
zkLLM is a specialized zero-knowledge proof tailored for large language models (LLMs).
It is designed to uphold the privacy of the model parameters, ensuring no inadvertent information leakage.
arXiv Detail & Related papers (2024-04-24T18:04:50Z) - An Empirical Study of Self-Admitted Technical Debt in Machine Learning Software [17.999512016809945]
Self-admitted technical debt (SATD) can have a significant impact on the quality of machine learning-based software.
This paper aims to investigate SATD in ML code by analyzing 318 open-source ML projects across five domains, along with 318 non-ML projects.
arXiv Detail & Related papers (2023-11-20T18:56:36Z) - Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models [122.19845578690466]
Step-Back Prompting enables LLMs to do abstractions to derive high-level concepts and first principles from instances containing specific details.
Using the concepts and principles to guide reasoning, LLMs significantly improve their abilities in following a correct reasoning path towards the solution.
arXiv Detail & Related papers (2023-10-09T19:48:55Z) - How Effective are Large Language Models in Generating Software Specifications? [14.170320751508502]
Large Language Models (LLMs) have been successfully applied to numerous Software Engineering (SE) tasks. We conduct the first empirical study to evaluate the capabilities of LLMs for generating software specifications from software comments or documentation.
arXiv Detail & Related papers (2023-06-06T00:28:39Z) - SatLM: Satisfiability-Aided Language Models Using Declarative Prompting [68.40726892904286]
We propose a new satisfiability-aided language modeling (SatLM) approach for improving the reasoning capabilities of large language models (LLMs).
We use an LLM to generate a declarative task specification rather than an imperative program and leverage an off-the-shelf automated theorem prover to derive the final answer.
We evaluate SatLM on 8 different datasets and show that it consistently outperforms program-aided LMs in the imperative paradigm.
arXiv Detail & Related papers (2023-05-16T17:55:51Z) - Augmented Large Language Models with Parametric Knowledge Guiding [72.71468058502228]
Large Language Models (LLMs) have significantly advanced natural language processing (NLP) with their impressive language understanding and generation capabilities.
Their performance may be suboptimal for domain-specific tasks that require specialized knowledge due to limited exposure to the related data.
We propose the novel Parametric Knowledge Guiding (PKG) framework, which equips LLMs with a knowledge-guiding module to access relevant knowledge.
arXiv Detail & Related papers (2023-05-08T15:05:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.