How Far Have LLMs Come Toward Automated SATD Taxonomy Construction?
- URL: http://arxiv.org/abs/2506.09601v4
- Date: Thu, 23 Oct 2025 03:07:36 GMT
- Title: How Far Have LLMs Come Toward Automated SATD Taxonomy Construction?
- Authors: Sota Nakashima, Yuta Ishimoto, Masanari Kondo, Tao Xiao, Yasutaka Kamei,
- Abstract summary: When developers intentionally introduce technical debt, it is called self-admitted technical debt (SATD)<n>Traditionally, constructing such requires manually inspecting SATD comments and surrounding code.<n>In this study, we investigated what extent large language models (LLMs) could generate SATD.<n>We designed a structured, LLM-driven pipeline that mirrors the taxonomy construction steps researchers typically follow.
- Score: 4.372207458643732
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Technical debt refers to suboptimal code that degrades software quality. When developers intentionally introduce such debt, it is called self-admitted technical debt (SATD). Since SATD hinders maintenance, identifying its categories is key to uncovering quality issues. Traditionally, constructing such taxonomies requires manually inspecting SATD comments and surrounding code, which is time-consuming, labor-intensive, and often inconsistent due to annotator subjectivity. In this study, we investigated to what extent large language models (LLMs) could generate SATD taxonomies. We designed a structured, LLM-driven pipeline that mirrors the taxonomy construction steps researchers typically follow. We evaluated it on SATD datasets from three domains: quantum software, smart contracts, and machine learning. It successfully recovered domain-specific categories reported in prior work, such as Layer Configuration in machine learning. It also completed taxonomy generation in under two hours and for less than $1, even on the largest dataset. These results suggest that, while full automation remains challenging, LLMs can support semi-automated SATD taxonomy construction. Furthermore, our work opens up avenues for future work, such as automated taxonomy generation in other areas.
Related papers
- SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models [59.90381306452982]
evaluating large language models (LLMs) for software engineering has been limited by narrow task coverage, language bias, and insufficient alignment with real-world developer.<n>We introduce SWE-1, a comprehensive benchmark that unifies heterogeneous code-related evaluations into a structured and production-aligned framework.<n>SWE- spans 8 task types, 8 programming scenarios, and 10 programming languages, with 2000 high-quality instances curated from authentic GitHub pull requests.
arXiv Detail & Related papers (2025-11-07T18:01:32Z) - SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving [90.32201622392137]
We present SwingArena, a competitive evaluation framework for Large Language Models (LLMs)<n>Unlike traditional static benchmarks, SwingArena models the collaborative process of software by pairing LLMs as iterations, who generate patches, and reviewers, who create test cases and verify the patches through continuous integration (CI) pipelines.
arXiv Detail & Related papers (2025-05-29T18:28:02Z) - SweRank: Software Issue Localization with Code Ranking [109.3289316191729]
SweRank is an efficient retrieve-and-rerank framework for software issue localization.<n>We construct SweLoc, a large-scale dataset curated from public GitHub repositories.<n>We show that SweRank achieves state-of-the-art performance, outperforming both prior ranking models and costly agent-based systems.
arXiv Detail & Related papers (2025-05-07T19:44:09Z) - Exploring a Large Language Model for Transforming Taxonomic Data into OWL: Lessons Learned and Implications for Ontology Development [63.74965026095835]
This paper investigates the use of ChatGPT-4 to automate the development of the :Organism module in the Agricultural Product Types Ontology (APTO) for species classification.<n>Our methodology involved leveraging ChatGPT-4 to extract data from the GBIF Backbone API and generate files for further integration in APTO.<n>Two alternative approaches were explored: (1) issuing a series of prompts for ChatGPT-4 to execute tasks via the BrowserOP plugin and (2) directing ChatGPT-4 to design a Python algorithm to perform tasks.
arXiv Detail & Related papers (2025-04-25T19:05:52Z) - Automatic Bottom-Up Taxonomy Construction: A Software Application Domain Study [6.0158981171030685]
Previous research in software application domain classification has faced challenges due to the lack of a proper taxonomy.
This study aims to develop a comprehensive software application domain taxonomy by integrating multiple datasources and leveraging ensemble methods.
arXiv Detail & Related papers (2024-09-24T08:55:07Z) - A Taxonomy of Self-Admitted Technical Debt in Deep Learning Systems [13.90991624629898]
This paper empirically analyzes the presence of Self-Admitted Technical Debt (SATD) in Deep Learning systems.
We derived a taxonomy of DL-specific SATD through open coding, featuring seven categories and 41 leaves.
arXiv Detail & Related papers (2024-09-18T09:21:10Z) - Refining Wikidata Taxonomy using Large Language Models [2.392329079182226]
We present WiKC, a new version of Wikidata taxonomy cleaned automatically using a combination of Large Language Models (LLMs) and graph mining techniques.
Operations on the taxonomy, such as cutting links or merging classes, are performed with the help of zero-shot prompting on an open-source LLM.
arXiv Detail & Related papers (2024-09-06T06:53:45Z) - AutoBencher: Towards Declarative Benchmark Construction [74.54640925146289]
We use AutoBencher to create datasets for math, multilinguality, knowledge, and safety.<n>The scalability of AutoBencher allows it to test fine-grained categories knowledge, creating datasets that elicit 22% more model errors (i.e., difficulty) than existing benchmarks.
arXiv Detail & Related papers (2024-07-11T10:03:47Z) - Using Zero-shot Prompting in the Automatic Creation and Expansion of
Topic Taxonomies for Tagging Retail Banking Transactions [0.0]
This work presents an unsupervised method for constructing and expanding topic using instruction-based fine-tuned LLMs (Large Language Models)
To expand an existing taxonomy with new terms, we use zero-shot prompting to find out where to add new nodes.
We use the resulting tags to assign tags that characterize merchants from a retail bank dataset.
arXiv Detail & Related papers (2024-01-08T00:27:16Z) - MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning [63.80739044622555]
We introduce MuSR, a dataset for evaluating language models on soft reasoning tasks specified in a natural language narrative.
This dataset has two crucial features. First, it is created through a novel neurosymbolic synthetic-to-natural generation algorithm.
Second, our dataset instances are free text narratives corresponding to real-world domains of reasoning.
arXiv Detail & Related papers (2023-10-24T17:59:20Z) - DAnTE: a taxonomy for the automation degree of software engineering
tasks [2.356908851188234]
We propose DAnTE, a Degree of Automation taxonomy for software engineering.
We evaluate several tools used in the past and in the present for software engineering practices.
We discuss what novel tools could emerge in the middle and long term.
arXiv Detail & Related papers (2023-09-26T13:04:58Z) - Towards Automatically Addressing Self-Admitted Technical Debt: How Far
Are We? [17.128428286986573]
This paper empirically investigates the extent to which technical debt can be automatically paid back by neural-based generative models.
We start by extracting a dateset of 5,039 Self-Admitted Technical Debt (SATD) removals from 595 open-source projects.
We use this dataset to experiment with seven different generative deep learning (DL) model configurations.
arXiv Detail & Related papers (2023-08-17T12:27:32Z) - RaTE: a Reproducible automatic Taxonomy Evaluation by Filling the Gap [0.0]
We argue that automatic taxonomy evaluation (ATE) is just as important as taxonomy construction.
We propose RaTE, an automatic label-free taxonomy scoring procedure, which relies on a large pre-trained language model.
arXiv Detail & Related papers (2023-07-19T01:37:31Z) - BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z) - Identifying Self-Admitted Technical Debt in Issue Tracking Systems using
Machine Learning [3.446864074238136]
Technical debt is a metaphor for sub-optimal solutions implemented for short-term benefits.
Most work on identifying Self-Admitted Technical Debt focuses on source code comments.
We propose and optimize an approach for automatically identifying SATD in issue tracking systems using machine learning.
arXiv Detail & Related papers (2022-02-04T15:15:13Z) - Who Should Go First? A Self-Supervised Concept Sorting Model for
Improving Taxonomy Expansion [50.794640012673064]
As data and business scope grow in real applications, existing need to be expanded to incorporate new concepts.
Previous works on taxonomy expansion process the new concepts independently and simultaneously, ignoring the potential relationships among them and the appropriate order of inserting operations.
We propose TaxoOrder, a novel self-supervised framework that simultaneously discovers the local hypernym-hyponym structure among new concepts and decides the order of insertion.
arXiv Detail & Related papers (2021-04-08T11:00:43Z) - Induction and Exploitation of Subgoal Automata for Reinforcement
Learning [75.55324974788475]
We present ISA, an approach for learning and exploiting subgoals in episodic reinforcement learning (RL) tasks.
ISA interleaves reinforcement learning with the induction of a subgoal automaton, an automaton whose edges are labeled by the task's subgoals.
A subgoal automaton also consists of two special states: a state indicating the successful completion of the task, and a state indicating that the task has finished without succeeding.
arXiv Detail & Related papers (2020-09-08T16:42:55Z) - Octet: Online Catalog Taxonomy Enrichment with Self-Supervision [67.26804972901952]
We present a self-supervised end-to-end framework, Octet for Online Catalog EnrichmenT.
We propose to train a sequence labeling model for term extraction and employ graph neural networks (GNNs) to capture the taxonomy structure.
Octet enriches an online catalog in production to 2 times larger in the open-world evaluation.
arXiv Detail & Related papers (2020-06-18T04:53:07Z) - TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced
Graph Neural Network [62.12557274257303]
Taxonomies consist of machine-interpretable semantics and provide valuable knowledge for many web applications.
We propose a novel self-supervised framework, named TaxoExpan, which automatically generates a set of query concept, anchor concept> pairs from the existing taxonomy as training data.
We develop two innovative techniques in TaxoExpan: (1) a position-enhanced graph neural network that encodes the local structure of an anchor concept in the existing taxonomy, and (2) a noise-robust training objective that enables the learned model to be insensitive to the label noise in the self-supervision data.
arXiv Detail & Related papers (2020-01-26T21:30:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.