Unravelling Technical debt topics through Time, Programming Languages and Repository
- URL: http://arxiv.org/abs/2504.11714v1
- Date: Wed, 16 Apr 2025 02:20:56 GMT
- Title: Unravelling Technical debt topics through Time, Programming Languages and Repository
- Authors: Karthik Shivashankar, Antonio Martini
- Abstract summary: This study explores the dynamic landscape of Technical Debt (TD) topics in software engineering by examining its evolution across time, programming languages, and repositories. We have conducted an explorative analysis of TD data extracted from GitHub issues spanning from 2015 to September 2023. This study categorises the TD topics and tracks their progression over time. Furthermore, we have incorporated sentiment analysis for each identified topic, providing a deeper insight into the perceptions and attitudes associated with these topics.
- Score: 5.669063174637433
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study explores the dynamic landscape of Technical Debt (TD) topics in software engineering by examining its evolution across time, programming languages, and repositories. Despite the extensive research on identifying and quantifying TD, there remains a significant gap in understanding the diversity of TD topics and their temporal development. To address this, we have conducted an explorative analysis of TD data extracted from GitHub issues spanning from 2015 to September 2023. We employed BERTopic for sophisticated topic modelling. This study categorises the TD topics and tracks their progression over time. Furthermore, we have incorporated sentiment analysis for each identified topic, providing a deeper insight into the perceptions and attitudes associated with these topics. This offers a more nuanced understanding of the trends and shifts in TD topics through time, programming language, and repository.
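The aggregation step the abstract describes (tracking topics and their sentiment across time and programming language) can be sketched with the standard library alone. This is a minimal illustration, not the authors' pipeline: the records, topic labels, sentiment values, and the `topics_over` helper are all hypothetical, and the BERTopic modelling and sentiment scoring that would produce such records are assumed to have run already.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, hand-made records: (year, language, topic label, sentiment in [-1, 1]).
# In a real analysis these would come from topic modelling and sentiment
# analysis of GitHub issues; the values here are placeholders only.
ISSUES = [
    (2015, "Python", "testing", -0.2),
    (2015, "Java", "documentation", 0.1),
    (2016, "Python", "testing", -0.4),
    (2016, "Python", "dependencies", -0.6),
    (2016, "Java", "testing", 0.0),
]


def topics_over(issues, key_index):
    """Return {(group, topic): (issue count, mean sentiment)}.

    key_index selects the grouping field: 0 for year, 1 for language.
    """
    scores = defaultdict(list)
    for record in issues:
        group, topic, sentiment = record[key_index], record[2], record[3]
        scores[(group, topic)].append(sentiment)
    return {key: (len(vals), mean(vals)) for key, vals in scores.items()}


by_year = topics_over(ISSUES, 0)      # topic progression over time
by_language = topics_over(ISSUES, 1)  # topic distribution per language
```

The same helper could group by repository if each record carried a repository field; that extension is likewise an assumption rather than something stated in the abstract.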
Related papers
- DTECT: Dynamic Topic Explorer & Context Tracker [0.8962460460173959]
We introduce DTECT (Dynamic Topic Explorer & Context Tracker), an end-to-end system that bridges the gap between raw textual data and meaningful temporal insights. DTECT provides a unified workflow that supports data preprocessing, multiple model architectures, and dedicated evaluation metrics to analyze the topic quality of temporal topic models. It significantly enhances interpretability by introducing LLM-driven automatic topic labeling, trend analysis via temporally salient words, interactive visualizations with document-level summarization, and a natural language chat interface for intuitive data querying.
arXiv Detail & Related papers (2025-07-10T16:44:33Z)
- The Dual-Edged Sword of Technical Debt: Benefits and Issues Analyzed Through Developer Discussions [8.304493605883744]
Technical debt (TD) has long been one of the key factors influencing the maintainability of software products.
This work collectively investigates practitioners' opinions on the various perspectives of TD, drawing on a large collection of articles.
arXiv Detail & Related papers (2024-07-30T17:54:36Z)
- A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond [84.95530356322621]
This survey presents a systematic review of the advancements in code intelligence. It covers over 50 representative models and their variants, more than 20 categories of tasks, and an extensive coverage of over 680 related works. Building on our examination of the developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence.
arXiv Detail & Related papers (2024-03-21T08:54:56Z)
- From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models [98.41645229835493]
Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making.
Large foundation models, such as large language models, have revolutionized various natural language processing tasks.
This survey paper serves as a comprehensive resource for researchers and practitioners in the fields of natural language processing, computer vision, and data analysis.
arXiv Detail & Related papers (2024-03-18T17:57:09Z)
- Decoding Multilingual Topic Dynamics and Trend Identification through ARIMA Time Series Analysis on Social Networks: A Novel Data Translation Framework Enhanced by LDA/HDP Models [0.08246494848934444]
We focus on dialogues within Tunisian social networks during the Coronavirus Pandemic and other notable themes like sports and politics.
We start by aggregating a varied multilingual corpus of comments relevant to these subjects.
We then introduce our No-English-to-English Machine Translation approach to handle linguistic differences.
arXiv Detail & Related papers (2024-03-18T00:01:10Z)
- Perspectives on the State and Future of Deep Learning -- 2023 [237.1458929375047]
The goal of this series is to chronicle opinions and issues in the field of machine learning as they stand today and as they change over time.
The plan is to host this survey periodically until the AI singularity paperclip-frenzy-driven doomsday, keeping an updated list of topical questions and interviewing new community members for each edition.
arXiv Detail & Related papers (2023-12-07T19:58:37Z)
- Thread of Thought Unraveling Chaotic Contexts [133.24935874034782]
The "Thread of Thought" (ThoT) strategy draws inspiration from human cognitive processes.
In experiments, ThoT significantly improves reasoning performance compared to other prompting techniques.
arXiv Detail & Related papers (2023-11-15T06:54:44Z)
- Recent Advances in Direct Speech-to-text Translation [58.692782919570845]
We categorize the existing research work into three directions based on the main challenges -- modeling burden, data scarcity, and application issues.
For the challenge of data scarcity, recent work resorts to many sophisticated techniques, such as data augmentation, pre-training, knowledge distillation, and multilingual modeling.
We analyze and summarize the application issues, which include real-time, segmentation, named entity, gender bias, and code-switching.
arXiv Detail & Related papers (2023-06-20T16:14:27Z)
- Twitter Topic Classification [15.306383757213956]
We present a new task based on tweet topic classification and release two associated datasets.
Given a wide range of topics covering the most important discussion points in social media, we provide training and testing data.
We perform a quantitative evaluation and analysis of current general- and domain-specific language models on the task.
arXiv Detail & Related papers (2022-09-20T16:13:52Z)
- Time Series Analysis via Network Science: Concepts and Algorithms [62.997667081978825]
This review provides a comprehensive overview of existing mapping methods for transforming time series into networks.
We describe the main conceptual approaches, provide authoritative references and give insight into their advantages and limitations in a unified notation and language.
Although still very recent, this research area has much potential and with this survey we intend to pave the way for future research on the topic.
arXiv Detail & Related papers (2021-10-11T13:33:18Z)
- Reflexivity in Issues of Scale and Representation in a Digital Humanities Project [0.21500127800884522]
We explore issues that we have encountered in developing a pipeline that combines natural language processing with data analysis and visualization techniques.
The characteristics of the corpus - being comprised of diaries of a single person spanning several decades - present both conceptual challenges in terms of issues of representation, and affordances as a source for historical research.
We consider these issues in a team context with a particular focus on the generation and interpretation of visualizations.
arXiv Detail & Related papers (2021-09-29T04:06:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.