Related papers: The Broken Windows Theory Applies to Technical Debt

Related papers

Architecture-Aware Multi-Design Generation for Repository-Level Feature Addition [53.50448142467294]
RAIM is a multi-design and architecture-aware framework for repository-level feature addition.<n>It shifts away from linear patching by generating multiple diverse implementation designs.<n>Experiments on the NoCode-bench Verified dataset demonstrate that RAIM establishes a new state-of-the-art performance.
arXiv Detail & Related papers (2026-03-02T12:50:40Z)
Code-in-the-Loop Forensics: Agentic Tool Use for Image Forgery Detection [59.04089915447622]
ForenAgent is an interactive IFD framework that enables MLLMs to autonomously generate, execute, and refine Python-based low-level tools around the detection objective.<n>Inspired by human reasoning, we design a dynamic reasoning loop comprising global perception, local focusing, iterative probing, and holistic adjudication.<n>Experiments show that ForenAgent exhibits emergent tool-use competence and reflective reasoning on challenging IFD tasks.
arXiv Detail & Related papers (2025-12-18T08:38:44Z)
Multi-Agent Systems for Dataset Adaptation in Software Engineering: Capabilities, Limitations, and Future Directions [8.97512410819274]
This paper presents the first empirical study on how state-of-the-art multi-agent systems perform in dataset adaptation tasks.<n>We evaluate GitHub Copilot on adapting SE research artifacts from benchmark repositories including ROCODE and LogHub2.0.<n>Results show that current systems can identify key files and generate partial adaptations but rarely produce correct implementations.
arXiv Detail & Related papers (2025-11-26T13:26:11Z)
VAR: Visual Attention Reasoning via Structured Search and Backtracking [49.427842994857635]
We introduce Visual Attention Reasoning, a framework that recasts grounded reasoning as a structured search.<n> VAR decomposes the reasoning process into two key stages: traceable evidence grounding and search-based chain-of-thought.<n>We show that our 7B model, VAR-7B, sets a new state-of-the-art on a comprehensive suite of hallucination and safety benchmarks.
arXiv Detail & Related papers (2025-10-21T13:18:44Z)
Demystifying deep search: a holistic evaluation with hint-free multi-hop questions and factorised metrics [89.1999907891494]
We present WebDetective, a benchmark of hint-free multi-hop questions paired with a controlled Wikipedia sandbox.<n>Our evaluation of 25 state-of-the-art models reveals systematic weaknesses across all architectures.<n>We develop an agentic workflow, EvidenceLoop, that explicitly targets the challenges our benchmark identifies.
arXiv Detail & Related papers (2025-10-01T07:59:03Z)
T^2Agent A Tool-augmented Multimodal Misinformation Detection Agent with Monte Carlo Tree Search [51.91311158085973]
multimodal misinformation often arises from mixed forgery sources, requiring dynamic reasoning and adaptive verification.<n>We propose T2Agent, a novel misinformation detection agent that incorporates a toolkit with Monte Carlo Tree Search.<n>Extensive experiments show that T2Agent consistently outperforms existing baselines on challenging mixed-source multimodal misinformation benchmarks.
arXiv Detail & Related papers (2025-05-26T09:50:55Z)
A Systematic Mapping Study on Contract-based Software Design for Dependable Systems [0.45880283710344055]
Contract-based Design (CbD) is a valuable methodology for software design that allows annotation of code and architectural components with contracts.<n>It establishes rules that outline the behaviour of software components and their interfaces and interactions.<n>Despite the significance and well-established theoretical background of CbD, there is a need for a comprehensive systematic mapping study for reliable software systems.
arXiv Detail & Related papers (2025-05-12T13:25:29Z)
Racing Against the Clock: Exploring the Impact of Scheduled Deadlines on Technical Debt [3.391083554509444]
This study investigates the impact of scheduled deadlines on Technical Debt (TD)<n>It analyzes how scheduled deadlines affect code quality, commit activities, and issues in issue-tracking systems.
arXiv Detail & Related papers (2025-05-07T00:05:01Z)
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute scaling framework that leverages increased inference-time instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate our textbf32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection [86.30994231610651]
Temporal action detection (TAD) is a fundamental video understanding task that aims to identify human actions and localize their temporal boundaries in videos. We propose textbfOpenTAD, a unified TAD framework consolidating 16 different TAD methods and 9 standard datasets into a modular framework. Minimal effort is required to replace one module with a different design, train a feature-based TAD model in end-to-end mode, or switch between the two.
arXiv Detail & Related papers (2025-02-27T18:32:27Z)
Improving the detection of technical debt in Java source code with an enriched dataset [12.07607688189035]
Technical debt (TD) is the additional work and costs that emerge when developers opt for a quick and easy solution to a problem. Recent research has focused on detecting Self-Admitted Technical Debts (SATDs) by analyzing comments embedded in source code. We curated the first ever dataset of TD identified by code comments, coupled with its associated source code.
arXiv Detail & Related papers (2024-11-08T10:12:33Z)
Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification. In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction. Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both One-to-one comparison and One-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z)
Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation. Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy. Results observe significant performance improvement by our method, compared with several well-known baselines.
arXiv Detail & Related papers (2024-09-11T17:01:06Z)
Identifying Technical Debt and Its Types Across Diverse Software Projects Issues [4.6173290119212265]
Technical Debt (TD) identification in software projects issues is crucial for maintaining code quality, reducing long-term maintenance costs, and improving overall project health. This study advances TD classification using transformer-based models, addressing the critical need for accurate and efficient TD identification in large-scale software development.
arXiv Detail & Related papers (2024-08-17T07:46:54Z)
A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods. The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics. We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z)
BayesFLo: Bayesian fault localization of complex software systems [3.8607945277671054]
Key step in software testing is fault localization, which uses test data to pinpoint failure-inducing combinations. Existing fault localization methods have two key limitations. They do not incorporate domain and/or structural knowledge from test engineers.
arXiv Detail & Related papers (2024-03-12T21:12:53Z)
Rank Flow Embedding for Unsupervised and Semi-Supervised Manifold Learning [9.171175292808144]
We propose a novel manifold learning algorithm named Rank Flow Embedding (RFE) for unsupervised and semi-supervised scenarios. RFE computes context-sensitive embeddings, which are refined following a rank-based processing flow. The generated embeddings can be exploited for more effective unsupervised retrieval or semi-supervised classification.
arXiv Detail & Related papers (2023-04-24T21:02:12Z)
Deep Metric Learning for Unsupervised Remote Sensing Change Detection [60.89777029184023]
Remote Sensing Change Detection (RS-CD) aims to detect relevant changes from Multi-Temporal Remote Sensing Images (MT-RSIs) The performance of existing RS-CD methods is attributed to training on large annotated datasets. This paper proposes an unsupervised CD method based on deep metric learning that can deal with both of these issues.
arXiv Detail & Related papers (2023-03-16T17:52:45Z)
Benchmarking Deep Models for Salient Object Detection [67.07247772280212]
We construct a general SALient Object Detection (SALOD) benchmark to conduct a comprehensive comparison among several representative SOD methods. In the above experiments, we find that existing loss functions usually specialized in some metrics but reported inferior results on the others. We propose a novel Edge-Aware (EA) loss that promotes deep networks to learn more discriminative features by integrating both pixel- and image-level supervision signals.
arXiv Detail & Related papers (2022-02-07T03:43:16Z)
NADS: Neural Architecture Distribution Search for Uncertainty Awareness [79.18710225716791]
Machine learning (ML) systems often encounter Out-of-Distribution (OoD) errors when dealing with testing data coming from a distribution different from training data. Existing OoD detection approaches are prone to errors and even sometimes assign higher likelihoods to OoD samples. We propose Neural Architecture Distribution Search (NADS) to identify common building blocks among all uncertainty-aware architectures.
arXiv Detail & Related papers (2020-06-11T17:39:07Z)
Applications of Unsupervised Deep Transfer Learning to Intelligent Fault Diagnosis: A Survey and Comparative Study [1.2345552555178128]
We construct a new taxonomy and perform a comprehensive review of UDTL-based IFD according to different tasks. To emphasize the importance and importance of UDTL-based IFD, the whole test framework will be released to the research community.
arXiv Detail & Related papers (2019-12-28T21:45:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.