From Code Smells to Best Practices: Tackling Resource Leaks in PyTorch, TensorFlow, and Keras
- URL: http://arxiv.org/abs/2511.15229v2
- Date: Wed, 26 Nov 2025 06:13:31 GMT
- Title: From Code Smells to Best Practices: Tackling Resource Leaks in PyTorch, TensorFlow, and Keras
- Authors: Bashar Abdallah, Martyna E. Wojciechowska, Gustavo Santos, Edmand Yu, Maxime Lamothe, Alain Abran, Mohammad Hamdaqa
- Abstract summary: We identify 30 PyTorch-related smells and 16 TensorFlow/Keras smells linked to resource leaks. For each smell, we derived at least one best practice, resulting in 50 recommended coding patterns. This is the first comprehensive study to examine resource-leak-inducing code smells across major ML frameworks.
- Score: 2.0939163364197317
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Much of the existing ML research focuses on model performance metrics, leaving limited attention to the long-term sustainability and resource efficiency of ML applications. While high performance is essential, ensuring efficient resource management is equally critical for robust deployment. This study addresses this gap by systematically identifying code smells that lead to resource leaks in ML applications. We conducted an empirical investigation of developer discussions and real-world code snippets from PyTorch, TensorFlow, and Keras. The analysis identified 30 PyTorch-related smells and 16 TensorFlow/Keras smells linked to resource leaks. These smells were categorized in two ways: (1) based on their root causes, and (2) as general ML smells with framework-specific characteristics. For each smell, we derived at least one best practice, resulting in 50 recommended coding patterns aimed at reducing resource leakage and improving efficiency. To ensure the validity of our findings, we employed a three-phase validation process involving independent analysis by three authors followed by consensus discussions. This is the first comprehensive study to examine resource-leak-inducing code smells across major ML frameworks and to present actionable best practices for mitigating them. The contributions support developers in building more efficient and sustainable ML applications and offer a structured view of the underlying causes of resource leaks.
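The paper's 50 catalogued patterns are not reproduced in this abstract, but the general shape of a resource-leak smell and its paired best practice can be sketched with a framework-agnostic Python example (plain file handles stand in for the GPU memory, sessions, and data pipelines the paper covers; function names are illustrative, not from the paper):

```python
import os
import tempfile

# Smelly pattern: the handle is closed only if no exception occurs
# between open() and close() -- an error raised by read() leaks it.
def read_leaky(path):
    f = open(path)
    data = f.read()
    f.close()  # skipped entirely if f.read() raises
    return data

# Best practice: a context manager releases the handle on every
# exit path, normal or exceptional.
def read_safe(path):
    with open(path) as f:
        return f.read()

if __name__ == "__main__":
    fd, path = tempfile.mkstemp(text=True)
    with os.fdopen(fd, "w") as f:
        f.write("hello")
    print(read_safe(path))  # prints "hello"
    os.remove(path)
```

In the ML setting the same structure appears with heavier resources, e.g. a PyTorch evaluation loop that accumulates loss tensors (retaining their computation graphs) instead of detached scalars, or a TensorFlow 1.x session that is never closed; the fix is likewise to scope the resource's lifetime explicitly.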
Related papers
- DiffuRank: Effective Document Reranking with Diffusion Language Models [71.16830004674513]
We propose DiffuRank, a reranking framework built upon diffusion language models (dLLMs). dLLMs support more flexible decoding and generation processes that are not constrained to a left-to-right order. We show dLLMs achieve performance comparable to, and in some cases exceeding, that of autoregressive LLMs with similar model sizes.
arXiv Detail & Related papers (2026-02-13T02:18:14Z) - Beyond Strict Rules: Assessing the Effectiveness of Large Language Models for Code Smell Detection [0.5249836059995157]
Code smells are symptoms of potential code quality problems that may affect software maintainability. This paper evaluates the effectiveness of four large language models (LLMs) for detecting nine code smells across 30 Java projects.
arXiv Detail & Related papers (2026-01-14T21:08:35Z) - Clean Code, Better Models: Enhancing LLM Performance with Smell-Cleaned Dataset [13.23492570818459]
This study takes the first systematic research to assess and improve dataset quality in terms of code smells. We propose an LLM-based code smell cleaning tool, named SmellCC, which automatically removes code smells.
arXiv Detail & Related papers (2025-08-16T07:40:58Z) - Teaching LLM to Reason: Reinforcement Learning from Algorithmic Problems without Code [76.80306464249217]
We propose TeaR, which aims at teaching LLMs to reason better. TeaR leverages careful data curation and reinforcement learning to guide models in discovering optimal reasoning paths through code-related tasks. We conduct extensive experiments using two base models and three long-CoT distillation models, with model sizes ranging from 1.5 billion to 32 billion parameters, and across 17 benchmarks spanning Math, Knowledge, Code, and Logical Reasoning.
arXiv Detail & Related papers (2025-07-10T07:34:05Z) - Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering [75.12322966980003]
Large Language Models (LLMs) have shown strong inductive reasoning ability across various domains. Most existing RAG pipelines rely on unstructured text, limiting interpretability and structured reasoning. Recent studies have explored integrating knowledge graphs with LLMs for knowledge graph question answering. We propose RAPL, a novel framework for efficient and effective graph retrieval in KGQA.
arXiv Detail & Related papers (2025-06-11T12:03:52Z) - Performance Smells in ML and Non-ML Python Projects: A Comparative Study [10.064805853389277]
This study provides a comparative analysis of performance smells between Machine Learning and non-ML projects. Our results indicate that ML projects are more susceptible to performance smells due to the computational and data-intensive nature of ML. Our study underscores the need to tailor performance optimization strategies to the unique characteristics of ML projects.
arXiv Detail & Related papers (2025-04-28T19:48:26Z) - The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility? [54.18519360412294]
Large Language Models (LLMs) must balance between rejecting harmful requests for safety and accommodating legitimate ones for utility. This paper presents a Direct Preference Optimization (DPO) based alignment framework that achieves better overall performance. We analyze experimental results obtained from testing DeepSeek-R1 on our benchmark and reveal the critical ethical concerns raised by this highly acclaimed model.
arXiv Detail & Related papers (2025-01-20T06:35:01Z) - How Propense Are Large Language Models at Producing Code Smells? A Benchmarking Study [45.126233498200534]
We introduce CodeSmellEval, a benchmark designed to evaluate the propensity of Large Language Models for generating code smells. Our benchmark includes a novel metric: Propensity Smelly Score (PSC), and a curated dataset of method-level code smells: CodeSmellData. To demonstrate the use of CodeSmellEval, we conducted a case study with two state-of-the-art LLMs, CodeLlama and Mistral.
arXiv Detail & Related papers (2024-12-25T21:56:35Z) - OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [76.59316249991657]
Large language models (LLMs) for code have become indispensable in various domains, including code generation, reasoning tasks and agent systems. While open-access code LLMs are increasingly approaching the performance levels of proprietary models, high-quality code LLMs remain limited. We introduce OpenCoder, a top-tier code LLM that not only achieves performance comparable to leading models but also serves as an "open cookbook" for the research community.
arXiv Detail & Related papers (2024-11-07T17:47:25Z) - LLMDFA: Analyzing Dataflow in Code with Large Language Models [8.92611389987991]
This paper presents LLMDFA, a compilation-free and customizable dataflow analysis framework.
We decompose the problem into several subtasks and introduce a series of novel strategies.
On average, LLMDFA achieves 87.10% precision and 80.77% recall, surpassing existing techniques with F1 score improvements of up to 0.35.
arXiv Detail & Related papers (2024-02-16T15:21:35Z) - Boosting Static Resource Leak Detection via LLM-based Resource-Oriented Intention Inference [14.783216988363804]
Existing static detection techniques rely on mechanical matching of predefined resource acquisition/release APIs and null-checking conditions to find unreleased resources. We propose InferROI, a novel approach that directly infers resource-oriented intentions (acquisition, release, and reachability validation) in code. We evaluate the effectiveness of InferROI in both resource-oriented intention inference and resource leak detection.
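The limitation of mechanical API matching that InferROI targets can be illustrated with a small Python sketch (the helper names are hypothetical, chosen only to show the pattern): when acquisition is hidden behind a wrapper function, a matcher keyed on predefined APIs like `open()` sees nothing suspicious at the call site, even though the caller still owns (and can leak) the resource.

```python
import os
import tempfile

def acquire_log(path):
    # Acquisition hidden behind a helper: a matcher looking for
    # open() at the call site will not flag callers of this function.
    return open(path, "a")

def log_event(path, msg):
    handle = acquire_log(path)  # leak: no release on the error path
    handle.write(msg + "\n")
    handle.close()

def log_event_safe(path, msg):
    # Intention made explicit: acquisition paired with guaranteed release.
    with acquire_log(path) as handle:
        handle.write(msg + "\n")

if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    log_event_safe(path, "started")
    with open(path) as f:
        print(f.read().strip())  # prints "started"
    os.remove(path)
```

An intention-inferring approach would recognize `acquire_log` as an acquisition and require a release to be reachable on every path from the call, catching `log_event` without any predefined API list.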
arXiv Detail & Related papers (2023-11-08T04:19:28Z) - Towards Realistic Low-resource Relation Extraction: A Benchmark with Empirical Baseline Study [51.33182775762785]
This paper presents an empirical study to build relation extraction systems in low-resource settings.
We investigate three schemes to evaluate the performance in low-resource settings: (i) different types of prompt-based methods with few-shot labeled data; (ii) diverse balancing methods to address the long-tailed distribution issue; and (iii) data augmentation technologies and self-training to generate more labeled in-domain data.
arXiv Detail & Related papers (2022-10-19T15:46:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.