The Prevalence of Code Smells in Machine Learning projects
- URL: http://arxiv.org/abs/2103.04146v1
- Date: Sat, 6 Mar 2021 16:01:54 GMT
- Title: The Prevalence of Code Smells in Machine Learning projects
- Authors: Bart van Oort, Luís Cruz, Maurício Aniche, Arie van Deursen
- Abstract summary: Static code analysis can be used to find code smells, i.e., (potential) defects in the source code, refactoring opportunities, and violations of common coding standards.
We gathered a dataset of 74 open-source Machine Learning projects, installed their dependencies and ran Pylint on them.
This resulted in a top 20 of all detected code smells, per category.
- Score: 9.722159563454436
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Artificial Intelligence (AI) and Machine Learning (ML) are pervasive in the
current computer science landscape. Yet, there still exists a lack of software
engineering experience and best practices in this field. One such best
practice, static code analysis, can be used to find code smells, i.e.,
(potential) defects in the source code, refactoring opportunities, and
violations of common coding standards. Our research set out to discover the
most prevalent code smells in ML projects. We gathered a dataset of 74
open-source ML projects, installed their dependencies and ran Pylint on them.
This resulted in a top 20 of all detected code smells, per category. Manual
analysis of these smells mainly showed that code duplication is widespread and
that the PEP8 convention for identifier naming style may not always be
applicable to ML code due to its resemblance with mathematical notation. More
interestingly, however, we found several major obstructions to the
maintainability and reproducibility of ML projects, primarily related to the
dependency management of Python projects. We also found that Pylint cannot
reliably check for correct usage of imported dependencies, including prominent
ML libraries such as PyTorch.
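As a rough, hedged sketch of the kind of pipeline the abstract describes (the paper's actual scripts are not reproduced here), the snippet below runs Pylint on a single checked-out project and tallies the most frequent messages per Pylint category; the project path and the top-20 cut-off are illustrative assumptions.

```python
# Hedged sketch: run Pylint on one project and count the most frequent
# code smells per message category, loosely mirroring the study's setup.
import json
import subprocess
from collections import Counter, defaultdict

def top_smells(project_path: str, top_n: int = 20) -> dict:
    # Pylint exits with a non-zero code whenever it reports messages,
    # so the return code is ignored here rather than treated as a failure.
    result = subprocess.run(
        ["pylint", project_path, "--output-format=json", "--recursive=y"],  # --recursive needs Pylint >= 2.13
        capture_output=True,
        text=True,
    )
    messages = json.loads(result.stdout or "[]")

    # Each JSON message carries a category ("convention", "refactor",
    # "warning", "error" or "fatal") and a symbolic name such as "invalid-name".
    per_category = defaultdict(Counter)
    for msg in messages:
        per_category[msg["type"]][msg["symbol"]] += 1

    return {category: counts.most_common(top_n) for category, counts in per_category.items()}

if __name__ == "__main__":
    # "./some-ml-project" is a placeholder path, not a project from the paper's dataset.
    for category, smells in top_smells("./some-ml-project").items():
        print(category, smells[:3])
```

For the dependency-related issues the abstract mentions, Pylint's extension-pkg-allow-list option is commonly used to suppress false no-member reports for C-extension packages such as PyTorch, but, as the abstract notes, this does not make usage checks for such libraries reliable.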
Related papers
- CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution [50.7413285637879]
The CRUXEVAL-X code reasoning benchmark contains 19 programming languages.
It comprises at least 600 subjects for each language, along with 19K content-consistent tests in total.
Even a model trained solely on Python can achieve at most 34.4% Pass@1 in other languages.
arXiv Detail & Related papers (2024-08-23T11:43:00Z) - What's Wrong with Your Code Generated by Large Language Models? An Extensive Study [80.18342600996601]
Large language models (LLMs) produce code that is shorter yet more complicated as compared to canonical solutions.
We develop a taxonomy of bugs for incorrect codes that includes three categories and 12 sub-categories, and analyze the root cause for common bug types.
We propose a novel training-free iterative method that introduces self-critique, enabling LLMs to critique and correct their generated code based on bug types and compiler feedback.
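The summary above only names the iterative self-critique idea; the hypothetical sketch below makes the control flow concrete. generate_code() and critique_and_fix() are placeholder stand-ins for LLM calls and are assumptions, not an interface from the paper.

```python
# Hypothetical sketch of a training-free self-critique loop: the model writes
# code, the compiler produces feedback, and the model revises its output.
import subprocess
import sys
import tempfile

def generate_code(task: str) -> str:
    # Placeholder for a call to a code-generation model (assumption).
    raise NotImplementedError("plug in a code-generation model here")

def critique_and_fix(code: str, feedback: str) -> str:
    # Placeholder for a critique/repair prompt to the same model (assumption).
    raise NotImplementedError("plug in a critique/repair prompt here")

def compiler_feedback(code: str) -> str:
    """Return an empty string if the code byte-compiles, else the error text."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
        handle.write(code)
        path = handle.name
    proc = subprocess.run([sys.executable, "-m", "py_compile", path],
                          capture_output=True, text=True)
    return proc.stderr

def self_critique_loop(task: str, max_rounds: int = 3) -> str:
    code = generate_code(task)
    for _ in range(max_rounds):
        feedback = compiler_feedback(code)
        if not feedback:          # no compiler errors left; stop iterating
            break
        code = critique_and_fix(code, feedback)
    return code
```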
arXiv Detail & Related papers (2024-07-08T17:27:17Z) - VersiCode: Towards Version-controllable Code Generation [58.82709231906735]
Large Language Models (LLMs) have made tremendous strides in code generation, but existing research fails to account for the dynamic nature of software development.
We propose two novel tasks aimed at bridging this gap: version-specific code completion (VSCC) and version-aware code migration (VACM)
We conduct an extensive evaluation on VersiCode, which reveals that version-controllable code generation is indeed a significant challenge.
arXiv Detail & Related papers (2024-06-11T16:15:06Z) - Detecting Refactoring Commits in Machine Learning Python Projects: A Machine Learning-Based Approach [3.000496428347787]
MLRefScanner identifies refactoring commits involving both ML-specific and general refactoring operations.
Our study highlights the potential of ML-driven approaches in detecting refactoring across diverse programming languages and technical domains.
arXiv Detail & Related papers (2024-04-09T18:46:56Z) - InfiBench: Evaluating the Question-Answering Capabilities of Code Large Language Models [56.723509505549536]
InfiBench is the first large-scale freeform question-answering (QA) benchmark for code to our knowledge.
It comprises 234 carefully selected high-quality Stack Overflow questions that span across 15 programming languages.
We conduct a systematic evaluation for over 100 latest code LLMs on InfiBench, leading to a series of novel and insightful findings.
arXiv Detail & Related papers (2024-03-11T02:06:30Z) - Assured LLM-Based Software Engineering [51.003878077888686]
This paper is an outline of the content of the keynote by Mark Harman at the International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering, Monday 15th April 2024, Lisbon, Portugal.
arXiv Detail & Related papers (2024-02-06T20:38:46Z) - Julearn: an easy-to-use library for leakage-free evaluation and inspection of ML models [0.23301643766310373]
We present the rationale behind julearn's design, its core features, and showcase three examples of previously-published research projects.
Julearn aims to simplify the entry into the machine learning world by providing an easy-to-use environment with built in guards against some of the most common ML pitfalls.
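As a hedged illustration of the leakage pitfall that julearn is said to guard against, the snippet below uses plain scikit-learn (not julearn's own API) to contrast a leaky preprocessing setup with a leakage-free pipeline.

```python
# Generic scikit-learn illustration of data leakage in cross-validation;
# this is NOT julearn's interface, only the underlying idea it protects against.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)

# Leaky: the scaler is fit on the full dataset, so test-fold statistics
# leak into training before cross-validation even starts.
X_leaky = StandardScaler().fit_transform(X)
leaky_scores = cross_val_score(SVC(), X_leaky, y, cv=5)

# Leakage-free: the scaler sits inside the pipeline and is refit on each
# training fold only.
pipeline = make_pipeline(StandardScaler(), SVC())
clean_scores = cross_val_score(pipeline, X, y, cv=5)

print(leaky_scores.mean(), clean_scores.mean())
```

Fitting preprocessing on the full dataset is exactly the kind of mistake a guarded evaluation interface can prevent by construction.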
arXiv Detail & Related papers (2023-10-19T08:21:12Z) - CodeTF: One-stop Transformer Library for State-of-the-art Code LLM [72.1638273937025]
We present CodeTF, an open-source Transformer-based library for state-of-the-art Code LLMs and code intelligence.
Our library supports a collection of pretrained Code LLM models and popular code benchmarks.
We hope CodeTF is able to bridge the gap between machine learning/generative AI and software engineering.
arXiv Detail & Related papers (2023-05-31T05:24:48Z) - The Good, the Bad, and the Missing: Neural Code Generation for Machine Learning Tasks [11.837851107416588]
This paper investigates the effectiveness of existing neural code generation models on Machine Learning programming tasks.
We select six state-of-the-art neural code generation models, and evaluate their performance on four widely used ML libraries.
Our empirical study reveals some good, bad, and missing aspects of neural code generation models on ML tasks.
arXiv Detail & Related papers (2023-05-16T00:52:02Z) - Prevalence of Code Smells in Reinforcement Learning Projects [1.7218973692320518]
Reinforcement Learning (RL) is being increasingly used to learn and adapt application behavior in many domains, including large-scale and safety critical systems.
With the advent of plug-n-play RL libraries, its applicability has further increased, enabling integration of RL algorithms by users.
We note, however, that the majority of such code is not developed by RL engineers, which, as a consequence, may lead to poor program quality, yielding bugs, suboptimal performance, and maintainability and evolution problems for RL-based projects.
arXiv Detail & Related papers (2023-03-17T20:25:13Z) - Code Smells for Machine Learning Applications [6.759291241573661]
There is a lack of guidelines for code quality in machine learning applications.
This paper proposes and identifies a list of 22 machine learning-specific code smells collected from various sources.
We pinpoint each smell with a description of its context, potential issues in the long run, and proposed solutions.
arXiv Detail & Related papers (2022-03-25T16:23:02Z)
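As a hedged example of the kind of ML-specific smell such a catalogue covers (the concrete entries of the paper are not quoted here), the snippet below shows a NaN-equivalence comparison that silently matches nothing.

```python
# Illustrative ML-specific code smell: comparing against NaN with ==
# never matches, because NaN is not equal to itself.
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40]})

# Smelly: this mask is all False, so missing values go undetected.
missing_bad = df["age"] == np.nan

# Preferred: use the dedicated missing-value check.
missing_good = df["age"].isna()

print(missing_bad.sum(), missing_good.sum())  # 0 vs. 1
```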