How Maintainable is Proficient Code? A Case Study of Three PyPI Libraries
- URL: http://arxiv.org/abs/2410.05683v2
- Date: Thu, 10 Oct 2024 08:50:59 GMT
- Title: How Maintainable is Proficient Code? A Case Study of Three PyPI Libraries
- Authors: Indira Febriyanti, Youmei Fan, Kazumasa Shimari, Kenichi Matsumoto, Raula Gaikovina Kula,
- Abstract summary: We investigate the risk level of proficient code inside a file.
We identify several instances of high proficient code that was also high risk.
We envision that the study should help developers identify scenarios where proficient code might be detrimental to future code maintenance activities.
- Score: 3.0105723746073
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Python is very popular because it can be used for a wider audience of developers, data scientists, machine learning experts and so on. Like other programming languages, there are beginner to advanced levels of writing Python code. However, like all software, code constantly needs to be maintained as bugs and the need for new features emerge. Although the Zen of Python states that "Simple is better than complex," we hypothesize that more elegant and proficient code might be harder for the developer to maintain. To study this relationship between the understanding of code maintainability and code proficiency, we present an exploratory study into the complexity of Python code on three Python libraries. Specifically, we investigate the risk level of proficient code inside a file. As a starting point, we mined and collected the proficiency of code from three PyPI libraries totaling 3,003 files. We identified several instances of high proficient code that was also high risk, with examples being simple list comprehensions, 'enumerate' calls, generator expressions, simple dictionary comprehensions, and the 'super' function. Our early examples revealed that most code-proficient development presented a low maintainability risk, yet there are some cases where proficient code is also risky to maintenance. We envision that the study should help developers identify scenarios where and when using proficient code might be detrimental to future code maintenance activities.
Related papers
- CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution [50.7413285637879]
The CRUXEVAL-X code reasoning benchmark contains 19 programming languages.
It comprises at least 600 subjects for each language, along with 19K content-consistent tests in total.
Even a model trained solely on Python can achieve at most 34.4% Pass@1 in other languages.
arXiv Detail & Related papers (2024-08-23T11:43:00Z) - Towards Identifying Code Proficiency through the Analysis of Python Textbooks [7.381102801726683]
The aim is to gauge the level of proficiency a developer must have to understand a piece of source code.
Prior attempts, which relied heavily on expert opinions and developer surveys, have led to considerable discrepancies.
This paper presents a new approach to identifying Python competency levels through the systematic analysis of introductory Python programming textbooks.
arXiv Detail & Related papers (2024-08-05T06:37:10Z) - A Comprehensive Guide to Combining R and Python code for Data Science, Machine Learning and Reinforcement Learning [42.350737545269105]
We show how to run Python's scikit-learn, pytorch and OpenAI gym libraries for building Machine Learning, Deep Learning, and Reinforcement Learning projects easily.
arXiv Detail & Related papers (2024-07-19T23:01:48Z) - Automatic Generation of Python Programs Using Context-Free Grammars [0.1227734309612871]
TinyPy Generator is a tool that generates random Python programs using a context-free grammar.
Our system uses custom production rules to generate code with different levels of complexity.
TinyPy Generator is useful in the field of machine learning, where it can generate substantial amounts of Python code for training Python language models.
arXiv Detail & Related papers (2024-03-11T08:25:52Z) - Causal-learn: Causal Discovery in Python [53.17423883919072]
Causal discovery aims at revealing causal relations from observational data.
$textitcausal-learn$ is an open-source Python library for causal discovery.
arXiv Detail & Related papers (2023-07-31T05:00:35Z) - Code Execution with Pre-trained Language Models [88.04688617516827]
Most pre-trained models for code intelligence ignore the execution trace and only rely on source code and syntactic structures.
We develop a mutation-based data augmentation technique to create a large-scale and realistic Python dataset and task for code execution.
We then present CodeExecutor, a Transformer model that leverages code execution pre-training and curriculum learning to enhance its semantic comprehension.
arXiv Detail & Related papers (2023-05-08T10:00:05Z) - Repairing Bugs in Python Assignments Using Large Language Models [9.973714032271708]
We propose to use a large language model trained on code to build an APR system for programming assignments.
Our system can fix both syntactic and semantic mistakes by combining multi-modal prompts, iterative querying, test-case-based selection of few-shots, and program chunking.
We evaluate MMAPR on 286 real student programs and compare to a baseline built by combining a state-of-the-art Python syntax repair engine, BIFI, and state-of-the-art Python semantic repair engine for student assignments, Refactory.
arXiv Detail & Related papers (2022-09-29T15:41:17Z) - PyGOD: A Python Library for Graph Outlier Detection [56.33769221859135]
PyGOD is an open-source library for detecting outliers in graph data.
It supports a wide array of leading graph-based methods for outlier detection.
PyGOD is released under a BSD 2-Clause license at https://pygod.org and at the Python Package Index (PyPI)
arXiv Detail & Related papers (2022-04-26T06:15:21Z) - ReACC: A Retrieval-Augmented Code Completion Framework [53.49707123661763]
We propose a retrieval-augmented code completion framework, leveraging both lexical copying and referring to code with similar semantics by retrieval.
We evaluate our approach in the code completion task in Python and Java programming languages, achieving a state-of-the-art performance on CodeXGLUE benchmark.
arXiv Detail & Related papers (2022-03-15T08:25:08Z) - fastai: A Layered API for Deep Learning [1.7223564681760164]
fastai is a deep learning library which provides practitioners with high-level components.
It provides researchers with low-level components that can be mixed and matched to build new approaches.
arXiv Detail & Related papers (2020-02-11T21:16:48Z) - OPFython: A Python-Inspired Optimum-Path Forest Classifier [68.8204255655161]
This paper proposes a Python-based Optimum-Path Forest framework, denoted as OPFython.
As OPFython is a Python-based library, it provides a more friendly environment and a faster prototyping workspace than the C language.
arXiv Detail & Related papers (2020-01-28T15:46:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.