An Empirical Study of Fault Localization in Python Programs
- URL: http://arxiv.org/abs/2305.19834v3
- Date: Wed, 20 Mar 2024 17:45:19 GMT
- Title: An Empirical Study of Fault Localization in Python Programs
- Authors: Mohammad Rezaalipour, Carlo A. Furia
- Abstract summary: This paper is the first multi-family large-scale empirical study of fault localization on real-world Python programs and faults.
We use Zou et al.'s recent large-scale empirical study of fault localization in Java as the basis of our study.
The results replicate for Python several results known about Java, and shed light on whether Python's peculiarities affect the capabilities of fault localization.
- Score: 4.366130138560774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite its massive popularity as a programming language, especially in novel domains like data science programs, there is comparatively little research about fault localization that targets Python. Even though it is plausible that several findings about programming languages like C/C++ and Java -- the most common choices for fault localization research -- carry over to other languages, whether the dynamic nature of Python and how the language is used in practice affect the capabilities of classic fault localization approaches remain open questions to investigate. This paper is the first multi-family large-scale empirical study of fault localization on real-world Python programs and faults. Using Zou et al.'s recent large-scale empirical study of fault localization in Java as the basis of our study, we investigated the effectiveness (i.e., localization accuracy), efficiency (i.e., runtime performance), and other features (e.g., different entity granularities) of seven well-known fault-localization techniques in four families (spectrum-based, mutation-based, predicate switching, and stack-trace based) on 135 faults from 13 open-source Python projects from the BugsInPy curated collection. The results replicate for Python several results known about Java, and shed light on whether Python's peculiarities affect the capabilities of fault localization. The replication package that accompanies this paper includes detailed data about our experiments, as well as the tool FauxPy that we implemented to conduct the study.
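To make the spectrum-based family mentioned in the abstract concrete, here is a minimal sketch of how such a technique could rank program lines by suspiciousness. It uses the Ochiai formula, a widely used spectrum-based metric; the abstract does not say which formulas the study evaluates, so that choice is an assumption. The function names (`ochiai`, `rank`) and the toy coverage data are hypothetical and are not taken from the paper or from FauxPy.

```python
import math


def ochiai(coverage, outcomes):
    """Compute an Ochiai suspiciousness score per line.

    coverage: dict mapping test name -> set of line numbers the test executed
    outcomes: dict mapping test name -> True if the test passed, False if it failed
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    all_lines = set().union(*coverage.values())
    scores = {}
    for line in all_lines:
        # Number of failing / passing tests that executed this line.
        failed_cov = sum(
            1 for t, cov in coverage.items() if line in cov and not outcomes[t]
        )
        passed_cov = sum(
            1 for t, cov in coverage.items() if line in cov and outcomes[t]
        )
        denom = math.sqrt(total_failed * (failed_cov + passed_cov))
        scores[line] = failed_cov / denom if denom else 0.0
    return scores


def rank(scores):
    """Return lines ordered from most to least suspicious (ties by line number)."""
    return sorted(scores, key=lambda line: (-scores[line], line))


# Hypothetical toy data: three tests over a four-line program.
coverage = {"t1": {1, 2, 3}, "t2": {1, 2, 4}, "t3": {1, 4}}
outcomes = {"t1": True, "t2": False, "t3": False}

scores = ochiai(coverage, outcomes)
print(rank(scores))  # prints [4, 1, 2, 3]
```

In this toy run, line 4 ranks first because both failing tests execute it and no passing test does. Mutation-based, predicate-switching, and stack-trace-based techniques use different evidence and are not covered by this sketch.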
Related papers
- Gotta catch 'em all! Towards File Localisation from Issues at Large [2.1574657220935602]
This work provides a data pipeline for the creation of issue file localisation datasets.
We provide a baseline performance evaluation for the file localisation problem using traditional information retrieval approaches.
We use statistical analysis to investigate the influence of biases known in the bug localisation community on our dataset.
arXiv Detail & Related papers (2025-07-24T11:42:13Z) - Bugs in the Shadows: Static Detection of Faulty Python Refactorings [44.115219601924856]
Python's dynamic type system poses significant challenges for automated code transformations.
Our analysis uncovered 29 bugs across four types from a total of 1,152 attempts.
These results highlight the need to improve the robustness of current Python tools to ensure the correctness of automated code transformations.
arXiv Detail & Related papers (2025-07-01T18:03:56Z) - PyResBugs: A Dataset of Residual Python Bugs for Natural Language-Driven Fault Injection [5.383910843560784]
PyResBugs is a curated dataset of residual bugs from major Python frameworks.
Each bug is paired with its corresponding fault-free (fixed) version and annotated with multi-level natural language (NL) descriptions.
arXiv Detail & Related papers (2025-05-09T04:39:09Z) - Evaluation of the Code Generation Capabilities of ChatGPT 4: A Comparative Analysis in 19 Programming Languages [0.0]
This thesis examines the capabilities of ChatGPT 4 in code generation across 19 programming languages.
ChatGPT 4 successfully solved 39.67% of all tasks, with success rates decreasing significantly as problem complexity increased.
The model exhibited above-average runtime efficiency in all programming languages.
arXiv Detail & Related papers (2025-01-04T17:17:01Z) - CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution [50.7413285637879]
The CRUXEVAL-X code reasoning benchmark contains 19 programming languages.
It comprises at least 600 subjects for each language, along with 19K content-consistent tests in total.
Even a model trained solely on Python can achieve at most 34.4% Pass@1 in other languages.
arXiv Detail & Related papers (2024-08-23T11:43:00Z) - Code Linting using Language Models [0.7519872646378836]
Code linters play a crucial role in developing high-quality software systems.
Despite their benefits, code linters are often language-specific, focused on certain types of issues, and prone to false positives.
This paper investigates whether large language models can be used to develop a more versatile code linter.
arXiv Detail & Related papers (2024-06-27T19:59:49Z) - FauxPy: A Fault Localization Tool for Python [4.366130138560774]
FauxPy is a fault localization tool for Python programs.
The paper showcases how to use FauxPy on two illustrative examples, and then discusses its main features and capabilities from a user's perspective.
arXiv Detail & Related papers (2024-04-29T11:11:26Z) - Python is Not Always the Best Choice: Embracing Multilingual Program of Thoughts [51.49688654641581]
We propose a task and model agnostic approach called MultiPoT, which harnesses strength and diversity from various languages.
Experimental results reveal that it significantly outperforms Python Self-Consistency.
In particular, MultiPoT achieves more than 4.6% improvement on average on ChatGPT (gpt-3.5-turbo-0701).
arXiv Detail & Related papers (2024-02-16T13:48:06Z) - Causal-learn: Causal Discovery in Python [53.17423883919072]
Causal discovery aims at revealing causal relations from observational data.
causal-learn is an open-source Python library for causal discovery.
arXiv Detail & Related papers (2023-07-31T05:00:35Z) - A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate functional correctness of model-generated code on simple programming problems.
Static analysis tools such as linters, which can detect errors without running the program, haven't been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions, by leveraging Abstract Syntax Trees.
arXiv Detail & Related papers (2023-06-05T19:23:34Z) - Measuring The Impact Of Programming Language Distribution [28.96076723773365]
We present the BabelCode framework for execution-based evaluation of any benchmark in any language.
We present a new code translation dataset called Translating Python Programming Puzzles (TP3).
We investigate if balancing the distributions of 14 languages in a training dataset improves a large language model's performance on low-resource languages.
arXiv Detail & Related papers (2023-02-03T19:47:22Z) - BigIssue: A Realistic Bug Localization Benchmark [89.8240118116093]
BigIssue is a benchmark for realistic bug localization.
We provide a general benchmark with a diversity of real and synthetic Java bugs.
We hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
arXiv Detail & Related papers (2022-07-21T20:17:53Z) - Python for Smarter Cities: Comparison of Python libraries for static and interactive visualisations of large vector data [0.0]
Python, with its concise and natural syntax, presents a low barrier to entry for municipal staff without computer science backgrounds.
This study assesses prominent, actively-developed visualisation libraries in the Python ecosystem with respect to producing visualisations of large vector datasets.
All short-listed libraries were able to generate the sample map products for both a small and larger dataset.
arXiv Detail & Related papers (2022-02-26T10:23:29Z) - OPFython: A Python-Inspired Optimum-Path Forest Classifier [68.8204255655161]
This paper proposes a Python-based Optimum-Path Forest framework, denoted as OPFython.
As OPFython is a Python-based library, it provides a more friendly environment and a faster prototyping workspace than the C language.
arXiv Detail & Related papers (2020-01-28T15:46:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.