DyPyBench: A Benchmark of Executable Python Software
- URL: http://arxiv.org/abs/2403.00539v1
- Date: Fri, 1 Mar 2024 13:53:15 GMT
- Title: DyPyBench: A Benchmark of Executable Python Software
- Authors: Islem Bouzenia, Bajaj Piyush Krishan, Michael Pradel
- Abstract summary: We present DyPyBench, the first benchmark of Python projects that is large scale, diverse, ready to run and ready to analyze.
The benchmark encompasses 50 popular open-source projects from various application domains, with a total of 681k lines of Python code and 30k test cases.
We envision DyPyBench to provide a basis for other dynamic analyses and for studying the runtime behavior of Python code.
- Score: 18.129031749321058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Python has emerged as one of the most popular programming languages,
extensively utilized in domains such as machine learning, data analysis, and
web applications. Python's dynamic nature and extensive usage make it an
attractive candidate for dynamic program analysis. However, unlike for other
popular languages, there currently is no comprehensive benchmark suite of
executable Python projects, which hinders the development of dynamic analyses.
This work addresses this gap by presenting DyPyBench, the first benchmark of
Python projects that is large scale, diverse, ready to run (i.e., with fully
configured and prepared test suites), and ready to analyze (by integrating with
the DynaPyt dynamic analysis framework). The benchmark encompasses 50 popular
open-source projects from various application domains, with a total of 681k
lines of Python code and 30k test cases. DyPyBench enables various
applications in testing and dynamic analysis, of which we explore three in this
work: (i) Gathering dynamic call graphs and empirically comparing them to
statically computed call graphs, which exposes and quantifies limitations of
existing call graph construction techniques for Python. (ii) Using DyPyBench to
build a training data set for LExecutor, a neural model that learns to predict
values that otherwise would be missing at runtime. (iii) Using dynamically
gathered execution traces to mine API usage specifications, which establishes a
baseline for future work on specification mining for Python. We envision
DyPyBench to provide a basis for other dynamic analyses and for studying the
runtime behavior of Python code.
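As an illustration of application (i), the following is a minimal, self-contained sketch of how dynamic call-graph edges can be collected while running Python code, using the standard `sys.settrace` hook. This is not DyPyBench's actual implementation (which builds on the DynaPyt framework); the function name `collect_dynamic_call_graph` and the toy example are purely illustrative.

```python
import sys
from collections import defaultdict

def collect_dynamic_call_graph(entry_point, *args, **kwargs):
    """Run `entry_point` and record (caller, callee) edges observed at runtime.

    Minimal sketch based on `sys.settrace`; a benchmark-scale analysis would
    need to handle C extensions, threads, generators, etc. more carefully.
    """
    edges = defaultdict(set)

    def tracer(frame, event, arg):
        if event == "call" and frame.f_back is not None:
            caller = frame.f_back.f_code.co_name
            callee = frame.f_code.co_name
            edges[caller].add(callee)
        return tracer  # keep tracing nested calls

    sys.settrace(tracer)
    try:
        entry_point(*args, **kwargs)
    finally:
        sys.settrace(None)
    return edges

# Toy usage: observe the edge main -> helper.
def helper():
    return 42

def main():
    return helper()

for caller, callees in sorted(collect_dynamic_call_graph(main).items()):
    print(caller, "->", sorted(callees))
```

Edges gathered this way over a project's test suite can then be compared against the edges reported by a static call-graph tool, which is the kind of dynamic-vs-static comparison the paper describes.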
Related papers
- PoTo: A Hybrid Andersen's Points-to Analysis for Python [3.6793233203143743]
PoTo is an Andersen-style context-insensitive and flow-insensitive points-to analysis for Python.
PoTo+ is a static type-inference approach for Python built on top of the points-to analysis.
arXiv Detail & Related papers (2024-09-05T21:26:25Z)
- PyBench: Evaluating LLM Agent on various real-world coding tasks [13.347173063163138]
PyBench is a benchmark spanning five main categories of real-world tasks and covering more than 10 types of files.
Our evaluations indicate that current open-source LLMs are struggling with these tasks.
Our fine-tuned 8B-parameter model, PyLlama3, achieves exciting performance on PyBench.
arXiv Detail & Related papers (2024-07-23T15:23:14Z)
- Python is Not Always the Best Choice: Embracing Multilingual Program of Thoughts [51.49688654641581]
We propose a task- and model-agnostic approach called MultiPoT, which harnesses the strengths and diversity of multiple programming languages.
Experimental results reveal that it significantly outperforms Python Self-Consistency.
In particular, MultiPoT achieves more than 4.6% improvement on average on ChatGPT (gpt-3.5-turbo-0701).
arXiv Detail & Related papers (2024-02-16T13:48:06Z)
- A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate the functional correctness of model-generated code on simple programming problems.
Static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions by leveraging Abstract Syntax Trees (a minimal sketch of this AST-based idea appears after this list).
arXiv Detail & Related papers (2023-06-05T19:23:34Z)
- Scalable and Precise Application-Centered Call Graph Construction for Python [4.655332013331494]
PyCG is the state-of-the-art approach for constructing call graphs for Python programs.
We propose a scalable and precise approach for constructing application-centered call graphs for Python programs, and implement it as a prototype tool JARVIS.
Taking one function as input, JARVIS generates the call graph on the fly, performing flow-sensitive intra-procedural analysis and inter-procedural analysis.
arXiv Detail & Related papers (2023-05-10T07:40:05Z)
- Serenity: Library Based Python Code Analysis for Code Completion and Automated Machine Learning [8.362734311902278]
We present Serenity, a framework for static analysis of Python that turns out to be sufficient for several practical tasks.
Serenity exploits two basic mechanisms: (a) reliance on dynamic dispatch at the core of language translation, and (b) extreme abstraction of libraries.
We demonstrate the efficiency and usefulness of Serenity's analysis in two applications: code completion and automated machine learning.
arXiv Detail & Related papers (2023-01-05T02:09:08Z)
- DADApy: Distance-based Analysis of DAta-manifolds in Python [51.37841707191944]
DADApy is a Python software package for analysing and characterising high-dimensional data.
It provides methods for estimating the intrinsic dimension and the probability density, for performing density-based clustering and for comparing different distance metrics.
arXiv Detail & Related papers (2022-05-04T08:41:59Z)
- PyGOD: A Python Library for Graph Outlier Detection [56.33769221859135]
PyGOD is an open-source library for detecting outliers in graph data.
It supports a wide array of leading graph-based methods for outlier detection.
PyGOD is released under a BSD 2-Clause license at https://pygod.org and on the Python Package Index (PyPI).
arXiv Detail & Related papers (2022-04-26T06:15:21Z)
- OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems [1.066106854070245]
OMB-Py is the first communication benchmark suite for parallel Python applications.
OMB-Py consists of a variety of point-to-point and collective communication benchmark tests.
We report up to 106x speedup on 224 CPU cores compared to sequential execution.
arXiv Detail & Related papers (2021-10-20T16:59:14Z)
- PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning [55.32009000204512]
We present PyODDS, an automated end-to-end Python system for Outlier Detection with Database Support.
Specifically, we define the search space in the outlier detection pipeline, and produce a search strategy within the given search space.
It also provides unified interfaces and visualizations for users with or without data science or machine learning background.
arXiv Detail & Related papers (2020-03-12T03:30:30Z)
- OPFython: A Python-Inspired Optimum-Path Forest Classifier [68.8204255655161]
This paper proposes a Python-based Optimum-Path Forest framework, denoted as OPFython.
As OPFython is a Python-based library, it provides a friendlier environment and a faster prototyping workspace than C.
arXiv Detail & Related papers (2020-01-28T15:46:19Z)
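As referenced in the entry on "A Static Evaluation of Code Completion by Large Language Models" above, here is a minimal, self-contained sketch of the general idea of AST-based static checking of a Python code completion. It assumes nothing about that paper's actual framework; the function `static_errors` and the specific checks are illustrative only.

```python
import ast
import builtins

def static_errors(completion: str) -> list:
    """Report simple static errors in a Python code completion.

    Illustrative only: parse the snippet and flag names that are loaded but
    never bound anywhere in it. Real frameworks perform far richer checks.
    """
    try:
        tree = ast.parse(completion)
    except SyntaxError as e:
        return [f"syntax error: {e.msg} (line {e.lineno})"]

    # Collect names bound anywhere in the snippet, plus builtins.
    bound = set(dir(builtins))
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, (ast.Store, ast.Del)):
            bound.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])

    # Flag loads of names that were never bound.
    return [
        f"possibly undefined name: {node.id!r}"
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load) and node.id not in bound
    ]

# Toy usage: 'y' is used but never defined in the completion.
print(static_errors("def f(x):\n    return x + y\n"))
```

A real evaluation pipeline would apply much richer checks (scoping rules, cross-file imports, off-the-shelf linters) over large sets of model completions and aggregate the resulting error categories.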
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.