DyPyBench: A Benchmark of Executable Python Software
- URL: http://arxiv.org/abs/2403.00539v1
- Date: Fri, 1 Mar 2024 13:53:15 GMT
- Title: DyPyBench: A Benchmark of Executable Python Software
- Authors: Islem Bouzenia, Bajaj Piyush Krishan, Michael Pradel
- Abstract summary: We present DyPyBench, the first benchmark of Python projects that is large scale, diverse, ready to run and ready to analyze.
The benchmark encompasses 50 popular open-source projects from various application domains, with a total of 681k lines of Python code and 30k test cases.
We envision DyPyBench to provide a basis for other dynamic analyses and for studying the runtime behavior of Python code.
- Score: 18.129031749321058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Python has emerged as one of the most popular programming languages,
extensively utilized in domains such as machine learning, data analysis, and
web applications. Python's dynamic nature and extensive usage make it an
attractive candidate for dynamic program analysis. However, unlike for other
popular languages, there currently is no comprehensive benchmark suite of
executable Python projects, which hinders the development of dynamic analyses.
This work addresses this gap by presenting DyPyBench, the first benchmark of
Python projects that is large scale, diverse, ready to run (i.e., with fully
configured and prepared test suites), and ready to analyze (by integrating with
the DynaPyt dynamic analysis framework). The benchmark encompasses 50 popular
open-source projects from various application domains, with a total of 681k
lines of Python code and 30k test cases. DyPyBench enables various
applications in testing and dynamic analysis, of which we explore three in this
work: (i) Gathering dynamic call graphs and empirically comparing them to
statically computed call graphs, which exposes and quantifies limitations of
existing call graph construction techniques for Python. (ii) Using DyPyBench to
build a training data set for LExecutor, a neural model that learns to predict
values that otherwise would be missing at runtime. (iii) Using dynamically
gathered execution traces to mine API usage specifications, which establishes a
baseline for future work on specification mining for Python. We envision
DyPyBench to provide a basis for other dynamic analyses and for studying the
runtime behavior of Python code.
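As an illustration of application (i), the following is a minimal, self-contained sketch of how dynamic call-graph edges can be collected while running Python code, using the standard `sys.settrace` hook. This is not DyPyBench's actual implementation (which builds on the DynaPyt framework); the function name `collect_dynamic_call_graph` and the toy example are purely illustrative.

```python
import sys
from collections import defaultdict

def collect_dynamic_call_graph(entry_point, *args, **kwargs):
    """Run `entry_point` and record (caller, callee) edges observed at runtime.

    Minimal sketch based on `sys.settrace`; a benchmark-scale analysis would
    need to handle C extensions, threads, generators, etc. more carefully.
    """
    edges = defaultdict(set)

    def tracer(frame, event, arg):
        if event == "call" and frame.f_back is not None:
            caller = frame.f_back.f_code.co_name
            callee = frame.f_code.co_name
            edges[caller].add(callee)
        return tracer  # keep tracing nested calls

    sys.settrace(tracer)
    try:
        entry_point(*args, **kwargs)
    finally:
        sys.settrace(None)
    return edges

# Toy usage: observe the edge main -> helper.
def helper():
    return 42

def main():
    return helper()

for caller, callees in sorted(collect_dynamic_call_graph(main).items()):
    print(caller, "->", sorted(callees))
```

Edges gathered this way over a project's test suite can then be compared against the edges reported by a static call-graph tool, which is the kind of dynamic-vs-static comparison the paper describes.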
Related papers
- PoTo: A Hybrid Andersen's Points-to Analysis for Python [3.6793233203143743]
PoTo is an Andersen-style context-insensitive and flow-insensitive points-to analysis for Python.
PoTo+ is a static type-inference approach for Python built on top of the points-to analysis.
arXiv Detail & Related papers (2024-09-05T21:26:25Z)
- PyBench: Evaluating LLM Agent on various real-world coding tasks [13.347173063163138]
PyBench is a benchmark spanning five main categories of real-world tasks and covering more than 10 types of files.
Our evaluations indicate that current open-source LLMs are struggling with these tasks.
Our fine-tuned 8B-parameter model, PyLlama3, achieves exciting performance on PyBench.
arXiv Detail & Related papers (2024-07-23T15:23:14Z)
- Python is Not Always the Best Choice: Embracing Multilingual Program of Thoughts [51.49688654641581]
We propose a task- and model-agnostic approach called MultiPoT, which harnesses the strengths and diversity of multiple programming languages.
Experimental results reveal that it significantly outperforms Python Self-Consistency.
In particular, MultiPoT achieves more than 4.6% improvement on average on ChatGPT (gpt-3.5-turbo-0701).
arXiv Detail & Related papers (2024-02-16T13:48:06Z)
- A Static Evaluation of Code Completion by Large Language Models [65.18008807383816]
Execution-based benchmarks have been proposed to evaluate the functional correctness of model-generated code on simple programming problems.
Static analysis tools such as linters, which can detect errors without running the program, have not been well explored for evaluating code generation models.
We propose a static evaluation framework to quantify static errors in Python code completions by leveraging Abstract Syntax Trees (a minimal sketch of this AST-based idea appears after this list).
arXiv Detail & Related papers (2023-06-05T19:23:34Z)
- Scalable and Precise Application-Centered Call Graph Construction for Python [4.655332013331494]
PyCG is the state-of-the-art approach for constructing call graphs for Python programs.
We propose a scalable and precise approach for constructing application-centered call graphs for Python programs, and implement it as a prototype tool JARVIS.
Taking one function as input, JARVIS generates the call graph on the fly, performing flow-sensitive intra-procedural analysis and inter-procedural analysis.
arXiv Detail & Related papers (2023-05-10T07:40:05Z)
- Serenity: Library Based Python Code Analysis for Code Completion and Automated Machine Learning [8.362734311902278]
We present Serenity, a framework for static analysis of Python that turns out to be sufficient for several practical tasks.
Serenity exploits two basic mechanisms: (a) reliance on dynamic dispatch at the core of language translation, and (b) extreme abstraction of libraries.
We demonstrate the efficiency and usefulness of Serenity's analysis in two applications: code completion and automated machine learning.
arXiv Detail & Related papers (2023-01-05T02:09:08Z)
- DADApy: Distance-based Analysis of DAta-manifolds in Python [51.37841707191944]
DADApy is a Python software package for analysing and characterising high-dimensional data.
It provides methods for estimating the intrinsic dimension and the probability density, for performing density-based clustering and for comparing different distance metrics.
arXiv Detail & Related papers (2022-05-04T08:41:59Z)
- PyGOD: A Python Library for Graph Outlier Detection [56.33769221859135]
PyGOD is an open-source library for detecting outliers in graph data.
It supports a wide array of leading graph-based methods for outlier detection.
PyGOD is released under a BSD 2-Clause license at https://pygod.org and on the Python Package Index (PyPI).
arXiv Detail & Related papers (2022-04-26T06:15:21Z)
- OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems [1.066106854070245]
OMB-Py is the first communication benchmark suite for parallel Python applications.
OMB-Py consists of a variety of point-to-point and collective communication benchmark tests.
We report up to 106x speedup on 224 CPU cores compared to sequential execution.
arXiv Detail & Related papers (2021-10-20T16:59:14Z)
- PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning [55.32009000204512]
We present PyODDS, an automated end-to-end Python system for Outlier Detection with Database Support.
Specifically, we define the search space in the outlier detection pipeline, and produce a search strategy within the given search space.
It also provides unified interfaces and visualizations for users with or without data science or machine learning background.
arXiv Detail & Related papers (2020-03-12T03:30:30Z)
- OPFython: A Python-Inspired Optimum-Path Forest Classifier [68.8204255655161]
This paper proposes a Python-based Optimum-Path Forest framework, denoted as OPFython.
As OPFython is a Python-based library, it provides a friendlier environment and a faster prototyping workspace than C.
arXiv Detail & Related papers (2020-01-28T15:46:19Z)
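As referenced in the entry on "A Static Evaluation of Code Completion by Large Language Models" above, here is a minimal, self-contained sketch of the general idea of AST-based static checking of a Python code completion. It assumes nothing about that paper's actual framework; the function `static_errors` and the specific checks are illustrative only.

```python
import ast
import builtins

def static_errors(completion: str) -> list:
    """Report simple static errors in a Python code completion.

    Illustrative only: parse the snippet and flag names that are loaded but
    never bound anywhere in it. Real frameworks perform far richer checks.
    """
    try:
        tree = ast.parse(completion)
    except SyntaxError as e:
        return [f"syntax error: {e.msg} (line {e.lineno})"]

    # Collect names bound anywhere in the snippet, plus builtins.
    bound = set(dir(builtins))
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, (ast.Store, ast.Del)):
            bound.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])

    # Flag loads of names that were never bound.
    return [
        f"possibly undefined name: {node.id!r}"
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load) and node.id not in bound
    ]

# Toy usage: 'y' is used but never defined in the completion.
print(static_errors("def f(x):\n    return x + y\n"))
```

A real evaluation pipeline would apply much richer checks (scoping rules, cross-file imports, off-the-shelf linters) over large sets of model completions and aggregate the resulting error categories.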
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.