pyMethods2Test: A Dataset of Python Tests Mapped to Focal Methods
- URL: http://arxiv.org/abs/2502.05143v1
- Date: Fri, 07 Feb 2025 18:19:12 GMT
- Title: pyMethods2Test: A Dataset of Python Tests Mapped to Focal Methods
- Authors: Idriss Abdelmadjid, Robert Dyer
- Abstract summary: Python is one of the fastest-growing programming languages and currently ranks as the top language in many lists.
It is imperative to be able to effectively train LLMs to generate good unit test cases for Python code.
This motivates the need for a large dataset to provide training and testing data.
- Abstract: Python is one of the fastest-growing programming languages and currently ranks as the top language in many lists, even recently overtaking JavaScript as the top language on GitHub. Given its importance in data science and machine learning, it is imperative to be able to effectively train LLMs to generate good unit test cases for Python code. This motivates the need for a large dataset to provide training and testing data. To date, while other large datasets exist for languages like Java, none publicly exist for Python. Python poses difficult challenges in generating such a dataset, due to its less rigid naming requirements. In this work, we consider two commonly used Python unit testing frameworks: Pytest and unittest. We analyze a large corpus of over 88K open-source GitHub projects utilizing these testing frameworks. Using a carefully designed set of heuristics, we are able to locate over 22 million test methods. We then analyze the test and non-test code and map individual unit tests to the focal method being tested. This provides an explicit traceability link from the test to the tested method. Our pyMethods2Test dataset contains over 2 million of these focal method mappings, as well as the ability to generate useful context for input to LLMs. The pyMethods2Test dataset is publicly available on Zenodo at: https://doi.org/10.5281/zenodo.14264518
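As a rough picture of the traceability link the dataset encodes, here is a minimal sketch of the simplest name-based heuristic (a test `test_<name>` paired with a non-test function `<name>`). The paper's actual heuristics are considerably richer, also considering calls made inside the test body, imports, and file layout; all names below are illustrative.

```python
# Minimal sketch (not the paper's pipeline) of a name-based heuristic
# linking pytest-style tests to candidate focal methods.
import ast

def collect_functions(source: str) -> dict[str, ast.FunctionDef]:
    """Map every function name in the source to its AST node."""
    tree = ast.parse(source)
    return {node.name: node for node in ast.walk(tree)
            if isinstance(node, ast.FunctionDef)}

def map_tests_to_focal(test_source: str, code_source: str) -> dict[str, str]:
    """Pair each test_<name> function with a function <name>, if one exists."""
    tests = {name for name in collect_functions(test_source)
             if name.startswith("test_")}
    focal = collect_functions(code_source)
    return {t: t[len("test_"):] for t in tests if t[len("test_"):] in focal}

if __name__ == "__main__":
    code = "def parse(s):\n    return s.strip()\n"
    test = "def test_parse():\n    assert parse(' a ') == 'a'\n"
    print(map_tests_to_focal(test, code))  # {'test_parse': 'parse'}
```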
Related papers
- PyPulse: A Python Library for Biosignal Imputation [58.35269251730328]
We introduce PyPulse, a Python package for imputation of biosignals in both clinical and wearable sensor settings.
PyPulse provides a modular and extendable framework with high ease of use for a broad user base, including non-machine-learning bioresearchers.
We released PyPulse under the MIT License on GitHub and PyPI.
arXiv Detail & Related papers (2024-12-09T11:00:55Z)
- CRUXEval-X: A Benchmark for Multilingual Code Reasoning, Understanding and Execution [50.7413285637879]
The CRUXEVAL-X code reasoning benchmark covers 19 programming languages.
It comprises at least 600 subjects for each language, along with 19K content-consistent tests in total.
Even a model trained solely on Python can achieve at most 34.4% Pass@1 in other languages.
arXiv Detail & Related papers (2024-08-23T11:43:00Z)
- TESTEVAL: Benchmarking Large Language Models for Test Case Generation [15.343859279282848]
We propose TESTEVAL, a novel benchmark for test case generation with large language models (LLMs).
We collect 210 Python programs from an online programming platform, LeetCode, and design three different tasks: overall coverage, targeted line/branch coverage, and targeted path coverage.
We find that generating test cases to cover specific program lines/branches/paths is still challenging for current LLMs.
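The targeted-coverage tasks can be pictured with the real coverage.py API: run a candidate test input under measurement and check whether a specific line executed. The sketch below is illustrative; the program, file name, target line, and input are stand-ins, not TESTEVAL artifacts.

```python
# Hedged sketch of a "targeted line coverage" check using coverage.py.
import importlib.util
import os
import coverage

PROGRAM = "def classify(n):\n    if n < 0:\n        return 'neg'\n    return 'non-neg'\n"
TARGET_LINE = 3          # the line holding `return 'neg'`
CANDIDATE_INPUT = -5     # input proposed by a generated test

with open("under_test.py", "w") as f:
    f.write(PROGRAM)

cov = coverage.Coverage()
cov.start()
spec = importlib.util.spec_from_file_location("under_test", "under_test.py")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)    # import the program under measurement
module.classify(CANDIDATE_INPUT)   # execute the candidate test input
cov.stop()

executed = cov.get_data().lines(os.path.abspath("under_test.py")) or []
print("target line covered:", TARGET_LINE in executed)
```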
arXiv Detail & Related papers (2024-06-06T22:07:50Z)
- Python is Not Always the Best Choice: Embracing Multilingual Program of Thoughts [51.49688654641581]
We propose a task- and model-agnostic approach called MultiPoT, which harnesses the strengths and diversity of multiple languages.
Experimental results reveal that it significantly outperforms Python Self-Consistency.
In particular, MultiPoT achieves more than a 4.6% average improvement on ChatGPT (gpt-3.5-turbo-0701).
arXiv Detail & Related papers (2024-02-16T13:48:06Z)
- BugsInPy: A Database of Existing Bugs in Python Programs to Enable Controlled Testing and Debugging Studies [8.746971239693066]
For the first time, Python outperformed Java in the Stack Overflow developer survey.
This is in stark contrast with the abundance of testing and debugging tools available for Java.
In this project, we create a benchmark database and tool that contain 493 real bugs from 17 real-world Python programs.
arXiv Detail & Related papers (2024-01-27T19:07:34Z)
- PyTester: Deep Reinforcement Learning for Text-to-Testcase Generation [20.441921569948562]
Test-driven development (TDD) mandates writing test cases based on requirements before writing the actual code.
While writing test cases is the centerpiece of TDD, it is time-consuming, expensive, and often shunned by developers.
We introduce PyTester, a Text-to-Testcase generation approach that can automatically generate correct, executable, complete, and effective test cases.
arXiv Detail & Related papers (2024-01-15T10:21:58Z)
- The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants [80.4837840962273]
We present Belebele, a dataset spanning 122 language variants.
This dataset enables the evaluation of text models in high-, medium-, and low-resource languages.
arXiv Detail & Related papers (2023-08-31T17:43:08Z)
- Automatic Unit Test Generation for Deep Learning Frameworks based on API Knowledge [11.523398693942413]
We propose MUTester to generate unit test cases for APIs of deep learning frameworks.
We first propose a set of 18 rules for mining API constraints from API documentation.
We then use frequent itemset mining to extract API usage patterns from a large corpus of code fragments that use machine learning APIs.
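The mining step can be pictured as frequent itemset mining over "transactions", where each transaction is the set of APIs one code fragment calls. The sketch below uses mlxtend rather than the paper's tooling, and the toy corpus of call sets is invented for illustration.

```python
# Toy sketch of mining co-occurring API calls with mlxtend's apriori.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

# Each "transaction" is the set of framework APIs used in one fragment.
fragments = [
    ["tf.constant", "tf.add", "tf.Session"],
    ["tf.constant", "tf.add"],
    ["tf.constant", "tf.matmul", "tf.Session"],
    ["tf.constant", "tf.add", "tf.matmul"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(fragments).transform(fragments),
                      columns=te.columns_)
patterns = apriori(onehot, min_support=0.5, use_colnames=True)
print(patterns)  # frequent itemsets such as {tf.constant, tf.add}
```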
arXiv Detail & Related papers (2023-07-01T18:34:56Z)
- pytest-inline: An Inline Testing Tool for Python [10.307253336106053]
pytest-inline is a plugin for pytest, the most popular Python testing framework.
pytest-inline runs each inline test and fails if the target statement's output does not match the expected output.
pytest-inline is integrated into the pytest-dev organization.
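The flavor of an inline test looks roughly like the sketch below: an assertion attached to the statement it checks, executed by the plugin during a pytest run. The `Here().given(...).check_eq(...)` calls follow the paper's examples, but the exact API names are an assumption here and should be verified against the plugin's documentation.

```python
# Illustrative inline test; API names follow the paper's examples and
# may differ from the released plugin -- treat them as assumptions.
from inline import Here

def get_assigned_var(line: str) -> str:
    target = line.split("=")[0].strip()
    # Inline test: with `line` bound to "x = 1", `target` must equal "x".
    Here().given(line, "x = 1").check_eq(target, "x")
    return target
```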
arXiv Detail & Related papers (2023-05-22T20:58:44Z)
- TextBox 2.0: A Text Generation Library with Pre-trained Language Models [72.49946755856935]
This paper presents a comprehensive and unified library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs).
To be comprehensive, our library covers 13 common text generation tasks and their corresponding 83 datasets.
We also implement 4 efficient training strategies and provide 4 generation objectives for pre-training new PLMs from scratch.
arXiv Detail & Related papers (2022-12-26T03:50:36Z)
- PyGOD: A Python Library for Graph Outlier Detection [56.33769221859135]
PyGOD is an open-source library for detecting outliers in graph data.
It supports a wide array of leading graph-based methods for outlier detection.
PyGOD is released under a BSD 2-Clause license at https://pygod.org and at the Python Package Index (PyPI).
arXiv Detail & Related papers (2022-04-26T06:15:21Z)
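A minimal usage sketch, assuming PyGOD's documented fit/predict detector interface (DOMINANT is one of its bundled detectors; parameter names can vary across versions, and the random graph below is a stand-in for real data):

```python
# Hedged PyGOD usage sketch on a random attributed graph.
import torch
from torch_geometric.data import Data
from pygod.detector import DOMINANT

data = Data(
    x=torch.randn(50, 16),                      # node features
    edge_index=torch.randint(0, 50, (2, 200)),  # random edges
)

detector = DOMINANT(epoch=10)    # a GNN-based outlier detector
detector.fit(data)
labels = detector.predict(data)  # 1 marks predicted outlier nodes
print(int(labels.sum()), "nodes flagged as outliers")
```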