PyPackIT: Automated Research Software Engineering for Scientific Python Applications on GitHub
- URL: http://arxiv.org/abs/2503.04921v1
- Date: Thu, 06 Mar 2025 19:41:55 GMT
- Title: PyPackIT: Automated Research Software Engineering for Scientific Python Applications on GitHub
- Authors: Armin Ariamajd, Raquel López-Ríos de Castro, Andrea Volkamer
- Abstract summary: PyPackIT is a user-friendly, ready-to-use software that enables scientists to focus on the scientific aspects of their projects. PyPackIT offers a robust project infrastructure including a build-ready Python package skeleton, a fully operational documentation and test suite, and a control center for dynamic project management.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The increasing importance of Computational Science and Engineering has highlighted the need for high-quality scientific software. However, research software development is often hindered by limited funding, time, staffing, and technical resources. To address these challenges, we introduce PyPackIT, a cloud-based automation tool designed to streamline research software engineering in accordance with FAIR (Findable, Accessible, Interoperable, and Reusable) and Open Science principles. PyPackIT is a user-friendly, ready-to-use software that enables scientists to focus on the scientific aspects of their projects while automating repetitive tasks and enforcing best practices throughout the software development life cycle. Using modern Continuous software engineering and DevOps methodologies, PyPackIT offers a robust project infrastructure including a build-ready Python package skeleton, a fully operational documentation and test suite, and a control center for dynamic project management and customization. PyPackIT integrates seamlessly with GitHub's version control system, issue tracker, and pull-based model to establish a fully-automated software development workflow. Exploiting GitHub Actions, PyPackIT provides a cloud-native Agile development environment using containerization, Configuration-as-Code, and Continuous Integration, Deployment, Testing, Refactoring, and Maintenance pipelines. PyPackIT is an open-source software suite that seamlessly integrates with both new and existing projects via a public GitHub repository template at https://github.com/repodynamics/pypackit.
Related papers
- EnvBench: A Benchmark for Automated Environment Setup [76.02998475135824]
Large Language Models have enabled researchers to focus on practical repository-level tasks in the software engineering domain.
Existing studies on environment setup introduce innovative agentic strategies, but their evaluation is often based on small datasets.
To address this gap, we introduce a comprehensive environment setup benchmark EnvBench.
arXiv Detail & Related papers (2025-03-18T17:19:12Z)
- PyPulse: A Python Library for Biosignal Imputation [58.35269251730328]
We introduce PyPulse, a Python package for imputation of biosignals in both clinical and wearable sensor settings.
PyPulse's framework provides a modular and extendable framework with high ease-of-use for a broad userbase, including non-machine-learning bioresearchers.
We released PyPulse under the MIT License on GitHub and PyPI.
arXiv Detail & Related papers (2024-12-09T11:00:55Z)
- PyGen: A Collaborative Human-AI Approach to Python Package Creation [1.3348326328808557]
Pygen is an automation platform designed to empower researchers, technologists, and hobbyists to bring abstract ideas to life as core, usable software tools written in Python.
By combining state-of-the-art language models with open-source code generation technologies, Pygen has significantly reduced the manual overhead of tool development.
arXiv Detail & Related papers (2024-11-13T03:16:18Z)
- RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [63.87660059104077]
We present RepoGraph, a plug-in module that manages a repository-level structure for modern AI software engineering solutions.
RepoGraph substantially boosts the performance of all systems, leading to a new state-of-the-art among open-source frameworks.
arXiv Detail & Related papers (2024-10-03T05:45:26Z)
- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
- SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering [79.07755560048388]
SWE-agent is a system that facilitates LM agents to autonomously use computers to solve software engineering tasks.
SWE-agent's custom agent-computer interface (ACI) significantly enhances an agent's ability to create and edit code files, navigate entire repositories, and execute tests and other programs.
We evaluate SWE-agent on SWE-bench and HumanEvalFix, achieving state-of-the-art performance on both with a pass@1 rate of 12.5% and 87.7%, respectively.
arXiv Detail & Related papers (2024-05-06T17:41:33Z)
- Python Fuzzing for Trustworthy Machine Learning Frameworks [0.0]
We propose a dynamic analysis pipeline for Python projects using Sydr-Fuzz.
Our pipeline includes fuzzing, corpus minimization, crash triaging, and coverage collection.
To identify the most vulnerable parts of machine learning frameworks, we analyze their potential attack surfaces and develop fuzz targets for PyTorch, and related projects such as h5py.
arXiv Detail & Related papers (2024-03-19T13:41:11Z)
- PyPOTS: A Python Toolbox for Data Mining on Partially-Observed Time Series [0.0]
PyPOTS is an open-source Python library dedicated to data mining and analysis on partially-observed time series.
It provides easy access to diverse algorithms categorized into four tasks: imputation, classification, clustering, and forecasting.
arXiv Detail & Related papers (2023-05-30T07:57:05Z)
- The GitHub Development Workflow Automation Ecosystems [47.818229204130596]
Large-scale software development has become a highly collaborative endeavour.
This chapter explores the ecosystems of development bots and GitHub Actions.
It provides an extensive survey of the state-of-the-art in this domain.
arXiv Detail & Related papers (2023-05-08T15:24:23Z)
- Tangelo: An Open-source Python Package for End-to-end Chemistry Workflows on Quantum Computers [85.21205677945196]
Tangelo is an open-source Python software package for the development of end-to-end chemistry on quantum computers.
It aims to support the design of successful experiments on quantum hardware, and to facilitate advances in quantum algorithm development.
arXiv Detail & Related papers (2022-06-24T17:44:00Z)
- pyWATTS: Python Workflow Automation Tool for Time Series [0.20315704654772418]
pyWATTS is a non-sequential workflow automation tool for the analysis of time series data.
pyWATTS includes modules with clearly defined interfaces to enable seamless integration of new or existing methods.
pyWATTS supports key Python machine learning libraries such as scikit-learn, PyTorch, and Keras.
arXiv Detail & Related papers (2021-06-18T14:50:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.