Managing Software Provenance to Enhance Reproducibility in Computational
Research
- URL: http://arxiv.org/abs/2308.15637v2
- Date: Wed, 20 Dec 2023 02:21:59 GMT
- Title: Managing Software Provenance to Enhance Reproducibility in Computational
Research
- Authors: Akash Dhruv, Anshu Dubey
- Abstract summary: Management of computation-based scientific studies is often left to individual researchers who design their experiments based on personal preferences and the nature of the study.
We believe that the quality, efficiency, and of computation-based scientific research can be improved by explicitly creating an execution environment that allows researchers to provide a clear record of traceability.
- Score: 1.1421942894219899
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scientific processes rely on software as an important tool for data
acquisition, analysis, and discovery. Over the years sustainable software
development practices have made progress in being considered as an integral
component of research. However, management of computation-based scientific
studies is often left to individual researchers who design their computational
experiments based on personal preferences and the nature of the study. We
believe that the quality, efficiency, and reproducibility of computation-based
scientific research can be improved by explicitly creating an execution
environment that allows researchers to provide a clear record of traceability.
This is particularly relevant to complex computational studies in
high-performance computing (HPC) environments. In this article, we review the
documentation required to maintain a comprehensive record of HPC computational
experiments for reproducibility. We also provide an overview of tools and
practices that we have developed to perform such studies around Flash-X, a
multi-physics scientific software.
Related papers
- MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of ScientificAspects.
MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years.
We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z) - MLXP: A Framework for Conducting Replicable Experiments in Python [63.37350735954699]
We propose MLXP, an open-source, simple, and lightweight experiment management tool based on Python.
It streamlines the experimental process with minimal overhead while ensuring a high level of practitioner overhead.
arXiv Detail & Related papers (2024-02-21T14:22:20Z) - Ten simple rules for teaching sustainable software engineering [0.0]
Developing high-quality research software requires scientists to develop a host of software development skills.
There has been a growing importance placed on ensuring foundational and good development practices in computational research.
Recent articles in the Ten Simple Rules collection have discussed the teaching of computer science and coding techniques to biology students.
We advance this discussion by describing the specific steps for effectively teaching the necessary skills scientists need to develop sustainable software packages.
arXiv Detail & Related papers (2024-02-07T10:16:20Z) - SciOps: Achieving Productivity and Reliability in Data-Intensive Research [0.8414742293641504]
Scientists are increasingly leveraging advances in instruments, automation, and collaborative tools to scale up their experiments and research goals.
Various scientific disciplines, including neuroscience, have adopted key technologies to enhance collaboration, inspiration and automation.
We introduce a five-level Capability Maturity Model describing the principles of rigorous scientific operations.
arXiv Detail & Related papers (2023-12-29T21:37:22Z) - A pragmatic workflow for research software engineering in computational
science [0.0]
University research groups in Computational Science and Engineering (CSE) generally lack dedicated funding and personnel for Research Software Engineering (RSE)
RSE shifts the focus away from sustainable research software development and reproducible results.
We propose a RSE workflow for CSE that addresses these challenges, that improves the quality of research output in CSE.
arXiv Detail & Related papers (2023-10-02T08:04:12Z) - Using Machine Learning To Identify Software Weaknesses From Software
Requirement Specifications [49.1574468325115]
This research focuses on finding an efficient machine learning algorithm to identify software weaknesses from requirement specifications.
Keywords extracted using latent semantic analysis help map the CWE categories to PROMISE_exp. Naive Bayes, support vector machine (SVM), decision trees, neural network, and convolutional neural network (CNN) algorithms were tested.
arXiv Detail & Related papers (2023-08-10T13:19:10Z) - A Metadata-Based Ecosystem to Improve the FAIRness of Research Software [0.3185506103768896]
The reuse of research software is central to research efficiency and academic exchange.
The DataDesc ecosystem is presented, an approach to describing data models of software interfaces with detailed and machine-actionable metadata.
arXiv Detail & Related papers (2023-06-18T19:01:08Z) - PyExperimenter: Easily distribute experiments and track results [63.871474825689134]
PyExperimenter is a tool to facilitate the setup, documentation, execution, and subsequent evaluation of results from an empirical study of algorithms.
It is intended to be used by researchers in the field of artificial intelligence, but is not limited to those.
arXiv Detail & Related papers (2023-01-16T10:43:02Z) - Caching and Reproducibility: Making Data Science experiments faster and
FAIRer [25.91002326340444]
Small to medium-scale data science experiments often rely on research software developed ad-hoc by individual scientists or small teams.
We suggest making caching an integral part of the research software development process, even before the first line of code is written.
arXiv Detail & Related papers (2022-11-08T07:11:02Z) - Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z) - Distributed intelligence on the Edge-to-Cloud Continuum: A systematic
literature review [62.997667081978825]
This review aims at providing a comprehensive vision of the main state-of-the-art libraries and frameworks for machine learning and data analytics available today.
The main simulation, emulation, deployment systems, and testbeds for experimental research on the Edge-to-Cloud Continuum available today are also surveyed.
arXiv Detail & Related papers (2022-04-29T08:06:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.