Biomedical Open Source Software: Crucial Packages and Hidden Heroes
- URL: http://arxiv.org/abs/2404.06672v1
- Date: Wed, 10 Apr 2024 01:22:02 GMT
- Title: Biomedical Open Source Software: Crucial Packages and Hidden Heroes
- Authors: Andrew Nesbitt, Boris Veytsman, Daniel Mietchen, Eva Maxfield Brown, James Howison, João Felipe Pimentel, Laurent Hèbert-Dufresne, Stephan Druskat,
- Abstract summary: We map the dependencies of the software used in biomedical papers and find the packages critical to the software ecosystems.
We propose the centrality metrics for the network of software dependencies, analyze three ecosystems (PyPi, CRAN, Bioconductor) and determine the packages with the highest centrality.
- Score: 2.3960586265742574
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the importance of scientific software for research, it is often not formally recognized and rewarded. This is especially true for foundation libraries, which are used by the software packages visible to the users, being ``hidden'' themselves. The funders and other organizations need to understand the complex network of computer programs that the modern research relies upon. In this work we used CZ Software Mentions Dataset to map the dependencies of the software used in biomedical papers and find the packages critical to the software ecosystems. We propose the centrality metrics for the network of software dependencies, analyze three ecosystems (PyPi, CRAN, Bioconductor) and determine the packages with the highest centrality.
Related papers
- Estimating the Energy Footprint of Software Systems: a Primer [56.200335252600354]
quantifying the energy footprint of a software system is one of the most basic activities.
This document aims to be a starting point for researchers who want to begin conducting work in this area.
arXiv Detail & Related papers (2024-07-16T11:21:30Z) - The Code the World Depends On: A First Look at Technology Makers' Open Source Software Dependencies [3.6840775431698893]
Open-source software (OSS) supply chain security has become a topic of concern for organizations.
Patching an OSS vulnerability can require updating other dependent software products in addition to the original package.
We do not know what packages are most critical to patch, hindering efforts to improve OSS security where it is most needed.
arXiv Detail & Related papers (2024-04-17T21:44:38Z) - SciCat: A Curated Dataset of Scientific Software Repositories [4.77982299447395]
We introduce the SciCat dataset -- a comprehensive collection of Free-Libre Open Source Software (FLOSS) projects.
Our approach involves selecting projects from a pool of 131 million deforked repositories from the World of Code data source.
Our classification focuses on software designed for scientific purposes, research-related projects, and research support software.
arXiv Detail & Related papers (2023-12-11T13:46:33Z) - Using Machine Learning To Identify Software Weaknesses From Software
Requirement Specifications [49.1574468325115]
This research focuses on finding an efficient machine learning algorithm to identify software weaknesses from requirement specifications.
Keywords extracted using latent semantic analysis help map the CWE categories to PROMISE_exp. Naive Bayes, support vector machine (SVM), decision trees, neural network, and convolutional neural network (CNN) algorithms were tested.
arXiv Detail & Related papers (2023-08-10T13:19:10Z) - A Metadata-Based Ecosystem to Improve the FAIRness of Research Software [0.3185506103768896]
The reuse of research software is central to research efficiency and academic exchange.
The DataDesc ecosystem is presented, an approach to describing data models of software interfaces with detailed and machine-actionable metadata.
arXiv Detail & Related papers (2023-06-18T19:01:08Z) - Promises and Perils of Mining Software Package Ecosystem Data [10.787686237395816]
Third-party packages have led to the emergence of large software package ecosystems with a maze of inter-dependencies.
Understanding the infrastructure and dynamics of package ecosystems has given rise to approaches for better code reuse, automated updates, and the avoidance of vulnerabilities.
In this chapter, we review promises and perils of mining the rich data related to software package ecosystems available to software engineering researchers.
arXiv Detail & Related papers (2023-05-29T03:09:48Z) - The Basis of Design Tools for Quantum Computing: Arrays, Decision
Diagrams, Tensor Networks, and ZX-Calculus [55.58528469973086]
Quantum computers promise to efficiently solve important problems classical computers never will.
A fully automated quantum software stack needs to be developed.
This work provides a look "under the hood" of today's tools and showcases how these means are utilized in them, e.g., for simulation, compilation, and verification of quantum circuits.
arXiv Detail & Related papers (2023-01-10T19:00:00Z) - Tangelo: An Open-source Python Package for End-to-end Chemistry
Workflows on Quantum Computers [85.21205677945196]
Tangelo is an open-source Python software package for the development of end-to-end chemistry on quantum computers.
It aims to support the design of successful experiments on quantum hardware, and to facilitate advances in quantum algorithm development.
arXiv Detail & Related papers (2022-06-24T17:44:00Z) - Satellite Image Time Series Analysis for Big Earth Observation Data [50.591267188664666]
This paper describes sits, an open-source R package for satellite image time series analysis using machine learning.
We show that this approach produces high accuracy for land use and land cover maps through a case study in the Cerrado biome.
arXiv Detail & Related papers (2022-04-24T15:23:25Z) - Underproduction: An Approach for Measuring Risk in Open Source Software [9.701036831490766]
'Underproduction' occurs when the supply of software engineering labor becomes out of alignment with the demand of people who rely on the software produced.
We present a conceptual framework for identifying relative underproduction in software as well as a statistical method for applying our framework to a comprehensive dataset.
arXiv Detail & Related papers (2021-02-27T23:18:21Z) - Machine Learning for Software Engineering: A Systematic Mapping [73.30245214374027]
The software development industry is rapidly adopting machine learning for transitioning modern day software systems towards highly intelligent and self-learning systems.
No comprehensive study exists that explores the current state-of-the-art on the adoption of machine learning across software engineering life cycle stages.
This study introduces a machine learning for software engineering (MLSE) taxonomy classifying the state-of-the-art machine learning techniques according to their applicability to various software engineering life cycle stages.
arXiv Detail & Related papers (2020-05-27T11:56:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.