A Metadata-Based Ecosystem to Improve the FAIRness of Research Software
- URL: http://arxiv.org/abs/2306.10620v1
- Date: Sun, 18 Jun 2023 19:01:08 GMT
- Title: A Metadata-Based Ecosystem to Improve the FAIRness of Research Software
- Authors: Patrick Kuckertz, Jan Göpfert, Oliver Karras, David Neuroth, Julian
Schönau, Rodrigo Pueblas, Stephan Ferenz, Felix Engel, Noah Pflugradt, Jann
M. Weinand, Astrid Nieße, Sören Auer, Detlef Stolten
- Abstract summary: The reuse of research software is central to research efficiency and academic exchange.
The DataDesc ecosystem is presented, an approach to describing data models of software interfaces with detailed and machine-actionable metadata.
- Score: 0.3185506103768896
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The reuse of research software is central to research efficiency and academic
exchange. The application of software enables researchers with varied
backgrounds to reproduce, validate, and expand upon study findings.
Furthermore, the analysis of open source code aids in the comprehension,
comparison, and integration of approaches. Often, however, no further use
occurs because relevant software cannot be found or is incompatible with
existing research processes. This results in repetitive software development,
which impedes the advancement of individual researchers and entire research
communities. In this article, the DataDesc ecosystem is presented, an approach
to describing data models of software interfaces with detailed and
machine-actionable metadata. In addition to a specialized metadata schema, an
exchange format and support tools for easy collection and the automated
publishing of software documentation are introduced. In practice, this approach
increases the FAIRness, i.e., the findability, accessibility, interoperability,
and thus the reusability, of research software and effectively promotes its
impact on research.
Related papers
- MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows.
MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years.
We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z) - Automated Extraction and Maturity Analysis of Open Source Clinical Informatics Repositories from Scientific Literature [0.0]
This study introduces an automated methodology to bridge the gap by systematically extracting GitHub repository URLs from academic papers indexed in arXiv.
Our approach encompasses querying the arXiv API for relevant papers, cleaning extracted GitHub URLs, fetching comprehensive repository information via the GitHub API, and analyzing repository maturity based on defined metrics such as stars, forks, open issues, and contributors.
arXiv Detail & Related papers (2024-03-20T17:06:51Z) - NLP-based Relation Extraction Methods in RE [4.856095570023289]
Mobile app repositories have been largely used in scientific research as large-scale, highly adaptive crowdsourced information systems.
We present MApp-KG, a combination of software resources and data artefacts to support extended knowledge generation tasks.
Our contribution aims to provide a framework for automatically constructing a knowledge graph modelling a domain-specific catalogue of mobile apps.
arXiv Detail & Related papers (2024-01-22T16:14:27Z) - Charting a Path to Efficient Onboarding: The Role of Software Visualization [49.1574468325115]
The present study aims to explore the familiarity of managers, leaders, and developers with software visualization tools.
This approach incorporated quantitative and qualitative analyses of data collected from practitioners using questionnaires and semi-structured interviews.
arXiv Detail & Related papers (2024-01-17T21:30:45Z) - SciCat: A Curated Dataset of Scientific Software Repositories [4.77982299447395]
We introduce the SciCat dataset -- a comprehensive collection of Free-Libre Open Source Software (FLOSS) projects.
Our approach involves selecting projects from a pool of 131 million deforked repositories from the World of Code data source.
Our classification focuses on software designed for scientific purposes, research-related projects, and research support software.
arXiv Detail & Related papers (2023-12-11T13:46:33Z) - A pragmatic workflow for research software engineering in computational science [0.0]
University research groups in Computational Science and Engineering (CSE) generally lack dedicated funding and personnel for Research Software Engineering (RSE).
This lack of RSE resources shifts the focus away from sustainable research software development and reproducible results.
We propose an RSE workflow for CSE that addresses these challenges and improves the quality of research output in CSE.
arXiv Detail & Related papers (2023-10-02T08:04:12Z) - Using Machine Learning To Identify Software Weaknesses From Software Requirement Specifications [49.1574468325115]
This research focuses on finding an efficient machine learning algorithm to identify software weaknesses from requirement specifications.
Keywords extracted using latent semantic analysis help map the CWE categories to PROMISE_exp. Naive Bayes, support vector machine (SVM), decision trees, neural network, and convolutional neural network (CNN) algorithms were tested.
arXiv Detail & Related papers (2023-08-10T13:19:10Z) - Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z) - DeepShovel: An Online Collaborative Platform for Data Extraction in Geoscience Literature with AI Assistance [48.55345030503826]
Geoscientists need to read a huge amount of literature to locate, extract, and aggregate relevant results and data.
DeepShovel is a publicly-available AI-assisted data extraction system to support their needs.
A follow-up user evaluation with 14 researchers suggested DeepShovel improved users' efficiency of data extraction for building scientific databases.
arXiv Detail & Related papers (2022-02-21T12:18:08Z) - Software must be recognised as an important output of scholarly research [7.776162183510522]
We argue that as well as being important from a methodological perspective, software should be recognised as an output of research.
The article discusses the different roles that software may play in research and highlights the relationship between software and research sustainability.
arXiv Detail & Related papers (2020-11-15T16:34:31Z) - Machine Learning for Software Engineering: A Systematic Mapping [73.30245214374027]
The software development industry is rapidly adopting machine learning for transitioning modern day software systems towards highly intelligent and self-learning systems.
No comprehensive study exists that explores the current state-of-the-art on the adoption of machine learning across software engineering life cycle stages.
This study introduces a machine learning for software engineering (MLSE) taxonomy classifying the state-of-the-art machine learning techniques according to their applicability to various software engineering life cycle stages.
arXiv Detail & Related papers (2020-05-27T11:56:56Z)