Extending and Applying Automated HERMES Software Publication Workflows
- URL: http://arxiv.org/abs/2410.17614v2
- Date: Fri, 07 Feb 2025 12:13:31 GMT
- Title: Extending and Applying Automated HERMES Software Publication Workflows
- Authors: Sophie Kernchen, Michael Meinel, Stephan Druskat, Michael Fritzsche, David Pape, Oliver Bertuch,
- Abstract summary: HERMES is a tool that automates the publication of software with rich metadata.
We show how to use HERMES as an end user, both via the informal command line interface, and as a step in a continuous integration pipeline.
- Score: 0.6157382820537718
- License:
- Abstract: Research software is an important output of research and must be published according to the FAIR Principles for Research Software. This can be achieved by publishing software with metadata under a persistent identifier. HERMES is a tool that leverages continuous integration to automate the publication of software with rich metadata. In this work, we describe the HERMES workflow itself, and how to extend it to meet the needs of specific research software metadata or infrastructure. We introduce the HERMES plugin architecture and provide the example of creating a new HERMES plugin that harvests metadata from a metadata source in source code repositories. We show how to use HERMES as an end user, both via the command line interface, and as a step in a continuous integration pipeline. Finally, we report three informal case studies whose results provide a preliminary evaluation of the feasibility and applicability of HERMES workflows, and the extensibility of the hermes software package.
Related papers
- Making Software FAIR: A machine-assisted workflow for the research software lifecycle [2.682583873311538]
SoFAIR will extend the capabilities of widely used open scholarly infrastructures.
It will deliver and deploy an effective solution for the management of the research software lifecycle.
arXiv Detail & Related papers (2025-01-08T14:17:26Z) - GeAR: Generation Augmented Retrieval [82.20696567697016]
Document retrieval techniques form the foundation for the development of large-scale information systems.
The prevailing methodology is to construct a bi-encoder and compute the semantic similarity.
We propose a new method called $textbfGe$neration that incorporates well-designed fusion and decoding modules.
arXiv Detail & Related papers (2025-01-06T05:29:00Z) - Towards a Classification of Open-Source ML Models and Datasets for Software Engineering [52.257764273141184]
Open-source Pre-Trained Models (PTMs) and datasets provide extensive resources for various Machine Learning (ML) tasks.
These resources lack a classification tailored to Software Engineering (SE) needs.
We apply an SE-oriented classification to PTMs and datasets on a popular open-source ML repository, Hugging Face (HF), and analyze the evolution of PTMs over time.
arXiv Detail & Related papers (2024-11-14T18:52:05Z) - Masked Image Modeling: A Survey [73.21154550957898]
Masked image modeling emerged as a powerful self-supervised learning technique in computer vision.
We construct a taxonomy and review the most prominent papers in recent years.
We aggregate the performance results of various masked image modeling methods on the most popular datasets.
arXiv Detail & Related papers (2024-08-13T07:27:02Z) - Microsoft Cloud-based Digitization Workflow with Rich Metadata Acquisition for Cultural Heritage Objects [7.450700594277742]
We have developed a new digitization workflow with the Jagiellonian Library (JL)
The solution is based on easy-to-access technological solutions -- Microsoft cloud with MS Excel files interfaces, Office Script for metadata acquisition, MS 365 for storage -- that allows metadata acquisition by domain experts.
The ultimate goal is to create a knowledge graph that describes the analyzed holdings, linked to general knowledge bases, as well as to other cultural heritage collections.
arXiv Detail & Related papers (2024-07-09T15:49:47Z) - DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z) - DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows [72.40917624485822]
We introduce DataDreamer, an open source Python library that allows researchers to implement powerful large language models.
DataDreamer also helps researchers adhere to best practices that we propose to encourage open science.
arXiv Detail & Related papers (2024-02-16T00:10:26Z) - A Metadata-Based Ecosystem to Improve the FAIRness of Research Software [0.3185506103768896]
The reuse of research software is central to research efficiency and academic exchange.
The DataDesc ecosystem is presented, an approach to describing data models of software interfaces with detailed and machine-actionable metadata.
arXiv Detail & Related papers (2023-06-18T19:01:08Z) - The Semantic Scholar Open Data Platform [79.4493235243312]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature.
We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction.
The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
arXiv Detail & Related papers (2023-01-24T17:13:08Z) - MEGAnno: Exploratory Labeling for NLP in Computational Notebooks [9.462926987075122]
We present MEGAnno, a novel annotation framework designed for NLP practitioners and researchers.
With MEGAnno, users can explore data through sophisticated search and interactive suggestion functions.
We demonstrate MEGAnno's flexible, exploratory, efficient, and seamless labeling experience through a sentiment analysis use case.
arXiv Detail & Related papers (2023-01-08T19:16:22Z) - LAME: Layout Aware Metadata Extraction Approach for Research Articles [1.8899300124593648]
The volume of academic literature, such as academic conference papers and journals, has increased rapidly worldwide.
High-performing metadata extraction is still challenging due to diverse layout formats according to journal publishers.
We propose a novel LAyout-aware Metadata Extraction framework equipped with the three characteristics.
arXiv Detail & Related papers (2021-12-23T04:23:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.