Extending and Applying Automated HERMES Software Publication Workflows
- URL: http://arxiv.org/abs/2410.17614v1
- Date: Wed, 23 Oct 2024 07:11:48 GMT
- Title: Extending and Applying Automated HERMES Software Publication Workflows
- Authors: Sophie Kernchen, Michael Meinel, Stephan Druskat, Michael Fritzsche, David Pape, Oliver Bertuch,
- Abstract summary: HERMES is a tool that automates the publication of software with rich metadata.
We show how to use HERMES as an end user, both via the informal command line interface, and as a step in a continuous integration pipeline.
- Score: 0.6157382820537718
- License:
- Abstract: Research software is an import output of research and must be published according to the FAIR Principles for Research Software. This can be achieved by publishing software with metadata under a persistent identifier. HERMES is a tool that leverages continuous integration to automate the publication of software with rich metadata. In this work, we describe the HERMES workflow itself, and how to extend it to meet the needs of specific research software metadata or infrastructure. We introduce the HERMES plugin architecture and provide the example of creating a new HERMES plugin that harvests metadata from a metadata source in source code repositories. We show how to use HERMES as an end user, both via the command line interface, and as a step in a continuous integration pipeline. Finally, we report three informal case studies whose results provide a preliminary evaluation of the feasibility and applicability of HERMES workflows, and the extensibility of the hermes software package.
Related papers
- Towards a Classification of Open-Source ML Models and Datasets for Software Engineering [52.257764273141184]
Open-source Pre-Trained Models (PTMs) and datasets provide extensive resources for various Machine Learning (ML) tasks.
These resources lack a classification tailored to Software Engineering (SE) needs.
We apply an SE-oriented classification to PTMs and datasets on a popular open-source ML repository, Hugging Face (HF), and analyze the evolution of PTMs over time.
arXiv Detail & Related papers (2024-11-14T18:52:05Z) - Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings [77.20838441870151]
Commit message generation is a crucial task in software engineering that is challenging to evaluate correctly.
We use an online metric - the number of edits users introduce before committing the generated messages to the VCS - to select metrics for offline experiments.
Our results indicate that edit distance exhibits the highest correlation, whereas commonly used similarity metrics such as BLEU and METEOR demonstrate low correlation.
arXiv Detail & Related papers (2024-10-15T20:32:07Z) - Flow with FlorDB: Incremental Context Maintenance for the Machine Learning Lifecycle [9.424552130799661]
We present techniques to harvest and query arbitrary metadata from machine learning pipelines.
We show how hindsight logging allows such statements to be added and executed post-hoc.
This is done in a "metadata later style" off the critical path of agile development.
arXiv Detail & Related papers (2024-08-05T14:21:00Z) - Microsoft Cloud-based Digitization Workflow with Rich Metadata Acquisition for Cultural Heritage Objects [7.450700594277742]
We have developed a new digitization workflow with the Jagiellonian Library (JL)
The solution is based on easy-to-access technological solutions -- Microsoft cloud with MS Excel files interfaces, Office Script for metadata acquisition, MS 365 for storage -- that allows metadata acquisition by domain experts.
The ultimate goal is to create a knowledge graph that describes the analyzed holdings, linked to general knowledge bases, as well as to other cultural heritage collections.
arXiv Detail & Related papers (2024-07-09T15:49:47Z) - DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z) - DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows [72.40917624485822]
We introduce DataDreamer, an open source Python library that allows researchers to implement powerful large language models.
DataDreamer also helps researchers adhere to best practices that we propose to encourage open science.
arXiv Detail & Related papers (2024-02-16T00:10:26Z) - A Metadata-Based Ecosystem to Improve the FAIRness of Research Software [0.3185506103768896]
The reuse of research software is central to research efficiency and academic exchange.
The DataDesc ecosystem is presented, an approach to describing data models of software interfaces with detailed and machine-actionable metadata.
arXiv Detail & Related papers (2023-06-18T19:01:08Z) - The Semantic Scholar Open Data Platform [79.4493235243312]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature.
We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction.
The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
arXiv Detail & Related papers (2023-01-24T17:13:08Z) - MEGAnno: Exploratory Labeling for NLP in Computational Notebooks [9.462926987075122]
We present MEGAnno, a novel annotation framework designed for NLP practitioners and researchers.
With MEGAnno, users can explore data through sophisticated search and interactive suggestion functions.
We demonstrate MEGAnno's flexible, exploratory, efficient, and seamless labeling experience through a sentiment analysis use case.
arXiv Detail & Related papers (2023-01-08T19:16:22Z) - LAME: Layout Aware Metadata Extraction Approach for Research Articles [1.8899300124593648]
The volume of academic literature, such as academic conference papers and journals, has increased rapidly worldwide.
High-performing metadata extraction is still challenging due to diverse layout formats according to journal publishers.
We propose a novel LAyout-aware Metadata Extraction framework equipped with the three characteristics.
arXiv Detail & Related papers (2021-12-23T04:23:08Z) - SacreROUGE: An Open-Source Library for Using and Developing
Summarization Evaluation Metrics [74.28810048824519]
SacreROUGE is an open-source library for using and developing summarization evaluation metrics.
The library provides Python wrappers around the official implementations of existing evaluation metrics.
It provides functionality to evaluate how well any metric implemented in the library correlates to human-annotated judgments.
arXiv Detail & Related papers (2020-07-10T13:26:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.