Related papers: SMECS: A Software Metadata Extraction and Curation Software

SMECS: A Software Metadata Extraction and Curation Software

URL: http://arxiv.org/abs/2507.18159v1
Date: Thu, 24 Jul 2025 07:53:46 GMT
Title: SMECS: A Software Metadata Extraction and Curation Software
Authors: Stephan Ferenz, Aida Jafarbigloo, Oliver Werth, Astrid Nieße,
Abstract summary: Metadata play a crucial role in adopting the FAIR principles for research software and enables findability and reusability.<n>We developed the Software Metadata Extraction and Curation Software (SMECS) which integrates the extraction of metadata from existing sources together with a user-friendly interface for metadata curation.<n> SMECS extracts metadata from online repositories such as GitHub and presents it to researchers through an interactive interface for further curation and export as a CodeMeta file.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Metadata play a crucial role in adopting the FAIR principles for research software and enables findability and reusability. However, creating high-quality metadata can be resource-intensive for researchers and research software engineers. To address this challenge, we developed the Software Metadata Extraction and Curation Software (SMECS) which integrates the extraction of metadata from existing sources together with a user-friendly interface for metadata curation. SMECS extracts metadata from online repositories such as GitHub and presents it to researchers through an interactive interface for further curation and export as a CodeMeta file. The usability of SMECS was evaluated through usability experiments which confirmed that SMECS provides a satisfactory user experience. SMECS supports the FAIRification of research software by simplifying metadata creation.

Related papers

Scaling Generalist Data-Analytic Agents [95.05161133349242]
DataMind is a scalable data synthesis and agent training recipe designed to build generalist data-analytic agents.<n>DataMind tackles three key challenges in building open-source data-analytic agents.
arXiv Detail & Related papers (2025-09-29T17:23:08Z)
Flexible metadata harvesting for ecology using large language models [3.4117490081172774]
We develop a large language model (LLM)-based metadata harvester.<n>It flexibly extracts metadata from any dataset's landing page.<n>It converts these to a user-defined, unified format using existing metadata standards.
arXiv Detail & Related papers (2025-08-21T10:10:29Z)
Identity resolution of software metadata using Large Language Models [0.0]
This article presents an evaluation of instruction-tuned large language models for the task of software metadata identity resolution.<n>We benchmarked multiple models against a human-annotated gold standard, examined their behavior on ambiguous cases, and introduced an agreement-based proxy for high-confidence automated decisions.
arXiv Detail & Related papers (2025-05-29T14:47:31Z)
MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs [54.5729817345543]
MOLE is a framework that automatically extracts metadata attributes from scientific papers covering datasets of languages other than Arabic.<n>Our methodology processes entire documents across multiple input formats and incorporates robust validation mechanisms for consistent output.
arXiv Detail & Related papers (2025-05-26T10:31:26Z)
Composed Multi-modal Retrieval: A Survey of Approaches and Applications [81.54640206021757]
Composed Multi-modal Retrieval (CMR) emerges as a pivotal next-generation technology.<n>CMR enables users to query images or videos by integrating a reference visual input with textual modifications.<n>This paper provides a comprehensive survey of CMR, covering its fundamental challenges, technical advancements, and applications.
arXiv Detail & Related papers (2025-03-03T09:18:43Z)
Harmonizing Metadata of Language Resources for Enhanced Querying and Accessibility [0.0]
This paper addresses the harmonization of metadata from diverse repositories of language resources (LRs)<n>Our methodology supports text-based search, faceted browsing, and advanced SPARQL queries through Linghub, a newly developed portal.<n>The study highlights significant metadata issues and advocates for adherence to open vocabularies and standards to enhance metadata harmonization.
arXiv Detail & Related papers (2025-01-09T22:48:43Z)
Towards a Classification of Open-Source ML Models and Datasets for Software Engineering [52.257764273141184]
Open-source Pre-Trained Models (PTMs) and datasets provide extensive resources for various Machine Learning (ML) tasks. These resources lack a classification tailored to Software Engineering (SE) needs. We apply an SE-oriented classification to PTMs and datasets on a popular open-source ML repository, Hugging Face (HF), and analyze the evolution of PTMs over time.
arXiv Detail & Related papers (2024-11-14T18:52:05Z)
Extending and Applying Automated HERMES Software Publication Workflows [0.6157382820537718]
HERMES is a tool that automates the publication of software with rich metadata.<n>We show how to use HERMES as an end user, both via the informal command line interface, and as a step in a continuous integration pipeline.
arXiv Detail & Related papers (2024-10-23T07:11:48Z)
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering. Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
A Metadata-Based Ecosystem to Improve the FAIRness of Research Software [0.3185506103768896]
The reuse of research software is central to research efficiency and academic exchange. The DataDesc ecosystem is presented, an approach to describing data models of software interfaces with detailed and machine-actionable metadata.
arXiv Detail & Related papers (2023-06-18T19:01:08Z)
TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations. We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
Multimodal Recommender Systems: A Survey [50.23505070348051]
Multimodal Recommender System (MRS) has attracted much attention from both academia and industry recently. In this paper, we will give a comprehensive survey of the MRS models, mainly from technical views. To access more details of the surveyed papers, such as implementation code, we open source a repository.
arXiv Detail & Related papers (2023-02-08T05:12:54Z)
FAIRification of MLC data [5.803041363561935]
We introduce an online catalogue of MLC datasets that follow the FAIR (Findable, Accessible, Interoperable, and Reusable) and TRUST (Transparency, Responsibility, User focus, Sustainability, and Technology) principles. The catalogue extensively describes many MLC datasets with comprehensible meta-features, MLC-specific semantic descriptions, and different data provenance information.
arXiv Detail & Related papers (2022-11-23T07:53:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.