Related papers: Extending and Applying Automated HERMES Software Publication Workflows

Related papers

Interoperable verification and dissemination of software assets in repositories using COAR Notify [0.7703881819415161]
SoFAIR (2024-2025) introduces a comprehensive workflow leveraging machine learning tools for extracting software mentions from research papers. The project integrates repository systems, authors, and services like HAL and Software Heritage to ensure proper archiving, citation, and accessibility of research software. This paper outlines the SoFAIR workflow and the implementation of the COAR Notify Protocol.
arXiv Detail & Related papers (2025-08-04T12:13:26Z)
SMECS: A Software Metadata Extraction and Curation Software [0.0]
Metadata play a crucial role in adopting the FAIR principles for research software and enable findability and reusability. We developed the Software Metadata Extraction and Curation Software (SMECS) which integrates the extraction of metadata from existing sources together with a user-friendly interface for metadata curation. SMECS extracts metadata from online repositories such as GitHub and presents it to researchers through an interactive interface for further curation and export as a CodeMeta file.
arXiv Detail & Related papers (2025-07-24T07:53:46Z)
MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs [54.5729817345543]
MOLE is a framework that automatically extracts metadata attributes from scientific papers covering datasets of languages other than Arabic. Our methodology processes entire documents across multiple input formats and incorporates robust validation mechanisms for consistent output.
arXiv Detail & Related papers (2025-05-26T10:31:26Z)
Chatting with Papers: A Hybrid Approach Using LLMs and Knowledge Graphs [3.68389405018277]
This demo paper reports on a new workflow, GhostWriter, that combines the use of Large Language Models and Knowledge Graphs to support navigation through collections. Based on the tool-suite EverythingData at the backend, GhostWriter provides an interface that enables querying and "chatting" with a collection.
arXiv Detail & Related papers (2025-05-16T18:51:51Z)
SweRank: Software Issue Localization with Code Ranking [109.3289316191729]
SweRank is an efficient retrieve-and-rerank framework for software issue localization. We construct SweLoc, a large-scale dataset curated from public GitHub repositories. We show that SweRank achieves state-of-the-art performance, outperforming both prior ranking models and costly agent-based systems.
arXiv Detail & Related papers (2025-05-07T19:44:09Z)
Making Software FAIR: A machine-assisted workflow for the research software lifecycle [2.682583873311538]
SoFAIR will extend the capabilities of widely used open scholarly infrastructures. It will deliver and deploy an effective solution for the management of the research software lifecycle.
arXiv Detail & Related papers (2025-01-08T14:17:26Z)
GeAR: Generation Augmented Retrieval [82.20696567697016]
Document retrieval techniques form the foundation for the development of large-scale information systems. The prevailing methodology is to construct a bi-encoder and compute the semantic similarity. We propose a new method called Generation Augmented Retrieval (GeAR) that incorporates well-designed fusion and decoding modules.
arXiv Detail & Related papers (2025-01-06T05:29:00Z)
Towards a Classification of Open-Source ML Models and Datasets for Software Engineering [52.257764273141184]
Open-source Pre-Trained Models (PTMs) and datasets provide extensive resources for various Machine Learning (ML) tasks. These resources lack a classification tailored to Software Engineering (SE) needs. We apply an SE-oriented classification to PTMs and datasets on a popular open-source ML repository, Hugging Face (HF), and analyze the evolution of PTMs over time.
arXiv Detail & Related papers (2024-11-14T18:52:05Z)
Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings [77.20838441870151]
Commit message generation is a crucial task in software engineering that is challenging to evaluate correctly. We use an online metric - the number of edits users introduce before committing the generated messages to the VCS - to select metrics for offline experiments. Our results indicate that edit distance exhibits the highest correlation, whereas commonly used similarity metrics such as BLEU and METEOR demonstrate low correlation.
arXiv Detail & Related papers (2024-10-15T20:32:07Z)
Masked Image Modeling: A Survey [73.21154550957898]
Masked image modeling emerged as a powerful self-supervised learning technique in computer vision. We construct a taxonomy and review the most prominent papers in recent years. We aggregate the performance results of various masked image modeling methods on the most popular datasets.
arXiv Detail & Related papers (2024-08-13T07:27:02Z)
Flow with FlorDB: Incremental Context Maintenance for the Machine Learning Lifecycle [9.424552130799661]
We present techniques to harvest and query arbitrary metadata from machine learning pipelines. We show how hindsight logging allows such statements to be added and executed post-hoc. This is done in a "metadata later style" off the critical path of agile development.
arXiv Detail & Related papers (2024-08-05T14:21:00Z)
Microsoft Cloud-based Digitization Workflow with Rich Metadata Acquisition for Cultural Heritage Objects [7.450700594277742]
We have developed a new digitization workflow with the Jagiellonian Library (JL). The solution is based on easy-to-access technological solutions -- Microsoft cloud with MS Excel file interfaces, Office Script for metadata acquisition, MS 365 for storage -- that allows metadata acquisition by domain experts. The ultimate goal is to create a knowledge graph that describes the analyzed holdings, linked to general knowledge bases, as well as to other cultural heritage collections.
arXiv Detail & Related papers (2024-07-09T15:49:47Z)
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering. Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows [72.40917624485822]
We introduce DataDreamer, an open source Python library that allows researchers to implement powerful large language models. DataDreamer also helps researchers adhere to best practices that we propose to encourage open science.
arXiv Detail & Related papers (2024-02-16T00:10:26Z)
A Metadata-Based Ecosystem to Improve the FAIRness of Research Software [0.3185506103768896]
The reuse of research software is central to research efficiency and academic exchange. The DataDesc ecosystem is presented, an approach to describing data models of software interfaces with detailed and machine-actionable metadata.
arXiv Detail & Related papers (2023-06-18T19:01:08Z)
The Semantic Scholar Open Data Platform [79.4493235243312]
Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature. We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction. The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings.
arXiv Detail & Related papers (2023-01-24T17:13:08Z)
MEGAnno: Exploratory Labeling for NLP in Computational Notebooks [9.462926987075122]
We present MEGAnno, a novel annotation framework designed for NLP practitioners and researchers. With MEGAnno, users can explore data through sophisticated search and interactive suggestion functions. We demonstrate MEGAnno's flexible, exploratory, efficient, and seamless labeling experience through a sentiment analysis use case.
arXiv Detail & Related papers (2023-01-08T19:16:22Z)
LAME: Layout Aware Metadata Extraction Approach for Research Articles [1.8899300124593648]
The volume of academic literature, such as academic conference papers and journals, has increased rapidly worldwide. High-performing metadata extraction is still challenging due to diverse layout formats according to journal publishers. We propose a novel LAyout-aware Metadata Extraction framework equipped with three characteristics.
arXiv Detail & Related papers (2021-12-23T04:23:08Z)
SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics [74.28810048824519]
SacreROUGE is an open-source library for using and developing summarization evaluation metrics. The library provides Python wrappers around the official implementations of existing evaluation metrics. It provides functionality to evaluate how well any metric implemented in the library correlates to human-annotated judgments.
arXiv Detail & Related papers (2020-07-10T13:26:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.