Related papers: A Metadata-Based Ecosystem to Improve the FAIRness of Research Software

Related papers

Understanding Usage and Engagement in AI-Powered Scientific Research Tools: The Asta Interaction Dataset [47.98539809308384]
We analyze the Asta Interaction dataset, a large-scale resource comprising over 200,000 user queries and interaction logs. We characterize query patterns, engagement behaviors, and how usage evolves with experience. We release the anonymized dataset and analysis with a new query taxonomy to inform future designs of real-world AI research assistants.
arXiv Detail & Related papers (2026-02-26T18:40:28Z)
The Reproducible Research Platform establishes a unified open science environment bridging data and software lifecycles across disciplines, from proposal to publication [0.0]
We developed the open-source Reproducible Research Platform (RRP), which unifies research data management with version-controlled, containerized computational environments. RRP enables anyone to execute, reuse and publish fully documented, FAIR research without manual retrieval or platform-specific setup. We demonstrate RRP's impact by reproducing results from diverse published studies, including work over a decade old, showing sustained usability.
arXiv Detail & Related papers (2025-12-04T22:02:19Z)
OpenDORS: A dataset of openly referenced open research software [1.0026496861838448]
We present a dataset of 134,352 unique open research software projects and 134,154 source code repositories referenced in open access literature. Each dataset record identifies the referencing publication and lists source code repositories of the software project. For 122,425 source code repositories, the dataset provides metadata on latest versions, license information, programming languages and descriptive metadata files.
arXiv Detail & Related papers (2025-12-01T11:45:50Z)
The Software Observatory: aggregating and analysing software metadata for trend computation and FAIR assessment [0.0]
The Software Observatory at OpenEBench is a novel web portal that consolidates software metadata from various sources. Our platform enables users to analyse trends, identify patterns and advancements within the Life Sciences research software ecosystem.
arXiv Detail & Related papers (2025-10-07T09:15:02Z)
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents [72.28593628378991]
WebResearcher is an iterative deep-research paradigm that reformulates deep research as a Markov Decision Process. WebResearcher achieves state-of-the-art performance, even surpassing frontier proprietary systems.
arXiv Detail & Related papers (2025-09-16T17:57:17Z)
Interoperable verification and dissemination of software assets in repositories using COAR Notify [0.7703881819415161]
SoFAIR (2024-2025) introduces a comprehensive workflow leveraging machine learning tools for extracting software mentions from research papers. The project integrates repository systems, authors, and services like HAL and Software Heritage to ensure proper archiving, citation, and accessibility of research software. This paper outlines the SoFAIR workflow and the implementation of the COAR Notify Protocol.
arXiv Detail & Related papers (2025-08-04T12:13:26Z)
Ten Essential Guidelines for Building High-Quality Research Software [0.3562485774739681]
This paper presents ten guidelines for producing high-quality research software. The guidelines cover every stage of the development lifecycle. They emphasize the importance of planning, writing clean and readable code, using version control, and implementing testing strategies.
arXiv Detail & Related papers (2025-07-22T02:22:41Z)
From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents [96.65646344634524]
Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research. We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn. We demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking.
arXiv Detail & Related papers (2025-06-23T17:27:19Z)
Identity resolution of software metadata using Large Language Models [0.0]
This article presents an evaluation of instruction-tuned large language models for the task of software metadata identity resolution. We benchmarked multiple models against a human-annotated gold standard, examined their behavior on ambiguous cases, and introduced an agreement-based proxy for high-confidence automated decisions.
arXiv Detail & Related papers (2025-05-29T14:47:31Z)
WebThinker: Empowering Large Reasoning Models with Deep Research Capability [60.81964498221952]
WebThinker is a deep research agent that empowers large reasoning models to autonomously search the web, navigate web pages, and draft research reports during the reasoning process. It also employs an Autonomous Think-Search-and-Draft strategy, allowing the model to seamlessly interleave reasoning, information gathering, and report writing in real time. Our approach enhances LRM reliability and applicability in complex scenarios, paving the way for more capable and versatile deep research systems.
arXiv Detail & Related papers (2025-04-30T16:25:25Z)
A Comprehensive Survey on Imbalanced Data Learning [56.65067795190842]
Imbalanced data is prevalent in various types of raw data and hinders the performance of machine learning. This survey systematically analyzes various real-world data formats. It groups existing research on different data formats into four categories: data re-balancing, feature representation, training strategy, and ensemble learning.
arXiv Detail & Related papers (2025-02-13T04:53:17Z)
MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows. MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years. We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z)
FAIRSECO: An Extensible Framework for Impact Measurement of Research Software [1.549241498953151]
Existing methods for crediting research software and Research Software Engineers have proven to be insufficient. We have developed FAIRSECO, an open source framework with the objective of assessing the impact of research software in research through the evaluation of various factors.
arXiv Detail & Related papers (2024-06-04T15:22:48Z)
Automated Extraction and Maturity Analysis of Open Source Clinical Informatics Repositories from Scientific Literature [0.0]
This study introduces an automated methodology to bridge the gap by systematically extracting GitHub repository URLs from academic papers indexed in arXiv. Our approach encompasses querying the arXiv API for relevant papers, cleaning extracted GitHub URLs, fetching comprehensive repository information via the GitHub API, and analyzing repository maturity based on defined metrics such as stars, forks, open issues, and contributors.
arXiv Detail & Related papers (2024-03-20T17:06:51Z)
NLP-based Relation Extraction Methods in RE [4.856095570023289]
Mobile app repositories have been largely used in scientific research as large-scale, highly adaptive crowdsourced information systems. We present MApp-KG, a combination of software resources and data artefacts to support extended knowledge generation tasks. Our contribution aims to provide a framework for automatically constructing a knowledge graph modelling a domain-specific catalogue of mobile apps.
arXiv Detail & Related papers (2024-01-22T16:14:27Z)
Charting a Path to Efficient Onboarding: The Role of Software Visualization [49.1574468325115]
The present study aims to explore the familiarity of managers, leaders, and developers with software visualization tools. This approach incorporated quantitative and qualitative analyses of data collected from practitioners using questionnaires and semi-structured interviews.
arXiv Detail & Related papers (2024-01-17T21:30:45Z)
SciCat: A Curated Dataset of Scientific Software Repositories [4.77982299447395]
We introduce the SciCat dataset -- a comprehensive collection of Free-Libre Open Source Software (FLOSS) projects. Our approach involves selecting projects from a pool of 131 million deforked repositories from the World of Code data source. Our classification focuses on software designed for scientific purposes, research-related projects, and research support software.
arXiv Detail & Related papers (2023-12-11T13:46:33Z)
A pragmatic workflow for research software engineering in computational science [0.0]
University research groups in Computational Science and Engineering (CSE) generally lack dedicated funding and personnel for Research Software Engineering (RSE), which shifts the focus away from sustainable research software development and reproducible results. We propose an RSE workflow for CSE that addresses these challenges and improves the quality of research output in CSE.
arXiv Detail & Related papers (2023-10-02T08:04:12Z)
Using Machine Learning To Identify Software Weaknesses From Software Requirement Specifications [49.1574468325115]
This research focuses on finding an efficient machine learning algorithm to identify software weaknesses from requirement specifications. Keywords extracted using latent semantic analysis help map the CWE categories to PROMISE_exp. Naive Bayes, support vector machine (SVM), decision trees, neural network, and convolutional neural network (CNN) algorithms were tested.
arXiv Detail & Related papers (2023-08-10T13:19:10Z)
Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature. We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
DeepShovel: An Online Collaborative Platform for Data Extraction in Geoscience Literature with AI Assistance [48.55345030503826]
Geoscientists need to read a huge amount of literature to locate, extract, and aggregate relevant results and data. DeepShovel is a publicly-available AI-assisted data extraction system to support their needs. A follow-up user evaluation with 14 researchers suggested DeepShovel improved users' efficiency of data extraction for building scientific databases.
arXiv Detail & Related papers (2022-02-21T12:18:08Z)
Software must be recognised as an important output of scholarly research [7.776162183510522]
We argue that as well as being important from a methodological perspective, software should be recognised as an output of research. The article discusses the different roles that software may play in research and highlights the relationship between software and research sustainability.
arXiv Detail & Related papers (2020-11-15T16:34:31Z)
Machine Learning for Software Engineering: A Systematic Mapping [73.30245214374027]
The software development industry is rapidly adopting machine learning for transitioning modern day software systems towards highly intelligent and self-learning systems. No comprehensive study exists that explores the current state-of-the-art on the adoption of machine learning across software engineering life cycle stages. This study introduces a machine learning for software engineering (MLSE) taxonomy classifying the state-of-the-art machine learning techniques according to their applicability to various software engineering life cycle stages.
arXiv Detail & Related papers (2020-05-27T11:56:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.