Research Artifacts in Secondary Studies: A Systematic Mapping in Software Engineering
- URL: http://arxiv.org/abs/2504.12646v2
- Date: Wed, 25 Jun 2025 07:53:59 GMT
- Title: Research Artifacts in Secondary Studies: A Systematic Mapping in Software Engineering
- Authors: Aleksi Huotala, Miikka Kuutila, Mika Mäntylä
- Abstract summary: Systematic reviews (SRs) summarize state-of-the-art evidence in science, including software engineering (SE). We examined 537 secondary studies published between 2013 and 2023 to analyze the availability and reporting of research artifacts.
- Score: 0.9421843976231371
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Context: Systematic reviews (SRs) summarize state-of-the-art evidence in science, including software engineering (SE). Objective: Our objective is to evaluate how SRs report research artifacts and to provide a comprehensive list of these artifacts. Method: We examined 537 secondary studies published between 2013 and 2023 to analyze the availability and reporting of research artifacts. Results: Our findings indicate that only 31.5% of the reviewed studies include research artifacts. Encouragingly, the situation is gradually improving, as our regression analysis shows a significant increase in the availability of research artifacts over time. However, in 2023, just 62.0% of secondary studies provide a research artifact, while an even lower percentage, 30.4%, use a permanent repository with a digital object identifier (DOI) for storage. Conclusion: To enhance transparency and reproducibility in SE research, we advocate for the mandatory publication of research artifacts in secondary studies.
Related papers
- Tracking research software outputs in the UK [1.1970409518725493]
This study examines where UK academic institutions store and register software as a unique research output. The quantity of software reported as research outcomes remains low in proportion to other categories. Artifact sharing appears low, with one-quarter of the reported software having no links and 45% having either a missing or erroneous URL.
arXiv Detail & Related papers (2025-07-30T17:46:47Z)
- Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation [58.064940977804596]
A plethora of new AI models and tools has been proposed, promising to empower researchers and academics worldwide to conduct their research more effectively and efficiently. Ethical concerns regarding shortcomings of these tools and potential for misuse take a particularly prominent place in our discussion.
arXiv Detail & Related papers (2025-02-07T18:26:45Z)
- Generative Retrieval for Book search [106.67655212825025]
We propose an effective Generative retrieval framework for Book Search. It features two main components: data augmentation and outline-oriented book encoding. Experiments on a proprietary Baidu dataset demonstrate that GBS outperforms strong baselines.
arXiv Detail & Related papers (2025-01-19T12:57:13Z)
- Revisiting BPR: A Replicability Study of a Common Recommender System Baseline [78.00363373925758]
We study the features of the BPR model, indicating their impact on its performance, and investigate open-source BPR implementations.
Our analysis reveals inconsistencies between these implementations and the original BPR paper, leading to a significant decrease in performance of up to 50% for specific implementations.
We show that the BPR model can achieve performance levels close to state-of-the-art methods on the top-n recommendation tasks and even outperform them on specific datasets.
arXiv Detail & Related papers (2024-09-21T18:39:53Z)
- SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks.
SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z)
- Requirements Quality Research Artifacts: Recovery, Analysis, and Management Guideline [3.91424340393661]
We aim to improve the availability of research artifacts in requirements quality research.
We extend an artifact recovery initiative and empirically evaluate the reasons for artifact unavailability.
We compile a concise guideline for open science artifact disclosure.
arXiv Detail & Related papers (2024-06-03T07:09:15Z)
- RaFe: Ranking Feedback Improves Query Rewriting for RAG [83.24385658573198]
We propose a framework for training query rewriting models free of annotations.
By leveraging a publicly available reranker, RaFe provides feedback aligned well with the rewriting objectives.
arXiv Detail & Related papers (2024-05-23T11:00:19Z)
- A Second Look on BASS -- Boosting Abstractive Summarization with Unified Semantic Graphs -- A Replication Study [2.592470112714595]
We present a detailed replication study of the BASS framework, an abstractive summarization system based on the notion of Unified Semantic Graphs.
Our investigation includes challenges in replicating key components and an ablation study to systematically isolate error sources rooted in replicating novel components.
arXiv Detail & Related papers (2024-03-05T12:48:29Z)
- Emerging Results on Automated Support for Searching and Selecting Evidence for Systematic Literature Review Updates [1.1153433121962064]
We present emerging results on an automated approach to support searching and selecting studies for SLR updates in Software Engineering.
We developed an automated tool prototype to perform the snowballing search technique and support selecting relevant studies for SLR updates using Machine Learning (ML) algorithms.
arXiv Detail & Related papers (2024-02-07T23:39:20Z)
- De-identification of clinical free text using natural language processing: A systematic review of current approaches [48.343430343213896]
Natural language processing has repeatedly demonstrated its feasibility in automating the de-identification process.
Our study aims to provide systematic evidence on how the de-identification of clinical free text has evolved in the last thirteen years.
arXiv Detail & Related papers (2023-11-28T13:20:41Z)
- Automatically Finding and Categorizing Replication Studies [0.0]
In many fields of experimental science, papers that failed to replicate continue to be cited as a result of the poor discoverability of replication studies.
As a first step to creating a system that automatically finds replication studies for a given paper, 334 replication studies and 344 replicated studies were collected.
arXiv Detail & Related papers (2023-11-25T15:27:10Z)
- Anachronic Tertiary Studies in Software Engineering: An Exploratory Quaternary Study [39.125366249242646]
This paper presents an analysis of 34 software engineering tertiary studies published between 2009 and 2021.
Results indicate that over 60% of the studies demonstrate varying degrees of anachronism due to the publication of primary and secondary studies.
arXiv Detail & Related papers (2023-11-01T00:54:55Z)
- How Many Papers Should You Review? A Research Synthesis of Systematic Literature Reviews in Software Engineering [5.6292136785289175]
We aim to provide more understanding of when an SLR in Software Engineering should be conducted.
A research synthesis was conducted on a sample of 170 SLRs published in top-tier SE journals.
The results of our study can be used by SE researchers as an indicator or benchmark to understand whether an SLR is conducted at a good time.
arXiv Detail & Related papers (2023-07-12T10:18:58Z)
- Artificial Intelligence in Concrete Materials: A Scientometric View [77.34726150561087]
This chapter aims to uncover the main research interests and knowledge structure of the existing literature on AI for concrete materials.
To begin with, a total of 389 journal articles published from 1990 to 2020 were retrieved from the Web of Science.
Scientometric tools such as keyword co-occurrence analysis and documentation co-citation analysis were adopted to quantify features and characteristics of the research field.
arXiv Detail & Related papers (2022-09-17T18:24:56Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- What do You Mean by Relation Extraction? A Survey on Datasets and Study on Scientific Relation Classification [21.513743126525622]
We present an empirical study on scientific Relation Classification across two datasets.
Despite large data overlap, our analysis reveals substantial discrepancies in annotation.
Variation within further sub-domains exists but impacts Relation Classification only to a limited degree.
arXiv Detail & Related papers (2022-04-28T14:07:25Z)
- Does Recommend-Revise Produce Reliable Annotations? An Analysis on Missing Instances in DocRED [60.39125850987604]
We show that a recommend-revise scheme results in false negative samples and an obvious bias towards popular entities and relations.
The relabeled dataset is released to serve as a more reliable test set of document RE models.
arXiv Detail & Related papers (2022-04-17T11:29:01Z)
- 3D Object Detection from Images for Autonomous Driving: A Survey [68.33502122185813]
3D object detection from images is one of the fundamental and challenging problems in autonomous driving.
More than 200 works have studied this problem from 2015 to 2021, encompassing a broad spectrum of theories, algorithms, and applications.
We provide the first comprehensive survey of this novel and continuously growing research field, summarizing the most commonly used pipelines for image-based 3D detection.
arXiv Detail & Related papers (2022-02-07T07:12:24Z)
- The MultiBERTs: BERT Reproductions for Robustness Analysis [86.29162676103385]
Re-running pretraining can lead to substantially different conclusions about performance.
We introduce MultiBERTs: a set of 25 BERT-base checkpoints.
The aim is to enable researchers to draw robust and statistically justified conclusions about pretraining procedures.
arXiv Detail & Related papers (2021-06-30T15:56:44Z)
- An Empirical Analysis of the R Package Ecosystem [0.0]
We analyze more than 25,000 packages, 150,000 releases, and 15 million files across two decades.
We find that the historical growth of the ecosystem has been robust under all measures.
arXiv Detail & Related papers (2021-02-19T12:55:18Z)
- Topic Space Trajectories: A case study on machine learning literature [0.0]
We present topic space trajectories, a structure that allows for the comprehensible tracking of research topics.
We show the applicability of our approach on a publication corpus spanning 50 years of machine learning research from 32 publication venues.
Our novel analysis method may be employed for paper classification, for the prediction of future research topics, and for the recommendation of fitting conferences and journals for submitting unpublished work.
arXiv Detail & Related papers (2020-10-23T10:53:42Z)
- Identifying Statistical Bias in Dataset Replication [102.92137353938388]
We study a replication of the ImageNet dataset on which models exhibit a significant (11-14%) drop in accuracy.
After correcting for the identified statistical bias, only an estimated $3.6\% \pm 1.5\%$ of the original $11.7\% \pm 1.0\%$ accuracy drop remains unaccounted for.
arXiv Detail & Related papers (2020-05-19T17:48:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.