Replication Packages in Software Engineering Secondary Studies: A Systematic Mapping
- URL: http://arxiv.org/abs/2504.12646v1
- Date: Thu, 17 Apr 2025 05:11:39 GMT
- Title: Replication Packages in Software Engineering Secondary Studies: A Systematic Mapping
- Authors: Aleksi Huotala, Miikka Kuutila, Mika Mäntylä
- Abstract summary: Systematic reviews (SRs) summarize state-of-the-art evidence in science, including software engineering (SE). We examined 528 secondary studies published between 2013 and 2023 to analyze the availability and reporting of replication packages.
- Score: 0.9421843976231371
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Context: Systematic reviews (SRs) summarize state-of-the-art evidence in science, including software engineering (SE). Objective: Our objective is to evaluate how SRs report replication packages and to provide a comprehensive list of these packages. Method: We examined 528 secondary studies published between 2013 and 2023 to analyze the availability and reporting of replication packages. Results: Our findings indicate that only 25.4% of the reviewed studies include replication packages. Encouragingly, the situation is gradually improving, as our regression analysis shows a significant increase in the availability of replication packages over time. However, in 2023, just 50.6% of secondary studies provided a replication package, while an even lower percentage, 29.1%, used a permanent repository with a digital object identifier (DOI) for storage. Conclusion: To enhance transparency and reproducibility in SE research, we advocate for the mandatory publication of replication packages in secondary studies.
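The trend result above comes from regressing replication-package availability on publication year. As a rough, hypothetical illustration of that kind of analysis (the per-study records and the use of statsmodels below are assumptions, not the authors' script), one could fit a logistic regression of a per-study availability flag on year:

```python
# Illustrative sketch only: logistic regression of replication-package
# availability on publication year; the data points below are made up.
import numpy as np
import statsmodels.api as sm

# One row per secondary study: publication year and whether a replication
# package was provided (1) or not (0).
years = np.array([2013, 2014, 2015, 2016, 2017, 2018,
                  2019, 2020, 2021, 2022, 2023, 2023])
has_package = np.array([0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1])

X = sm.add_constant(years - years.min())   # intercept + years since 2013
result = sm.Logit(has_package, X).fit(disp=False)
print(result.summary())

# A positive, statistically significant coefficient on the year term would
# correspond to the reported increase in availability over time.
```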
Related papers
- Generative Retrieval for Book search [106.67655212825025]
We propose an effective generative retrieval framework for book search. It features two main components: data augmentation and outline-oriented book encoding. Experiments on a proprietary Baidu dataset demonstrate that GBS outperforms strong baselines.
arXiv Detail & Related papers (2025-01-19T12:57:13Z) - Revisiting BPR: A Replicability Study of a Common Recommender System Baseline [78.00363373925758]
We study the features of the BPR model, indicating their impact on its performance, and investigate open-source BPR implementations.
Our analysis reveals inconsistencies between these implementations and the original BPR paper, leading to a significant decrease in performance of up to 50% for specific implementations.
We show that the BPR model can achieve performance levels close to state-of-the-art methods on the top-n recommendation tasks and even outperform them on specific datasets.
arXiv Detail & Related papers (2024-09-21T18:39:53Z) - RaFe: Ranking Feedback Improves Query Rewriting for RAG [83.24385658573198]
We propose a framework for training query rewriting models free of annotations.
By leveraging a publicly available reranker, our framework provides feedback that aligns well with the rewriting objectives.
arXiv Detail & Related papers (2024-05-23T11:00:19Z) - A Second Look on BASS -- Boosting Abstractive Summarization with Unified Semantic Graphs -- A Replication Study [2.592470112714595]
We present a detailed replication study of the BASS framework, an abstractive summarization system based on the notion of Unified Semantic Graphs.
Our investigation includes challenges in replicating key components and an ablation study to systematically isolate error sources rooted in replicating novel components.
arXiv Detail & Related papers (2024-03-05T12:48:29Z) - Emerging Results on Automated Support for Searching and Selecting Evidence for Systematic Literature Review Updates [1.1153433121962064]
We present emerging results on an automated approach to support searching and selecting studies for SLR updates in Software Engineering.
We developed an automated tool prototype to perform the snowballing search technique and support selecting relevant studies for SLR updates using Machine Learning (ML) algorithms (a minimal sketch of this snowballing-plus-classifier idea appears after this list).
arXiv Detail & Related papers (2024-02-07T23:39:20Z) - Automatically Finding and Categorizing Replication Studies [0.0]
In many fields of experimental science, papers that failed to replicate continue to be cited as a result of the poor discoverability of replication studies.
As a first step to creating a system that automatically finds replication studies for a given paper, 334 replication studies and 344 replicated studies were collected.
arXiv Detail & Related papers (2023-11-25T15:27:10Z) - What do You Mean by Relation Extraction? A Survey on Datasets and Study on Scientific Relation Classification [21.513743126525622]
We present an empirical study on scientific Relation Classification across two datasets.
Despite large data overlap, our analysis reveals substantial discrepancies in annotation.
Variation within further sub-domains exists but impacts Relation Classification only to a limited degree.
arXiv Detail & Related papers (2022-04-28T14:07:25Z) - Does Recommend-Revise Produce Reliable Annotations? An Analysis on Missing Instances in DocRED [60.39125850987604]
We show that the recommend-revise scheme results in false negative samples and an obvious bias towards popular entities and relations.
The relabeled dataset is released to serve as a more reliable test set of document-level relation extraction (RE) models.
arXiv Detail & Related papers (2022-04-17T11:29:01Z) - The MultiBERTs: BERT Reproductions for Robustness Analysis [86.29162676103385]
Re-running pretraining can lead to substantially different conclusions about performance.
We introduce MultiBERTs: a set of 25 BERT-base checkpoints.
The aim is to enable researchers to draw robust and statistically justified conclusions about pretraining procedures.
arXiv Detail & Related papers (2021-06-30T15:56:44Z) - An Empirical Analysis of the R Package Ecosystem [0.0]
We analyze more than 25,000 packages, 150,000 releases, and 15 million files across two decades.
We find that the historical growth of the ecosystem has been robust under all measures.
arXiv Detail & Related papers (2021-02-19T12:55:18Z) - Identifying Statistical Bias in Dataset Replication [102.92137353938388]
We study a replication of the ImageNet dataset on which models exhibit a significant (11-14%) drop in accuracy.
After correcting for the identified statistical bias, only an estimated $3.6\% \pm 1.5\%$ of the original $11.7\% \pm 1.0\%$ accuracy drop remains unaccounted for.
arXiv Detail & Related papers (2020-05-19T17:48:32Z)
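For the SLR-update entry above (Emerging Results on Automated Support for Searching and Selecting Evidence for Systematic Literature Review Updates), here is a minimal sketch of the snowballing-plus-classifier idea. The data model, training examples, and scikit-learn pipeline are assumptions for illustration, not the authors' prototype:

```python
# Hypothetical sketch: backward snowballing over a toy reference graph plus a
# TF-IDF / logistic-regression relevance classifier for study selection.
from dataclasses import dataclass, field
from typing import List

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


@dataclass
class Paper:
    title: str
    abstract: str
    references: List["Paper"] = field(default_factory=list)


def backward_snowball(seeds: List[Paper], depth: int = 1) -> List[Paper]:
    """Collect candidate studies by following reference links from seed papers."""
    collected: List[Paper] = []
    seen = {id(p) for p in seeds}
    frontier = list(seeds)
    for _ in range(depth):
        next_frontier = []
        for paper in frontier:
            for ref in paper.references:
                if id(ref) not in seen:
                    seen.add(id(ref))
                    collected.append(ref)
                    next_frontier.append(ref)
        frontier = next_frontier
    return collected


# Relevance classifier trained on studies labelled during the original SLR
# (the two training examples below are placeholders).
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(
    ["replication packages in systematic literature reviews",
     "image classification with convolutional networks"],
    [1, 0],
)


def select_relevant(candidates: List[Paper], threshold: float = 0.5) -> List[Paper]:
    """Keep snowballed candidates the classifier scores as likely relevant."""
    texts = [f"{p.title}. {p.abstract}" for p in candidates]
    probs = classifier.predict_proba(texts)[:, 1]
    return [p for p, prob in zip(candidates, probs) if prob >= threshold]
```

In this sketch, backward snowballing expands the candidate pool from the studies already included in the original SLR, and a simple TF-IDF text classifier ranks the candidates for manual screening.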