Reproducibility and Artifact Consistency of the SIGIR 2022 Recommender Systems Papers Based on Message Passing
- URL: http://arxiv.org/abs/2503.07823v1
- Date: Mon, 10 Mar 2025 20:09:04 GMT
- Title: Reproducibility and Artifact Consistency of the SIGIR 2022 Recommender Systems Papers Based on Message Passing
- Authors: Maurizio Ferrari Dacrema, Michael Benigni, Nicola Ferro,
- Abstract summary: We analyze 10 graph-based Recommender Systems papers presented at SIGIR 2022 and 2023.<n>Our analysis reveals several critical points that require attention.<n>Due to these issues, we are unable to confirm the claims made in most of the papers we examined and attempted to reproduce.
- Score: 10.474191156751928
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Graph-based techniques relying on neural networks and embeddings have gained attention as a way to develop Recommender Systems (RS) with several papers on the topic presented at SIGIR 2022 and 2023. Given the importance of ensuring that published research is methodologically sound and reproducible, in this paper we analyze 10 graph-based RS papers, most of which were published at SIGIR 2022, and assess their impact on subsequent work published in SIGIR 2023. Our analysis reveals several critical points that require attention: (i) the prevalence of bad practices, such as erroneous data splits or information leakage between training and testing data, which call into question the validity of the results; (ii) frequent inconsistencies between the provided artifacts (source code and data) and their descriptions in the paper, causing uncertainty about what is actually being evaluated; and (iii) the preference for new or complex baselines that are weaker compared to simpler ones, creating the impression of continuous improvement even when, particularly for the Amazon-Book dataset, the state-of-the-art has significantly worsened. Due to these issues, we are unable to confirm the claims made in most of the papers we examined and attempted to reproduce.
Related papers
- Retrieval-Augmented Generation with Conflicting Evidence [57.66282463340297]
Large language model (LLM) agents are increasingly employing retrieval-augmented generation (RAG) to improve the factuality of their responses.
In practice, these systems often need to handle ambiguous user queries and potentially conflicting information from multiple sources.
We propose RAMDocs (Retrieval with Ambiguity and Misinformation in Documents), a new dataset that simulates complex and realistic scenarios for conflicting evidence for a user query.
arXiv Detail & Related papers (2025-04-17T16:46:11Z) - WithdrarXiv: A Large-Scale Dataset for Retraction Study [33.782357627001154]
We present WithdrarXiv, the first large-scale dataset of withdrawn papers from arXiv.<n>We develop a comprehensive taxonomy of retraction reasons, identifying 10 distinct categories ranging from critical errors to policy violations.<n>We demonstrate a simple yet highly accurate zero-shot automatic categorization of retraction reasons, achieving a weighted average F1-score of 0.96.
arXiv Detail & Related papers (2024-12-04T23:36:23Z) - Analysis of the ICML 2023 Ranking Data: Can Authors' Opinions of Their Own Papers Assist Peer Review in Machine Learning? [52.00419656272129]
We conducted an experiment during the 2023 International Conference on Machine Learning (ICML)
We received 1,342 rankings, each from a distinct author, pertaining to 2,592 submissions.
We focus on the Isotonic Mechanism, which calibrates raw review scores using author-provided rankings.
arXiv Detail & Related papers (2024-08-24T01:51:23Z) - Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews [51.453135368388686]
We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM)
Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level.
arXiv Detail & Related papers (2024-03-11T21:51:39Z) - DAGnosis: Localized Identification of Data Inconsistencies using
Structures [73.39285449012255]
Identification and appropriate handling of inconsistencies in data at deployment time is crucial to reliably use machine learning models.
We use directed acyclic graphs (DAGs) to encode the training set's features probability distribution and independencies as a structure.
Our method, called DAGnosis, leverages these structural interactions to bring valuable and insightful data-centric conclusions.
arXiv Detail & Related papers (2024-02-26T11:29:16Z) - Investigating Reproducibility in Deep Learning-Based Software Fault
Prediction [16.25827159504845]
With the rapid adoption of increasingly complex machine learning models, it becomes more and more difficult for scholars to reproduce the results that are reported in the literature.
This is in particular the case when the applied deep learning models and the evaluation methodology are not properly documented and when code and data are not shared.
We have conducted a systematic review of the current literature and examined the level of 56 research articles that were published between 2019 and 2022 in top-tier software engineering conferences.
arXiv Detail & Related papers (2024-02-08T13:00:18Z) - Too Good To Be True: performance overestimation in (re)current practices
for Human Activity Recognition [49.1574468325115]
sliding windows for data segmentation followed by standard random k-fold cross validation produce biased results.
It is important to raise awareness in the scientific community about this problem, whose negative effects are being overlooked.
Several experiments with different types of datasets and different types of classification models allow us to exhibit the problem and show it persists independently of the method or dataset.
arXiv Detail & Related papers (2023-10-18T13:24:05Z) - Deconstructing Self-Supervised Monocular Reconstruction: The Design
Decisions that Matter [63.5550818034739]
This paper presents a framework to evaluate state-of-the-art contributions to self-supervised monocular depth estimation.
It includes pretraining, backbone, architectural design choices and loss functions.
We re-implement, validate and re-evaluate 16 state-of-the-art contributions and introduce a new dataset.
arXiv Detail & Related papers (2022-08-02T14:38:53Z) - Matching Papers and Reviewers at Large Conferences [25.79501640609188]
This paper studies a novel reviewer-paper matching approach that was recently deployed in the 35th AAAI Conference on Artificial Intelligence (AAAI 2021)
This approach has three main elements: (1) collecting and processing input data to identify problematic matches and generate reviewer-paper scores; (2) formulating and solving an optimization problem to find good reviewer-paper matchings; and (3) the introduction of a novel, two-phase reviewing process that shifted reviewing resources away from papers likely to be rejected and towards papers closer to the decision boundary.
arXiv Detail & Related papers (2022-02-24T18:13:43Z) - Small data problems in political research: a critical replication study [5.698280399449707]
We show that the small data causes the classification model to be highly sensitive to variations in the random train-test split.
We also show that the applied preprocessing causes the data to be extremely sparse.
Based on our findings, we argue that A&W's conclusions regarding the automated classification of organizational reputation tweets can not be maintained.
arXiv Detail & Related papers (2021-09-27T09:55:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.