RBoard: A Unified Platform for Reproducible and Reusable Recommender System Benchmarks
- URL: http://arxiv.org/abs/2409.05526v2
- Date: Tue, 10 Sep 2024 16:46:10 GMT
- Title: RBoard: A Unified Platform for Reproducible and Reusable Recommender System Benchmarks
- Authors: Xinyang Shao, Edoardo D'Amico, Gabor Fodor, Tri Kurniawan Wijaya
- Abstract summary: RBoard is a novel framework for benchmarking recommender systems.
It provides a comprehensive platform covering diverse recommendation tasks, including CTR prediction, Top-N recommendation, and others.
The framework evaluates algorithms across multiple datasets within each task, aggregating results for a holistic performance assessment.
- Score: 0.4312340306206883
- Abstract: Recommender systems research lacks standardized benchmarks for reproducibility and algorithm comparisons. We introduce RBoard, a novel framework addressing these challenges by providing a comprehensive platform for benchmarking diverse recommendation tasks, including CTR prediction, Top-N recommendation, and others. RBoard's primary objective is to enable fully reproducible and reusable experiments across these scenarios. The framework evaluates algorithms across multiple datasets within each task, aggregating results for a holistic performance assessment. It implements standardized evaluation protocols, ensuring consistency and comparability. To facilitate reproducibility, all user-provided code can be easily downloaded and executed, allowing researchers to reliably replicate studies and build upon previous work. By offering a unified platform for rigorous, reproducible evaluation across various recommendation scenarios, RBoard aims to accelerate progress in the field and establish a new standard for recommender systems benchmarking in both academia and industry. The platform is available at https://rboard.org and the demo video can be found at https://bit.ly/rboard-demo.
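As a rough illustration of the multi-dataset aggregation described in the abstract, the sketch below averages a per-dataset metric into a single task-level score. This is a minimal, hypothetical example and not RBoard's actual implementation: the dataset names, AUC values, and the choice of a plain mean as the aggregation rule are assumptions made for demonstration.

```python
from statistics import mean

# Hypothetical per-dataset results for one task (e.g., CTR prediction).
# Dataset names and AUC values are illustrative, not real RBoard outputs.
results = {
    "dataset_a": {"auc": 0.781},
    "dataset_b": {"auc": 0.742},
    "dataset_c": {"auc": 0.763},
}

def aggregate(per_dataset: dict, metric: str) -> float:
    """Aggregate one metric across datasets with a plain mean; a platform
    could equally aggregate ranks or normalized scores instead."""
    return mean(scores[metric] for scores in per_dataset.values())

print(f"Holistic CTR-prediction score (mean AUC): {aggregate(results, 'auc'):.3f}")
```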
Related papers
- UniTTA: Unified Benchmark and Versatile Framework Towards Realistic Test-Time Adaptation [66.05528698010697]
Test-Time Adaptation aims to adapt pre-trained models to the target domain during testing.
Researchers have identified various challenging scenarios and developed diverse methods to address these challenges.
We propose a Unified Test-Time Adaptation benchmark, which is comprehensive and widely applicable.
arXiv Detail & Related papers (2024-07-29T15:04:53Z) - ReFeR: Improving Evaluation and Reasoning through Hierarchy of Models [12.035509884945789]
We introduce a tuning-free framework called ReFeR, designed to evaluate generative outputs, including both text and images.
We rigorously evaluate our framework, ReFeR, across four diverse evaluation tasks.
Experiments on four reasoning tasks demonstrate superior collective reasoning abilities of the framework.
arXiv Detail & Related papers (2024-07-16T08:25:26Z) - Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-Label Classification [120.37051160567277]
This paper proposes a novel measure named Top-K Pairwise Ranking (TKPR).
A series of analyses show that TKPR is compatible with existing ranking-based measures.
We also establish a sharp generalization bound for the proposed framework based on a novel technique named data-dependent contraction.
arXiv Detail & Related papers (2023-06-18T13:35:41Z) - Summarization from Leaderboards to Practice: Choosing A Representation Backbone and Ensuring Robustness [21.567112955050582]
In both automatic and human evaluation, BART performs better than PEGASUS and T5.
We find considerable variation in system output that can be captured only with human evaluation.
arXiv Detail & Related papers (2023-06-18T13:35:41Z) - Vote'n'Rank: Revision of Benchmarking with Social Choice Theory [7.224599819499157]
This paper proposes Vote'n'Rank, a framework for ranking systems in multi-task benchmarks under the principles of social choice theory.
We demonstrate that our approach can be efficiently utilised to draw new insights on benchmarking in several ML sub-fields. (A sketch of one such social-choice rule, the Borda count, follows the related-papers list below.)
arXiv Detail & Related papers (2022-10-11T20:19:11Z) - RGRecSys: A Toolkit for Robustness Evaluation of Recommender Systems [100.54655931138444]
We propose a more holistic view of robustness for recommender systems that encompasses multiple dimensions.
We present a robustness evaluation toolkit, Robustness Gym for RecSys, that allows us to quickly and uniformly evaluate the robustness of recommender system models.
arXiv Detail & Related papers (2021-05-21T01:17:52Z) - Dynaboard: An Evaluation-As-A-Service Platform for Holistic Next-Generation Benchmarking [41.99715850562528]
We introduce Dynaboard, an evaluation-as-a-service framework for hosting benchmarks and conducting holistic model comparison.
Our platform evaluates NLP models directly instead of relying on self-reported metrics or predictions on a single dataset.
arXiv Detail & Related papers (2021-05-21T01:17:52Z) - USACv20: robust essential, fundamental and homography matrix estimation [68.65610177368617]
We review the most recent RANSAC-like hypothesize-and-verify robust estimators.
The best performing ones are combined to create a state-of-the-art version of the Universal Sample Consensus (USAC) algorithm.
The proposed method, USACv20, is tested on eight publicly available real-world datasets.
arXiv Detail & Related papers (2020-11-25T02:18:33Z) - CRACT: Cascaded Regression-Align-Classification for Robust Visual Tracking [97.84109669027225]
We introduce an improved proposal refinement module, Cascaded Regression-Align-Classification (CRAC).
CRAC yields new state-of-the-art performance on many benchmarks.
In experiments on seven benchmarks including OTB-2015, UAV123, NfS, VOT-2018, TrackingNet, GOT-10k and LaSOT, our CRACT exhibits very promising results in comparison with state-of-the-art competitors.
arXiv Detail & Related papers (2020-11-25T02:18:33Z) - Controllable Multi-Interest Framework for Recommendation [64.30030600415654]
We formalize the recommender system as a sequential recommendation problem.
We propose a novel controllable multi-interest framework for the sequential recommendation, called ComiRec.
Our framework has been successfully deployed on the offline Alibaba distributed cloud platform.
arXiv Detail & Related papers (2020-05-19T10:18:43Z)
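The Vote'n'Rank entry above mentions aggregating multi-task benchmark results with social choice theory. As a rough illustration only (the paper's specific rules are not reproduced here), the sketch below applies the Borda count, one classical social-choice rule, to hypothetical per-task rankings; all system and task names are invented for the example.

```python
from collections import defaultdict

# Hypothetical per-task rankings of systems, ordered best to worst.
task_rankings = {
    "task_1": ["sys_a", "sys_b", "sys_c"],
    "task_2": ["sys_b", "sys_a", "sys_c"],
    "task_3": ["sys_a", "sys_c", "sys_b"],
}

def borda(rankings: dict) -> list:
    """Borda count: a system ranked i-th out of n earns n - 1 - i points per
    task; point totals are sorted to produce the overall ordering."""
    points = defaultdict(int)
    for order in rankings.values():
        n = len(order)
        for i, system in enumerate(order):
            points[system] += n - 1 - i
    return sorted(points.items(), key=lambda kv: kv[1], reverse=True)

print(borda(task_rankings))  # [('sys_a', 5), ('sys_b', 3), ('sys_c', 1)]
```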