CVTT: Cross-Validation Through Time
- URL: http://arxiv.org/abs/2205.05393v1
- Date: Wed, 11 May 2022 10:30:38 GMT
- Title: CVTT: Cross-Validation Through Time
- Authors: Sergey Kolesnikov, Mikhail Andronov
- Abstract summary: We argue that leaving out a method's continuous performance can lead to losing valuable insight into joint data-method effects.
Using the proposed technique, we conduct a detailed analysis of popular RecSys algorithms' performance against various metrics and datasets.
Our results show that model performance can vary significantly over time, and both data and evaluation setup can have a marked effect on it.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The practical aspects of evaluating recommender systems are an actively
discussed topic in the research community. While many current evaluation
techniques reduce performance to a single-value metric as a straightforward
approach to model comparison, this reduction rests on the strong assumption
that a method's performance is stable over time. In this paper, we argue that
leaving out a method's continuous performance can lead to losing valuable
insight into joint data-method effects. We propose the Cross-Validation
Through Time (CVTT) technique to perform more detailed evaluations, which
focus on model cross-validation performance over time. Using the proposed
technique, we
conduct a detailed analysis of popular RecSys algorithms' performance against
various metrics and datasets. We also compare several data preparation and
evaluation strategies to analyze their impact on model performance. Our results
show that model performance can vary significantly over time, and both data and
evaluation setup can have a marked effect on it.
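The paper's exact evaluation protocol is not reproduced here, but the core idea, scoring a model over a sequence of consecutive temporal train/test splits instead of a single split, can be sketched as follows. This is a minimal illustration assuming a pandas interactions log with a timestamp column; `cvtt_evaluate`, `fit_fn`, and `score_fn` are hypothetical names rather than the authors' API, and the equal-size temporal folds are an assumption.

```python
from typing import Callable, List

import numpy as np
import pandas as pd

def cvtt_evaluate(
    interactions: pd.DataFrame,
    fit_fn: Callable[[pd.DataFrame], object],
    score_fn: Callable[[object, pd.DataFrame], float],
    n_folds: int = 10,
) -> List[float]:
    """Score a recommender on consecutive temporal folds (CVTT-style sketch).

    For fold k, all interactions before the k-th time boundary form the
    training set and the k-th slice forms the test period, so the metric
    becomes a curve over time rather than a single averaged value.
    """
    interactions = interactions.sort_values("timestamp").reset_index(drop=True)
    # Assign each interaction to one of n_folds + 1 equal-size temporal chunks.
    fold = np.arange(len(interactions)) * (n_folds + 1) // len(interactions)
    scores = []
    for k in range(1, n_folds + 1):
        train = interactions[fold < k]   # everything observed so far
        test = interactions[fold == k]   # the next time slice
        model = fit_fn(train)            # retrain from scratch for each fold
        scores.append(score_fn(model, test))  # e.g. NDCG@10 on that slice
    return scores
```

Plotting the returned scores against the fold index yields the performance-over-time curve the paper argues for, making drift in a method's quality visible where a single aggregated metric would hide it.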
Related papers
- Revisiting BPR: A Replicability Study of a Common Recommender System Baseline [78.00363373925758]
We study the features of the BPR model, indicating their impact on its performance, and investigate open-source BPR implementations.
Our analysis reveals inconsistencies between these implementations and the original BPR paper, leading to a significant decrease in performance of up to 50% for specific implementations.
We show that the BPR model can achieve performance levels close to state-of-the-art methods on the top-n recommendation tasks and even outperform them on specific datasets.
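For context, the criterion at the heart of these replicability questions is the pairwise BPR objective from the original Rendle et al. paper: for a sampled triple (u, i, j), where user u interacted with item i but not with item j, it maximizes ln σ(x_ui - x_uj) minus an L2 penalty. A minimal sketch for a matrix-factorization scorer (names are illustrative, not taken from any of the implementations studied):

```python
import numpy as np

def bpr_loss(user_vec, pos_item_vec, neg_item_vec, reg=0.01):
    """Negative BPR criterion for one sampled (user, pos item, neg item) triple."""
    x_uij = user_vec @ (pos_item_vec - neg_item_vec)  # score difference
    log_likelihood = -np.log1p(np.exp(-x_uij))        # ln sigmoid(x_uij)
    l2 = reg * sum(np.sum(v ** 2) for v in (user_vec, pos_item_vec, neg_item_vec))
    return -(log_likelihood - l2)  # minimize the negative of the BPR criterion
```

Small choices hidden in such implementations, such as how negative items are sampled or how regularization is applied per parameter group, are exactly the kind of deviation the study above links to performance drops of up to 50%.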
arXiv Detail & Related papers (2024-09-21T18:39:53Z) - Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z) - Improving the Evaluation and Actionability of Explanation Methods for Multivariate Time Series Classification [4.588028371034407]
We focus on analyzing InterpretTime, a recent evaluation methodology for attribution methods applied to MTSC.
We showcase some significant weaknesses of the original methodology and propose ideas to improve its accuracy and efficiency.
We find that perturbation-based methods such as SHAP and Feature Ablation work well across a set of datasets.
arXiv Detail & Related papers (2024-06-18T11:18:46Z) - Distilled Datamodel with Reverse Gradient Matching [74.75248610868685]
We introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages.
Our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
arXiv Detail & Related papers (2024-04-22T09:16:14Z) - From Variability to Stability: Advancing RecSys Benchmarking Practices [3.3331198926331784]
This paper introduces a novel benchmarking methodology to facilitate a fair and robust comparison of RecSys algorithms.
By utilizing a diverse set of 30 open datasets, including two introduced in this work, we critically examine the influence of dataset characteristics on algorithm performance.
arXiv Detail & Related papers (2024-02-15T07:35:52Z) - A Large-Scale Empirical Study on Improving the Fairness of Image Classification Models [22.522156479335706]
This paper conducts the first large-scale empirical study to compare the performance of existing state-of-the-art fairness improving techniques.
Our findings reveal substantial variations in the performance of each method across different datasets and sensitive attributes.
Different fairness evaluation metrics, due to their distinct focuses, yield significantly different assessment results.
arXiv Detail & Related papers (2024-01-08T06:53:33Z) - A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark [2.156208381257605]
We offer an extensive comparative analysis for state-of-the-art facial age estimation methods.
We find that the performance differences between the methods are negligible compared to the effect of other factors.
We propose using FaRL as the backbone model and demonstrate its effectiveness on all public datasets.
arXiv Detail & Related papers (2023-07-10T14:02:31Z) - Better Understanding Differences in Attribution Methods via Systematic Evaluations [57.35035463793008]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models.
arXiv Detail & Related papers (2023-03-21T14:24:58Z) - Systematic Evaluation of Predictive Fairness [60.0947291284978]
Mitigating bias in training on biased datasets is an important open problem.
We examine the performance of various debiasing methods across multiple tasks.
We find that data conditions have a strong influence on relative model performance.
arXiv Detail & Related papers (2022-10-17T05:40:13Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - A Critical Assessment of State-of-the-Art in Entity Alignment [1.7725414095035827]
We investigate two state-of-the-art (SotA) methods for the task of Entity Alignment in Knowledge Graphs.
We first carefully examine the benchmarking process and identify several shortcomings, which make the results reported in the original works not always comparable.
arXiv Detail & Related papers (2020-10-30T15:09:19Z)