Do Offline Metrics Predict Online Performance in Recommender Systems?
- URL: http://arxiv.org/abs/2011.07931v1
- Date: Sat, 7 Nov 2020 01:41:13 GMT
- Title: Do Offline Metrics Predict Online Performance in Recommender Systems?
- Authors: Karl Krauth, Sarah Dean, Alex Zhao, Wenshuo Guo, Mihaela Curmei,
Benjamin Recht, Michael I. Jordan
- Abstract summary: We investigate the extent to which offline metrics predict online performance by evaluating recommenders across six simulated environments.
We observe that offline metrics are correlated with online performance over a range of environments.
We study the impact of adding exploration strategies, and observe that their effectiveness, when compared to greedy recommendation, is highly dependent on the recommendation algorithm.
- Score: 79.48653445643865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recommender systems operate in an inherently dynamical setting. Past
recommendations influence future behavior, including which data points are
observed and how user preferences change. However, experimenting in production
systems with real user dynamics is often infeasible, and existing
simulation-based approaches have limited scale. As a result, many
state-of-the-art algorithms are designed to solve supervised learning problems,
and progress is judged only by offline metrics. In this work we investigate the
extent to which offline metrics predict online performance by evaluating eleven
recommenders across six controlled simulated environments. We observe that
offline metrics are correlated with online performance over a range of
environments. However, improvements in offline metrics lead to diminishing
returns in online performance. Furthermore, we observe that the ranking of
recommenders varies depending on the amount of initial offline data available.
We study the impact of adding exploration strategies, and observe that their
effectiveness, when compared to greedy recommendation, is highly dependent on
the recommendation algorithm. We provide the environments and recommenders
described in this paper as RecLab: an extensible, ready-to-use simulation
framework at https://github.com/berkeley-reclab/RecLab.
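To make the offline-versus-online comparison concrete, here is a minimal, self-contained sketch of the evaluation loop the abstract describes. Everything in it is a hypothetical stand-in: the toy latent-factor environment, the `offline_rmse` and `online_reward` helpers, and the noise-perturbed recommenders are illustrative assumptions, not RecLab's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latent-factor "environment": true user-item affinities drive both the
# offline ratings and the online feedback. This stands in for a simulator
# such as RecLab; it is not RecLab's actual interface.
n_users, n_items, k = 50, 200, 8
U = rng.normal(size=(n_users, k))
V = rng.normal(size=(n_items, k))
true_affinity = U @ V.T  # shape: (n_users, n_items)

def offline_rmse(predicted, test_pairs):
    """Offline metric: RMSE of predicted ratings on held-out (user, item) pairs."""
    errs = [predicted[u, i] - true_affinity[u, i] for u, i in test_pairs]
    return float(np.sqrt(np.mean(np.square(errs))))

def online_reward(predicted, slate_size=20):
    """Online metric: mean true affinity of the items each user is shown."""
    rewards = []
    for u in range(n_users):
        recs = np.argsort(predicted[u])[::-1][:slate_size]  # greedy slate
        rewards.append(true_affinity[u, recs].mean())
    return float(np.mean(rewards))

# Stand-ins for recommenders of varying quality: noisier estimates of the
# affinity matrix play the role of weaker models.
test_pairs = [(rng.integers(n_users), rng.integers(n_items)) for _ in range(500)]
for noise in (0.5, 1.0, 2.0):
    predicted = true_affinity + rng.normal(scale=noise, size=true_affinity.shape)
    print(f"noise={noise}: offline RMSE={offline_rmse(predicted, test_pairs):.3f}, "
          f"online reward={online_reward(predicted):.3f}")
```

Plotting offline RMSE against online reward across many such recommenders is, in shape, the experiment the paper runs at scale across its six environments.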
Related papers
- Understanding the performance gap between online and offline alignment algorithms [63.137832242488926]
We show that offline algorithms train the policy to become good at pairwise classification, while online algorithms are good at generation.
This hints at a unique interplay between discriminative and generative capabilities, which is greatly impacted by the sampling process.
Our study sheds light on the pivotal role of on-policy sampling in AI alignment, and hints at certain fundamental challenges of offline alignment algorithms.
arXiv Detail & Related papers (2024-05-14T09:12:30Z)
- Bridging Offline-Online Evaluation with a Time-dependent and Popularity Bias-free Offline Metric for Recommenders [3.130722489512822]
We show that penalizing popular items and considering the time of transactions significantly improves our ability to choose the best recommendation model for a live recommender system.
Our results aim to help the academic community better understand offline evaluation and optimization criteria that are more relevant to real applications of recommender systems.
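To illustrate the two ingredients the summary names, penalizing popular items and weighting by transaction time, here is a hedged sketch of a popularity-debiased, time-aware hit rate. The weighting scheme (exponential recency decay, log-inverse popularity) and all names are illustrative assumptions, not the paper's exact metric.

```python
import numpy as np

def debiased_timed_hit_rate(recommendations, test_events, item_counts,
                            t_now, half_life_days=30.0):
    """Hit rate where each held-out transaction is weighted by recency and
    by the rarity of its item (illustrative, not the paper's exact metric).

    recommendations: dict mapping user -> list of recommended item ids
    test_events:     list of (user, item, timestamp_days) transactions
    item_counts:     dict mapping item -> training interaction count
    """
    score, weight_sum = 0.0, 0.0
    for user, item, t in test_events:
        recency = 0.5 ** ((t_now - t) / half_life_days)        # newer counts more
        rarity = 1.0 / np.log1p(item_counts.get(item, 0) + 1)  # popular counts less
        w = recency * rarity
        weight_sum += w
        if item in recommendations.get(user, []):
            score += w
    return score / weight_sum if weight_sum else 0.0
```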
arXiv Detail & Related papers (2023-08-14T01:37:02Z)
- Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches.
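One of the simple design choices discussed in this line of work is symmetric sampling: drawing each training batch half from the offline dataset and half from the online replay buffer. A minimal sketch, where the function name and defaults are assumptions:

```python
import random

def sample_symmetric_batch(offline_data, online_buffer, batch_size=256):
    """Draw half of each training batch from offline data and half from the
    online replay buffer, so early training is not dominated by either source.
    (A sketch of symmetric sampling; names and defaults are assumptions.)"""
    half = batch_size // 2
    batch = random.choices(offline_data, k=half)
    batch += random.choices(online_buffer, k=batch_size - half)
    random.shuffle(batch)
    return batch
```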
arXiv Detail & Related papers (2023-02-06T17:30:22Z)
- Benchmarks and Algorithms for Offline Preference-Based Reward Learning [41.676208473752425]
We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning.
Our proposed approach does not require actual physical rollouts or an accurate simulator for either the reward learning or policy optimization steps.
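As a rough illustration of pool-based active learning over an offline dataset, the sketch below picks the trajectory pair that a reward-model ensemble disagrees about most. The ensemble-disagreement criterion and every name here are assumptions for illustration, not the paper's specific algorithm.

```python
import numpy as np

def select_preference_query(candidate_pairs, reward_ensemble):
    """From a pool of candidate trajectory pairs, return the pair whose
    preference the ensemble is most uncertain about (a common active-learning
    heuristic; illustrative only).

    candidate_pairs: list of (traj_a, traj_b)
    reward_ensemble: list of callables mapping a trajectory to a scalar return
    """
    def disagreement(pair):
        a, b = pair
        votes = [float(r(a) > r(b)) for r in reward_ensemble]
        p = np.mean(votes)    # fraction of models preferring a over b
        return p * (1.0 - p)  # maximal when the ensemble is split 50/50
    return max(candidate_pairs, key=disagreement)
```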
arXiv Detail & Related papers (2023-01-03T23:52:16Z)
- Towards Data-Driven Offline Simulations for Online Reinforcement Learning [30.654163861164864]
We formalize offline learner simulation (OLS) for reinforcement learning (RL).
We propose a novel evaluation protocol that measures both fidelity and efficiency of the simulation.
arXiv Detail & Related papers (2022-11-14T18:36:13Z)
- Offline Evaluation of Reward-Optimizing Recommender Systems: The Case of Simulation [11.940733431087102]
In academic and industry-based research, online evaluation methods are seen as the gold standard for interactive applications like recommender systems.
Online evaluation methods are costly for a number of reasons, and a clear need remains for reliable offline evaluation procedures.
In academic work, limited access to online systems makes offline metrics the de facto approach to validating novel methods.
arXiv Detail & Related papers (2022-09-18T20:03:32Z)
- Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Offline RL [56.20835219296896]
We study session-based recommendation scenarios where we want to recommend items to users during sequential interactions to improve their long-term utility.
We develop a new batch RL algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced distribution shifts across sessions.
arXiv Detail & Related papers (2021-06-01T15:58:05Z)
- OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning [107.6943868812716]
In many practical applications, the situation is reversed: an agent may have access to large amounts of undirected offline experience data, while access to the online environment is severely limited.
Our main insight is that, when presented with offline data composed of a variety of behaviors, an effective way to leverage this data is to extract a continuous space of recurring and temporally extended primitive behaviors.
In addition to benefiting offline policy optimization, we show that performing offline primitive learning in this way can also be leveraged for improving few-shot imitation learning.
arXiv Detail & Related papers (2020-10-26T14:31:08Z)
- Modeling Online Behavior in Recommender Systems: The Importance of Temporal Context [30.894950420437926]
We show how omitting temporal context when evaluating recommender system performance leads to false confidence.
We propose a training procedure to further embed the temporal context in existing models.
Results show that including our temporal objective can improve recall@20 by up to 20%.
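A concrete way to include temporal context at evaluation time is to split interactions chronologically instead of at random, so the test set never leaks future behavior into training. A minimal sketch (the function and its signature are assumptions, not the paper's code):

```python
def temporal_split(interactions, test_frac=0.2):
    """Chronological train/test split: every test event happens after every
    training event, unlike a random split that lets models peek at the future.

    interactions: list of (user, item, timestamp) tuples
    """
    ordered = sorted(interactions, key=lambda event: event[2])
    cut = int(len(ordered) * (1.0 - test_frac))
    return ordered[:cut], ordered[cut:]
```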
arXiv Detail & Related papers (2020-09-19T19:36:43Z)
- AliExpress Learning-To-Rank: Maximizing Online Model Performance without Going Online [60.887637616379926]
This paper proposes an evaluator-generator framework for learning-to-rank.
It consists of an evaluator that generalizes to evaluate recommendations involving the context, and a generator that maximizes the evaluator score by reinforcement learning.
Our method achieves a significant improvement in terms of Conversion Rate (CR) over the industrial-level fine-tuned model in online A/B tests.
arXiv Detail & Related papers (2020-03-25T10:27:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.