Identifying Offline Metrics that Predict Online Impact: A Pragmatic Strategy for Real-World Recommender Systems
- URL: http://arxiv.org/abs/2507.09566v1
- Date: Sun, 13 Jul 2025 10:24:41 GMT
- Title: Identifying Offline Metrics that Predict Online Impact: A Pragmatic Strategy for Real-World Recommender Systems
- Authors: Timo Wilm, Philipp Normann
- Abstract summary: We introduce a pragmatic strategy for identifying offline metrics that align with online impact. We validate the strategy through a large-scale online experiment in the field of session-based recommender systems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A critical challenge in recommender systems is to establish reliable relationships between offline and online metrics that predict real-world performance. Motivated by recent advances in Pareto front approximation, we introduce a pragmatic strategy for identifying offline metrics that align with online impact. A key advantage of this approach is its ability to simultaneously serve multiple test groups, each with distinct offline performance metrics, in an online experiment controlled by a single model. The method is model-agnostic for systems with a neural network backbone, enabling broad applicability across architectures and domains. We validate the strategy through a large-scale online experiment in the field of session-based recommender systems on the OTTO e-commerce platform. The online experiment identifies significant alignments between offline metrics and real-world click-through rate, post-click conversion rate and units sold. Our strategy provides industry practitioners with a valuable tool for understanding offline-to-online metric relationships and making informed, data-driven decisions.
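As a rough illustration of the serving setup the abstract describes, the sketch below shows a preference-conditioned session recommender in PyTorch: a single network is trained with randomly drawn objective weightings (in the spirit of Pareto front approximation), and each online test group is then served by fixing a different preference vector on the same deployed model. All class names, shapes, and loss choices here are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only (assumed names and shapes, not the authors' code):
# one preference-conditioned recommender trained across objective trade-offs,
# so a single deployed model can back several online test groups.
import torch
import torch.nn as nn


class PreferenceConditionedRecommender(nn.Module):
    """Session encoder whose scores are modulated by a preference vector
    over K offline objectives (e.g. click accuracy vs. order accuracy)."""

    def __init__(self, num_items: int, dim: int = 64, num_objectives: int = 2):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.pref_gate = nn.Linear(num_objectives, dim)

    def forward(self, session: torch.Tensor, preference: torch.Tensor) -> torch.Tensor:
        # session: (batch, seq_len) item ids; preference: (batch, K) weights on the simplex
        _, hidden = self.encoder(self.item_emb(session))
        user = hidden.squeeze(0) * torch.sigmoid(self.pref_gate(preference))
        return user @ self.item_emb.weight.T  # (batch, num_items) scores


def training_step(model, session, targets, objective_losses):
    """Scalarize K per-example losses with a randomly drawn preference vector,
    so one network approximates the whole trade-off (Pareto) front.
    Each element of `objective_losses` must return a (batch,) loss tensor."""
    pref = torch.distributions.Dirichlet(
        torch.ones(len(objective_losses))
    ).sample((session.size(0),))  # (batch, K)
    scores = model(session, pref)
    per_obj = torch.stack([fn(scores, targets) for fn in objective_losses], dim=-1)
    return (pref * per_obj).sum(dim=-1).mean()


# At serving time, each A/B test group is pinned to one preference vector,
# e.g. group A -> (1.0, 0.0), group B -> (0.5, 0.5), all on the same deployed model.
```

Because every test group runs on the same network, differences in online click-through rate, post-click conversion rate, or units sold between groups can be attributed to the offline metric trade-off each group's preference vector encodes, which is the alignment the online experiment is designed to measure.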
Related papers
- Generative Auto-Bidding with Value-Guided Explorations [47.71346722705783]
This paper introduces a novel offline Generative Auto-bidding framework with Value-Guided Explorations (GAVE). Experimental results on two offline datasets and real-world deployments demonstrate that GAVE outperforms state-of-the-art baselines in both offline evaluations and online A/B tests.
arXiv Detail & Related papers (2025-04-20T12:28:49Z)
- Active Advantage-Aligned Online Reinforcement Learning with Offline Data [56.98480620108727]
We introduce A3RL, which incorporates a novel confidence-aware Active Advantage-Aligned sampling strategy. We demonstrate that our method outperforms competing online RL techniques that leverage offline data.
arXiv Detail & Related papers (2025-02-11T20:31:59Z)
- Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL [36.65926744075032]
Offline-to-online (O2O) reinforcement learning improves performance rapidly with limited online interactions. Recent studies often design fine-tuning strategies for a specific offline RL method and cannot perform general O2O learning from any offline method. We propose to handle these two mismatches simultaneously, aiming to achieve general O2O learning from any offline method to any online method.
arXiv Detail & Related papers (2024-12-25T09:52:22Z)
- Coordination Failure in Cooperative Offline MARL [3.623224034411137]
We focus on coordination failure and investigate the role of joint actions in multi-agent policy gradients with offline data.
By using two-player games as an analytical tool, we demonstrate a simple yet overlooked failure mode of BRUD-based algorithms.
We propose an approach to mitigate this failure by prioritising samples from the dataset based on joint-action similarity.
arXiv Detail & Related papers (2024-07-01T14:51:29Z)
- Bayesian Design Principles for Offline-to-Online Reinforcement Learning [50.97583504192167]
Offline-to-online fine-tuning is crucial for real-world applications where exploration can be costly or unsafe.
In this paper, we tackle the dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop.
We show that Bayesian design principles are crucial in solving such a dilemma.
arXiv Detail & Related papers (2024-05-31T16:31:07Z)
- Age of Semantics in Cooperative Communications: To Expedite Simulation Towards Real via Offline Reinforcement Learning [53.18060442931179]
We propose the age of semantics (AoS) for measuring the semantic freshness of status updates in a cooperative relay communication system.
We derive an online deep actor-critic (DAC) learning scheme under the on-policy temporal difference learning framework.
We then put forward a novel offline DAC scheme, which estimates the optimal control policy from a previously collected dataset.
arXiv Detail & Related papers (2022-09-19T11:55:28Z)
- Offline Evaluation of Reward-Optimizing Recommender Systems: The Case of Simulation [11.940733431087102]
In academic and industry-based research, online evaluation methods are seen as the gold standard for interactive applications like recommendation systems.
Online evaluation methods are costly for a number of reasons, and a clear need remains for reliable offline evaluation procedures.
In academic work, limited access to online systems makes offline metrics the de facto approach to validating novel methods.
arXiv Detail & Related papers (2022-09-18T20:03:32Z)
- Do Offline Metrics Predict Online Performance in Recommender Systems? [79.48653445643865]
We investigate the extent to which offline metrics predict online performance by evaluating recommenders across six simulated environments.
We observe that offline metrics are correlated with online performance over a range of environments.
We study the impact of adding exploration strategies, and observe that their effectiveness, when compared to greedy recommendation, is highly dependent on the recommendation algorithm.
arXiv Detail & Related papers (2020-11-07T01:41:13Z)
- AliExpress Learning-To-Rank: Maximizing Online Model Performance without Going Online [60.887637616379926]
This paper proposes an evaluator-generator framework for learning-to-rank.
It consists of an evaluator that generalizes to assess recommendations in context, and a generator that maximizes the evaluator's score via reinforcement learning.
Our method achieves a significant improvement in terms of Conversion Rate (CR) over the industrial-level fine-tuned model in online A/B tests.
arXiv Detail & Related papers (2020-03-25T10:27:44Z)