Offline Reinforcement Learning for Mobile Notifications
- URL: http://arxiv.org/abs/2202.03867v1
- Date: Fri, 4 Feb 2022 22:22:22 GMT
- Title: Offline Reinforcement Learning for Mobile Notifications
- Authors: Yiping Yuan, Ajith Muralidharan, Preetam Nandy, Miao Cheng, Prakruthi
Prabhakar
- Abstract summary: Mobile notification systems have taken a major role in driving and maintaining user engagement for online platforms.
Most machine learning applications in notification systems are built around response-prediction models.
We argue that reinforcement learning is a better framework for notification systems in terms of performance and iteration speed.
- Score: 1.965345368500676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mobile notification systems have taken a major role in driving and
maintaining user engagement for online platforms. They are recommender systems
of particular interest to machine learning practitioners because of their
sequential nature and long-term feedback considerations. Most machine learning applications in
notification systems are built around response-prediction models, trying to
attribute both short-term impact and long-term impact to a notification
decision. However, a user's experience depends on a sequence of notifications
and attributing impact to a single notification is not always accurate, if not
impossible. In this paper, we argue that reinforcement learning is a better
framework for notification systems in terms of performance and iteration speed.
We propose an offline reinforcement learning framework to optimize sequential
notification decisions for driving user engagement. We describe a
state-marginalized importance sampling policy evaluation approach, which can be
used to evaluate the policy offline and tune learning hyperparameters. Through
simulations that approximate the notifications ecosystem, we demonstrate the
performance and benefits of the offline evaluation approach as a part of the
reinforcement learning modeling approach. Finally, we collect data through
online exploration in the production system, train an offline Double Deep
Q-Network and launch a successful policy online. We also discuss the practical
considerations and results obtained by deploying these policies for a
large-scale recommendation system use-case.
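The abstract names a state-marginalized importance sampling estimator for offline policy evaluation but does not spell it out here. The sketch below is a minimal, illustrative marginalized importance sampling estimator; the function names, the self-normalized weighting step, and the assumption that a state-visitation density ratio function has already been fitted are illustrative choices, not details taken from the paper.

```python
import numpy as np

def marginalized_is_value(states, actions, rewards, behavior_probs,
                          target_prob_fn, state_ratio_fn):
    """Estimate the average per-step reward of a target policy from logged data.

    states, actions, rewards: logged transitions (s_t, a_t, r_t).
    behavior_probs: mu(a_t | s_t), the logging policy's action propensities.
    target_prob_fn(s, a): pi(a | s) for the policy being evaluated.
    state_ratio_fn(s): estimated d_pi(s) / d_mu(s), the marginal
        state-visitation density ratio; fitting this function is the hard
        part and is only assumed to exist here (illustrative, not from the paper).
    """
    action_ratios = np.array(
        [target_prob_fn(s, a) for s, a in zip(states, actions)]
    ) / np.asarray(behavior_probs, dtype=float)
    state_ratios = np.array([state_ratio_fn(s) for s in states])
    weights = state_ratios * action_ratios
    # Self-normalizing the weights trades a small bias for lower variance.
    return float(np.sum(weights * np.asarray(rewards)) / np.sum(weights))
```

Marginalizing over the state-visitation distribution replaces the product of per-step action-probability ratios used by plain sequential importance sampling, which keeps the weights from exploding over long notification horizons.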
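The abstract also reports training an offline Double Deep Q-Network on data gathered through online exploration. The sketch below shows a generic Double DQN update applied to batches drawn from a fixed logged dataset; the use of PyTorch, the batch layout, and the absence of any additional offline-RL regularization are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn.functional as F

def double_dqn_update(online_q, target_q, optimizer, batch, gamma=0.99):
    """One Double DQN update on a batch of logged (offline) transitions.

    batch: dict of tensors 's', 'a', 'r', 's_next', 'done' sampled from the
    fixed exploration dataset; online_q / target_q map states to per-action
    Q-values. Names and shapes are illustrative assumptions.
    """
    s, a, r = batch["s"], batch["a"], batch["r"]
    s_next, done = batch["s_next"], batch["done"]

    with torch.no_grad():
        # Double DQN: the online network selects the next action,
        # the target network evaluates it.
        next_a = online_q(s_next).argmax(dim=1, keepdim=True)
        next_v = target_q(s_next).gather(1, next_a).squeeze(1)
        target = r + gamma * (1.0 - done) * next_v

    q_sa = online_q(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.smooth_l1_loss(q_sa, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Periodically copying the online network's weights into the target network, as in standard DQN training, is assumed but not shown.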
Related papers
- System-2 Recommenders: Disentangling Utility and Engagement in Recommendation Systems via Temporal Point-Processes [80.97898201876592]
We propose a generative model in which past content interactions impact the arrival rates of users via a self-exciting Hawkes process (a minimal sketch of such an intensity appears after this list).
We show analytically that, given samples, it is possible to disentangle System-1 and System-2 and allow content optimization based on user utility.
arXiv Detail & Related papers (2024-05-29T18:19:37Z)
- Enhancing End-to-End Multi-Task Dialogue Systems: A Study on Intrinsic Motivation Reinforcement Learning Algorithms for Improved Training and Adaptability [1.0985060632689174]
This study investigates intrinsic motivation reinforcement learning algorithms.
We adapt techniques for random network distillation and curiosity-driven reinforcement learning to measure the frequency of state visits.
Experimental results on MultiWOZ, a heterogeneous dataset, show that intrinsic motivation-based dialogue systems outperform policies that depend on extrinsic incentives.
arXiv Detail & Related papers (2024-01-31T18:03:39Z)
- Online Matching: A Real-time Bandit System for Large-scale Recommendations [23.954049092470548]
Online Matching is a scalable closed-loop bandit system learning from users' direct feedback on items in real time.
Diag-LinUCB is a novel extension of the LinUCB algorithm that enables distributed updates of bandit parameters in a scalable and timely manner.
arXiv Detail & Related papers (2023-07-29T05:46:27Z)
- Age of Semantics in Cooperative Communications: To Expedite Simulation Towards Real via Offline Reinforcement Learning [53.18060442931179]
We propose the age of semantics (AoS) for measuring the semantic freshness of status updates in a cooperative relay communication system.
We derive an online deep actor-critic (DAC) learning scheme under the on-policy temporal difference learning framework.
We then put forward a novel offline DAC scheme, which estimates the optimal control policy from a previously collected dataset.
arXiv Detail & Related papers (2022-09-19T11:55:28Z)
- Multi-objective Optimization of Notifications Using Offline Reinforcement Learning [1.2303635283131926]
We formulate the near-real-time notification decision problem as a Markov Decision Process.
We propose an end-to-end offline reinforcement learning framework to optimize sequential notification decisions.
arXiv Detail & Related papers (2022-07-07T00:53:08Z)
- Offline Preference-Based Apprenticeship Learning [11.21888613165599]
We study how an offline dataset can be used to address two challenges that autonomous systems face when they endeavor to learn from, adapt to, and collaborate with humans.
First, we use the offline dataset to efficiently infer the human's reward function via pool-based active preference learning.
Second, given this learned reward function, we perform offline reinforcement learning to optimize a policy based on the inferred human intent.
arXiv Detail & Related papers (2021-07-20T04:15:52Z)
- A Deep Value-network Based Approach for Multi-Driver Order Dispatching [55.36656442934531]
We propose a deep reinforcement learning based solution for order dispatching.
We conduct large scale online A/B tests on DiDi's ride-dispatching platform.
Results show that CVNet consistently outperforms other recently proposed dispatching methods.
arXiv Detail & Related papers (2021-06-08T16:27:04Z)
- Do Offline Metrics Predict Online Performance in Recommender Systems? [79.48653445643865]
We investigate the extent to which offline metrics predict online performance by evaluating recommenders across six simulated environments.
We observe that offline metrics are correlated with online performance over a range of environments.
We study the impact of adding exploration strategies, and observe that their effectiveness, when compared to greedy recommendation, is highly dependent on the recommendation algorithm.
arXiv Detail & Related papers (2020-11-07T01:41:13Z)
- Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning [69.42679922160684]
We propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback.
Our work opens the prospect to exploit interactions with real users and improve conversational systems after deployment.
arXiv Detail & Related papers (2020-11-01T19:50:34Z)
- Modeling Online Behavior in Recommender Systems: The Importance of Temporal Context [30.894950420437926]
We show how omitting temporal context when evaluating recommender system performance leads to false confidence.
We propose a training procedure to further embed the temporal context in existing models.
Results show that including our temporal objective can improve recall@20 by up to 20%.
arXiv Detail & Related papers (2020-09-19T19:36:43Z)
- Empowering Active Learning to Jointly Optimize System and User Demands [70.66168547821019]
We propose a new active learning approach that jointly optimizes the active learning system (training efficiently) and the user (receiving useful instances).
We study our approach in an educational application, which particularly benefits from this technique as the system needs to rapidly learn to predict the appropriateness of an exercise to a particular user.
We evaluate multiple learning strategies and user types with data from real users and find that our joint approach better satisfies both objectives when alternative methods lead to many unsuitable exercises for end users.
arXiv Detail & Related papers (2020-05-09T16:02:52Z)
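As referenced in the System-2 Recommenders entry above, a self-exciting Hawkes process models arrival rates that jump after each past event and decay back toward a baseline. The sketch below uses a generic exponential excitation kernel with illustrative parameter values; it is the textbook form, not the specific model from that paper.

```python
import numpy as np

def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Arrival intensity of a self-exciting Hawkes process at time t.

    mu: baseline arrival rate; alpha, beta: excitation size and decay rate.
    Each past event (e.g. a content interaction) temporarily raises the rate.
    Parameter values are illustrative, not taken from the cited paper.
    """
    past = np.asarray(event_times, dtype=float)
    past = past[past < t]
    return float(mu + alpha * np.sum(np.exp(-beta * (t - past))))
```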
This list is automatically generated from the titles and abstracts of the papers on this site.