Offline Reinforcement Learning for Mobile Notifications
- URL: http://arxiv.org/abs/2202.03867v1
- Date: Fri, 4 Feb 2022 22:22:22 GMT
- Title: Offline Reinforcement Learning for Mobile Notifications
- Authors: Yiping Yuan, Ajith Muralidharan, Preetam Nandy, Miao Cheng, Prakruthi
Prabhakar
- Abstract summary: Mobile notification systems have taken a major role in driving and maintaining user engagement for online platforms.
Most machine learning applications in notification systems are built around response-prediction models.
We argue that reinforcement learning is a better framework for notification systems in terms of performance and iteration speed.
- Score: 1.965345368500676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mobile notification systems have taken a major role in driving and
maintaining user engagement for online platforms. They are recommender systems
of particular interest to machine learning practitioners because of their
sequential nature and long-term feedback considerations. Most machine learning applications in
notification systems are built around response-prediction models, trying to
attribute both short-term impact and long-term impact to a notification
decision. However, a user's experience depends on a sequence of notifications
and attributing impact to a single notification is not always accurate, if not
impossible. In this paper, we argue that reinforcement learning is a better
framework for notification systems in terms of performance and iteration speed.
We propose an offline reinforcement learning framework to optimize sequential
notification decisions for driving user engagement. We describe a
state-marginalized importance sampling policy evaluation approach, which can be
used to evaluate the policy offline and tune learning hyperparameters. Through
simulations that approximate the notifications ecosystem, we demonstrate the
performance and benefits of the offline evaluation approach as a part of the
reinforcement learning modeling approach. Finally, we collect data through
online exploration in the production system, train an offline Double Deep
Q-Network and launch a successful policy online. We also discuss the practical
considerations and results obtained by deploying these policies for a
large-scale recommendation system use-case.
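The abstract names a state-marginalized importance sampling estimator for offline policy evaluation but does not spell it out here. The sketch below is a minimal, illustrative marginalized importance sampling estimator; the function names, the self-normalized weighting step, and the assumption that a state-visitation density ratio function has already been fitted are illustrative choices, not details taken from the paper.

```python
import numpy as np

def marginalized_is_value(states, actions, rewards, behavior_probs,
                          target_prob_fn, state_ratio_fn):
    """Estimate the average per-step reward of a target policy from logged data.

    states, actions, rewards: logged transitions (s_t, a_t, r_t).
    behavior_probs: mu(a_t | s_t), the logging policy's action propensities.
    target_prob_fn(s, a): pi(a | s) for the policy being evaluated.
    state_ratio_fn(s): estimated d_pi(s) / d_mu(s), the marginal
        state-visitation density ratio; fitting this function is the hard
        part and is only assumed to exist here (illustrative, not from the paper).
    """
    action_ratios = np.array(
        [target_prob_fn(s, a) for s, a in zip(states, actions)]
    ) / np.asarray(behavior_probs, dtype=float)
    state_ratios = np.array([state_ratio_fn(s) for s in states])
    weights = state_ratios * action_ratios
    # Self-normalizing the weights trades a small bias for lower variance.
    return float(np.sum(weights * np.asarray(rewards)) / np.sum(weights))
```

Marginalizing over the state-visitation distribution replaces the product of per-step action-probability ratios used by plain sequential importance sampling, which keeps the weights from exploding over long notification horizons.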
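The abstract also reports training an offline Double Deep Q-Network on data gathered through online exploration. The sketch below shows a generic Double DQN update applied to batches drawn from a fixed logged dataset; the use of PyTorch, the batch layout, and the absence of any additional offline-RL regularization are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn.functional as F

def double_dqn_update(online_q, target_q, optimizer, batch, gamma=0.99):
    """One Double DQN update on a batch of logged (offline) transitions.

    batch: dict of tensors 's', 'a', 'r', 's_next', 'done' sampled from the
    fixed exploration dataset; online_q / target_q map states to per-action
    Q-values. Names and shapes are illustrative assumptions.
    """
    s, a, r = batch["s"], batch["a"], batch["r"]
    s_next, done = batch["s_next"], batch["done"]

    with torch.no_grad():
        # Double DQN: the online network selects the next action,
        # the target network evaluates it.
        next_a = online_q(s_next).argmax(dim=1, keepdim=True)
        next_v = target_q(s_next).gather(1, next_a).squeeze(1)
        target = r + gamma * (1.0 - done) * next_v

    q_sa = online_q(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.smooth_l1_loss(q_sa, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Periodically copying the online network's weights into the target network, as in standard DQN training, is assumed but not shown.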
Related papers
- System-2 Recommenders: Disentangling Utility and Engagement in Recommendation Systems via Temporal Point-Processes [80.97898201876592]
We propose a generative model in which past content interactions impact the arrival rates of users via a self-exciting Hawkes process (a minimal sketch of such an intensity appears after this list).
We show analytically that, given samples, it is possible to disentangle System-1 and System-2 and allow content optimization based on user utility.
arXiv Detail & Related papers (2024-05-29T18:19:37Z)
- Enhancing End-to-End Multi-Task Dialogue Systems: A Study on Intrinsic Motivation Reinforcement Learning Algorithms for Improved Training and Adaptability [1.0985060632689174]
This study investigates intrinsic motivation reinforcement learning algorithms.
We adapt techniques for random network distillation and curiosity-driven reinforcement learning to measure the frequency of state visits.
Experimental results on MultiWOZ, a heterogeneous dataset, show that intrinsic motivation-based dialogue systems outperform policies that depend on extrinsic incentives.
arXiv Detail & Related papers (2024-01-31T18:03:39Z)
- Online Matching: A Real-time Bandit System for Large-scale Recommendations [23.954049092470548]
Online Matching is a scalable closed-loop bandit system learning from users' direct feedback on items in real time.
Diag-LinUCB is a novel extension of the LinUCB algorithm that enables distributed updates of bandit parameters in a scalable and timely manner.
arXiv Detail & Related papers (2023-07-29T05:46:27Z)
- Age of Semantics in Cooperative Communications: To Expedite Simulation Towards Real via Offline Reinforcement Learning [53.18060442931179]
We propose the age of semantics (AoS) for measuring the semantic freshness of status updates in a cooperative relay communication system.
We derive an online deep actor-critic (DAC) learning scheme under the on-policy temporal difference learning framework.
We then put forward a novel offline DAC scheme, which estimates the optimal control policy from a previously collected dataset.
arXiv Detail & Related papers (2022-09-19T11:55:28Z)
- Multi-objective Optimization of Notifications Using Offline Reinforcement Learning [1.2303635283131926]
We formulate the near-real-time notification decision problem as a Markov Decision Process.
We propose an end-to-end offline reinforcement learning framework to optimize sequential notification decisions.
arXiv Detail & Related papers (2022-07-07T00:53:08Z)
- Offline Preference-Based Apprenticeship Learning [11.21888613165599]
We study how an offline dataset can be used to address two challenges that autonomous systems face when they endeavor to learn from, adapt to, and collaborate with humans.
First, we use the offline dataset to efficiently infer the human's reward function via pool-based active preference learning.
Second, given this learned reward function, we perform offline reinforcement learning to optimize a policy based on the inferred human intent.
arXiv Detail & Related papers (2021-07-20T04:15:52Z)
- A Deep Value-network Based Approach for Multi-Driver Order Dispatching [55.36656442934531]
We propose a deep reinforcement learning based solution for order dispatching.
We conduct large scale online A/B tests on DiDi's ride-dispatching platform.
Results show that CVNet consistently outperforms other recently proposed dispatching methods.
arXiv Detail & Related papers (2021-06-08T16:27:04Z)
- Do Offline Metrics Predict Online Performance in Recommender Systems? [79.48653445643865]
We investigate the extent to which offline metrics predict online performance by evaluating recommenders across six simulated environments.
We observe that offline metrics are correlated with online performance over a range of environments.
We study the impact of adding exploration strategies, and observe that their effectiveness, when compared to greedy recommendation, is highly dependent on the recommendation algorithm.
arXiv Detail & Related papers (2020-11-07T01:41:13Z)
- Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning [69.42679922160684]
We propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback.
Our work opens the prospect to exploit interactions with real users and improve conversational systems after deployment.
arXiv Detail & Related papers (2020-11-01T19:50:34Z)
- Modeling Online Behavior in Recommender Systems: The Importance of Temporal Context [30.894950420437926]
We show how omitting temporal context when evaluating recommender system performance leads to false confidence.
We propose a training procedure to further embed the temporal context in existing models.
Results show that including our temporal objective can improve recall@20 by up to 20%.
arXiv Detail & Related papers (2020-09-19T19:36:43Z)
- Empowering Active Learning to Jointly Optimize System and User Demands [70.66168547821019]
We propose a new active learning approach that jointly optimizes the active learning system (training efficiently) and the user (receiving useful instances).
We study our approach in an educational application, which particularly benefits from this technique as the system needs to rapidly learn to predict the appropriateness of an exercise to a particular user.
We evaluate multiple learning strategies and user types with data from real users and find that our joint approach better satisfies both objectives when alternative methods lead to many unsuitable exercises for end users.
arXiv Detail & Related papers (2020-05-09T16:02:52Z)
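As referenced in the System-2 Recommenders entry above, a self-exciting Hawkes process models arrival rates that jump after each past event and decay back toward a baseline. The sketch below uses a generic exponential excitation kernel with illustrative parameter values; it is the textbook form, not the specific model from that paper.

```python
import numpy as np

def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Arrival intensity of a self-exciting Hawkes process at time t.

    mu: baseline arrival rate; alpha, beta: excitation size and decay rate.
    Each past event (e.g. a content interaction) temporarily raises the rate.
    Parameter values are illustrative, not taken from the cited paper.
    """
    past = np.asarray(event_times, dtype=float)
    past = past[past < t]
    return float(mu + alpha * np.sum(np.exp(-beta * (t - past))))
```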
This list is automatically generated from the titles and abstracts of the papers on this site.