Multi-objective Optimization of Notifications Using Offline
Reinforcement Learning
- URL: http://arxiv.org/abs/2207.03029v1
- Date: Thu, 7 Jul 2022 00:53:08 GMT
- Title: Multi-objective Optimization of Notifications Using Offline
Reinforcement Learning
- Authors: Prakruthi Prabhakar, Yiping Yuan, Guangyu Yang, Wensheng Sun, Ajith
Muralidharan
- Abstract summary: We formulate the near-real-time notification decision problem as a Markov Decision Process.
We propose an end-to-end offline reinforcement learning framework to optimize sequential notification decisions.
- Score: 1.2303635283131926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mobile notification systems play a major role in a variety of
applications, sending alerts and reminders to inform users about news, events,
or messages. In this paper, we formulate the near-real-time notification
decision problem as a Markov Decision Process where we optimize for multiple
objectives in the rewards. We propose an end-to-end offline reinforcement
learning framework to optimize sequential notification decisions. We address
the challenge of offline learning using a Double Deep Q-network method based on
Conservative Q-learning that mitigates the distributional shift problem and
Q-value overestimation. We illustrate our fully-deployed system and demonstrate
the performance and benefits of the proposed approach through both offline and
online experiments.
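The paper's offline method combines Double DQN action selection with a Conservative Q-learning (CQL) penalty. A minimal sketch of how those two pieces fit together in one loss is below; the function name, array shapes, plain mean-squared TD loss, and unregularized logsumexp are our own illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cql_double_dqn_loss(q_online, q_target, states, actions, rewards,
                        next_states, gamma=0.99, cql_alpha=1.0):
    """Double-DQN TD loss plus a CQL penalty (illustrative sketch).

    q_online, q_target: callables mapping a batch of states to
    per-action Q-value arrays of shape (batch, n_actions).
    """
    idx = np.arange(len(actions))
    q_sa = q_online(states)[idx, actions]                    # Q(s, a) on logged actions
    # Double DQN: the online net selects the next action, the target net evaluates it.
    next_best = np.argmax(q_online(next_states), axis=1)
    q_next = q_target(next_states)[idx, next_best]
    td_target = rewards + gamma * q_next
    td_loss = np.mean((q_sa - td_target) ** 2)
    # CQL penalty: push all Q-values down (logsumexp over actions) while
    # pushing up Q-values of actions actually seen in the offline data,
    # which counteracts overestimation on out-of-distribution actions.
    q_all = q_online(states)
    logsumexp = np.log(np.sum(np.exp(q_all), axis=1))
    cql_penalty = np.mean(logsumexp - q_sa)
    return td_loss + cql_alpha * cql_penalty
```

Since logsumexp upper-bounds the logged action's Q-value, the penalty is nonnegative and shrinks toward zero only when the behavior policy's actions already dominate, which is how CQL mitigates the distributional-shift problem the abstract mentions.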
Related papers
- AI Flow at the Network Edge [58.31090055138711]
AI Flow is a framework that streamlines the inference process by jointly leveraging the heterogeneous resources available across devices, edge nodes, and cloud servers.
This article serves as a position paper for identifying the motivation, challenges, and principles of AI Flow.
arXiv Detail & Related papers (2024-11-19T12:51:17Z)
- Slicing for AI: An Online Learning Framework for Network Slicing Supporting AI Services [5.80147190706865]
6G networks will embrace a new realm of AI-driven services that requires innovative network slicing strategies.
This paper proposes an online learning framework to optimize the allocation of computational and communication resources to AI services.
arXiv Detail & Related papers (2024-10-20T14:38:54Z)
- QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning [58.767866109043055]
We introduce Query-dependent Prompt Optimization (QPO), which iteratively fine-tunes a small pretrained language model to generate optimal prompts tailored to the input queries.
We derive insights from offline prompting demonstration data, which already exists in large quantities as a by-product of benchmarking diverse prompts on open-sourced tasks.
Experiments on various LLM scales and diverse NLP and math tasks demonstrate the efficacy and cost-efficiency of our method in both zero-shot and few-shot scenarios.
arXiv Detail & Related papers (2024-08-20T03:06:48Z)
- A Semantic-Aware Multiple Access Scheme for Distributed, Dynamic 6G-Based Applications [14.51946231794179]
This paper introduces a novel formulation for the problem of multiple access to the wireless spectrum.
It aims to optimize the utilization-fairness trade-off, using the $\alpha$-fairness metric.
A Semantic-Aware Multi-Agent Double and Dueling Deep Q-Learning (SAMA-D3QL) technique is proposed.
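The $\alpha$-fairness metric this entry refers to has a standard closed form; a minimal sketch follows (the function name is ours, and the per-user utility shown here is summed over allocations in the actual trade-off objective).

```python
import math

def alpha_fairness(x, alpha):
    """Standard alpha-fair utility of a positive allocation x.

    alpha = 0 recovers sum-throughput, alpha = 1 gives proportional
    fairness (log utility), and alpha -> infinity approaches
    max-min fairness.
    """
    if x <= 0:
        raise ValueError("utility is defined for positive allocations")
    if alpha == 1.0:
        return math.log(x)
    return x ** (1.0 - alpha) / (1.0 - alpha)
```

Maximizing the sum of this utility over users' spectrum allocations is what trades total utilization against fairness: larger $\alpha$ penalizes starving any one user more heavily.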
arXiv Detail & Related papers (2024-01-12T00:32:38Z)
- Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL [62.824464372594576]
We aim to enhance arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z)
- Age of Semantics in Cooperative Communications: To Expedite Simulation Towards Real via Offline Reinforcement Learning [53.18060442931179]
We propose the age of semantics (AoS) for measuring semantics freshness of status updates in a cooperative relay communication system.
We derive an online deep actor-critic (DAC) learning scheme under the on-policy temporal difference learning framework.
We then put forward a novel offline DAC scheme, which estimates the optimal control policy from a previously collected dataset.
arXiv Detail & Related papers (2022-09-19T11:55:28Z)
- A State Transition Model for Mobile Notifications via Survival Analysis [10.638942431625381]
We propose a state transition framework to quantitatively evaluate the effectiveness of notifications.
We develop a survival model for badging notifications assuming a log-linear structure and a Weibull distribution.
Our results show that this model achieves more flexibility for applications and superior prediction accuracy than a logistic regression model.
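A Weibull survival model whose scale is log-linear in the features, as this entry describes, can be sketched as follows; the function names and the accelerated-failure-time parameterization are illustrative assumptions on our part, not taken from that paper.

```python
import math

def weibull_survival(t, x, beta, shape_k):
    """P(T > t) under a Weibull model with a log-linear scale.

    scale = exp(x . beta), so covariates act multiplicatively on the
    time scale (an accelerated-failure-time model).
    """
    scale = math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
    return math.exp(-((t / scale) ** shape_k))

def weibull_hazard(t, x, beta, shape_k):
    """Instantaneous event rate at time t > 0 under the same model."""
    scale = math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
    return (shape_k / scale) * (t / scale) ** (shape_k - 1)
```

With shape_k = 1 the hazard is constant (exponential waiting times); shape_k != 1 lets the model capture response rates that rise or decay after a notification, which a logistic regression on a fixed horizon cannot express.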
arXiv Detail & Related papers (2022-07-07T05:38:39Z)
- Offline Reinforcement Learning for Mobile Notifications [1.965345368500676]
Mobile notification systems have taken a major role in driving and maintaining user engagement for online platforms.
Most machine learning applications in notification systems are built around response-prediction models.
We argue that reinforcement learning is a better framework for notification systems in terms of performance and iteration speed.
arXiv Detail & Related papers (2022-02-04T22:22:22Z)
- Cellular traffic offloading via Opportunistic Networking with Reinforcement Learning [0.5758073912084364]
We propose an adaptive offloading solution based on the Reinforcement Learning framework.
We evaluate and compare the performance of two well-known learning algorithms: Actor-Critic and Q-Learning.
Our solution achieves a higher level of offloading with respect to other state-of-the-art approaches.
arXiv Detail & Related papers (2021-10-01T13:34:12Z)
- A Deep Value-network Based Approach for Multi-Driver Order Dispatching [55.36656442934531]
We propose a deep reinforcement learning based solution for order dispatching.
We conduct large scale online A/B tests on DiDi's ride-dispatching platform.
Results show that CVNet consistently outperforms other recently proposed dispatching methods.
arXiv Detail & Related papers (2021-06-08T16:27:04Z)
- Learning to Recover Reasoning Chains for Multi-Hop Question Answering via Cooperative Games [66.98855910291292]
We propose a new problem of learning to recover reasoning chains from weakly supervised signals.
Two separate models handle how the evidence passages are selected and how the selected passages are connected.
For evaluation, we created benchmarks based on two multi-hop QA datasets.
arXiv Detail & Related papers (2020-04-06T03:54:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.