Exploration in Online Advertising Systems with Deep Uncertainty-Aware
Learning
- URL: http://arxiv.org/abs/2012.02298v2
- Date: Tue, 15 Jun 2021 06:28:13 GMT
- Title: Exploration in Online Advertising Systems with Deep Uncertainty-Aware
Learning
- Authors: Chao Du, Zhifeng Gao, Shuo Yuan, Lining Gao, Ziyan Li, Yifan Zeng,
Xiaoqiang Zhu, Jian Xu, Kun Gai, Kuang-chih Lee
- Abstract summary: We propose a novel Deep Uncertainty-Aware Learning (DUAL) method to learn click-through rate (CTR) prediction models.
DUAL can be easily implemented on existing models and deployed in real-time systems with minimal extra computational overhead.
We also present strategies to boost long-term utilities such as social welfare in advertising systems.
- Score: 26.24464382500032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern online advertising systems inevitably rely on personalization methods,
such as click-through rate (CTR) prediction. Recent progress in CTR prediction
enjoys the rich representation capabilities of deep learning and achieves great
success in large-scale industrial applications. However, these methods can
suffer from a lack of exploration. Another line of prior work addresses the
exploration-exploitation trade-off with contextual bandit methods, which have
recently been less studied in industry because of the difficulty of endowing
them with the flexibility of deep models. In this paper, we propose a novel
Deep Uncertainty-Aware Learning (DUAL) method to learn CTR models based on
Gaussian processes, which can provide predictive uncertainty estimations while
maintaining the flexibility of deep neural networks. DUAL can be easily
implemented on existing models and deployed in real-time systems with minimal
extra computational overhead. By linking the predictive uncertainty estimation
ability of DUAL to well-known bandit algorithms, we further present DUAL-based
ad-ranking strategies to boost long-term utilities such as social
welfare in advertising systems. Experimental results on several public datasets
demonstrate the effectiveness of our methods. Remarkably, an online A/B test
deployed in the Alibaba display advertising platform shows an 8.2% social
welfare improvement and an 8.0% revenue lift.
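The abstract ties DUAL's predictive uncertainty to well-known bandit algorithms for ad ranking. As a rough, hedged sketch (not the authors' exact strategy), a UCB-style rule could combine the predicted CTR mean and its uncertainty into an optimistic eCPM-like score; the names below (`rank_ads_ucb`, `ctr_mean`, `ctr_std`, `bid`, `alpha`) are illustrative assumptions, not from the paper.

```python
import numpy as np

def rank_ads_ucb(ctr_mean, ctr_std, bid, alpha=1.0):
    """Rank candidate ads by an optimistic eCPM-style score.

    ctr_mean -- predicted CTR means (e.g. from an uncertainty-aware CTR model)
    ctr_std  -- predictive standard deviations for each CTR estimate
    bid      -- advertiser bids for each candidate ad
    alpha    -- exploration weight; alpha = 0 recovers pure exploitation
    """
    optimistic_ctr = ctr_mean + alpha * ctr_std  # UCB-style optimism bonus
    score = bid * optimistic_ctr                 # optimistic expected value per impression
    return np.argsort(-score)                    # ad indices, best first

# Toy usage: an uncertain ad (index 0) can outrank a well-estimated one.
order = rank_ads_ucb(
    ctr_mean=np.array([0.010, 0.012, 0.009]),
    ctr_std=np.array([0.005, 0.0005, 0.001]),
    bid=np.array([1.0, 1.0, 1.0]),
    alpha=2.0,
)
print(order)  # [0 1 2]
```

A larger `alpha` favors uncertain, under-explored ads, which is the exploration effect the abstract attributes to uncertainty-aware ranking.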
Related papers
- Reinforcement Learning with Action Chunking [56.838297900091426]
We present Q-chunking, a recipe for improving reinforcement learning algorithms for long-horizon, sparse-reward tasks. Our recipe is designed for the offline-to-online RL setting, where the goal is to leverage an offline prior dataset to maximize the sample-efficiency of online learning. Our experimental results demonstrate that Q-chunking exhibits strong offline performance and online sample efficiency, outperforming prior best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.
arXiv Detail & Related papers (2025-07-10T17:48:03Z) - Accelerated Online Reinforcement Learning using Auxiliary Start State Distributions [50.44719434877687]
Expert demonstrations and simulators can reset to arbitrary states. We find that using a notion of safety to inform the choice of this auxiliary distribution significantly accelerates learning.
arXiv Detail & Related papers (2025-07-07T01:54:05Z) - Detecting Toxic Flow [0.40964539027092917]
This paper develops a framework to predict toxic trades that a broker receives from her clients.
We use a proprietary dataset of foreign exchange transactions to test our methodology.
We devise a strategy for the broker who uses toxicity predictions to internalise or to externalise each trade received from her clients.
arXiv Detail & Related papers (2023-12-10T09:00:09Z) - Predicted Embedding Power Regression for Large-Scale Out-of-Distribution
Detection [77.1596426383046]
We develop a novel approach that calculates the probability of the predicted class label based on label distributions learned during the training process.
Our method performs better than current state-of-the-art methods with only a negligible increase in compute cost.
arXiv Detail & Related papers (2023-03-07T18:28:39Z) - Reinforcement Learning in the Wild: Scalable RL Dispatching Algorithm
Deployed in Ridehailing Marketplace [12.298997392937876]
This study proposes a real-time dispatching algorithm based on reinforcement learning.
It is deployed online in multiple cities under DiDi's operation for A/B testing and is launched in one of the major international markets.
The deployed algorithm shows over 1.3% improvement in total driver income from A/B testing.
arXiv Detail & Related papers (2022-02-10T16:07:17Z) - Adversarial Gradient Driven Exploration for Deep Click-Through Rate
Prediction [39.61776002290324]
We propose a novel exploration method called Adversarial Gradient Driven Exploration (AGE).
AGE simulates the gradient updating process, which approximates the influence that samples of to-be-explored items would have on the model.
The effectiveness of our approach was demonstrated on an open-access academic dataset.
arXiv Detail & Related papers (2021-12-21T12:13:07Z) - MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven
Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z) - Towards Reducing Labeling Cost in Deep Object Detection [61.010693873330446]
We propose a unified framework for active learning, that considers both the uncertainty and the robustness of the detector.
Our method is able to pseudo-label the very confident predictions, suppressing a potential distribution drift.
arXiv Detail & Related papers (2021-06-22T16:53:09Z) - Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning [63.53407136812255]
Offline Reinforcement Learning promises to learn effective policies from previously-collected, static datasets without the need for exploration.
Existing Q-learning and actor-critic based off-policy RL algorithms fail when bootstrapping from out-of-distribution (OOD) actions or states.
We propose Uncertainty Weighted Actor-Critic (UWAC), an algorithm that detects OOD state-action pairs and down-weights their contribution in the training objectives accordingly.
arXiv Detail & Related papers (2021-05-17T20:16:46Z) - Deep Bayesian Bandits: Exploring in Online Personalized Recommendations [4.845576821204241]
We formulate a display advertising recommender as a contextual bandit.
We implement exploration techniques that require sampling from the posterior distribution of click-through rates (see the Thompson-sampling sketch after this list).
We test our proposed deep Bayesian bandits algorithm in offline simulation and an online A/B setting.
arXiv Detail & Related papers (2020-08-03T08:58:18Z) - Efficient Model-Based Reinforcement Learning through Optimistic Policy
Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z) - Guided Uncertainty-Aware Policy Optimization: Combining Learning and
Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but they are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z) - Debiased Off-Policy Evaluation for Recommendation Systems [8.63711086812655]
A/B tests are reliable, but they are time-consuming, costly, and carry a risk of failure.
We develop an alternative method, which predicts the performance of algorithms given historical data.
Our method produces smaller mean squared errors than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-20T02:30:02Z)
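For the Deep Bayesian Bandits entry above, which explores by sampling click-through rates from a posterior, a minimal Thompson-sampling sketch is shown here under a simplified Beta-Bernoulli assumption; the paper itself samples from a deep model's posterior, and all names below (`choose_ad`, `update`, `alpha`, `beta`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# One Beta posterior over CTR per candidate ad (Beta(1, 1) = uniform prior).
# This is a simplification: the referenced paper samples from a deep model's posterior.
n_ads = 3
alpha = np.ones(n_ads)  # 1 + observed clicks
beta = np.ones(n_ads)   # 1 + observed non-clicks

def choose_ad():
    sampled_ctr = rng.beta(alpha, beta)  # one posterior draw per ad
    return int(np.argmax(sampled_ctr))   # show the ad with the highest sampled CTR

def update(ad, clicked):
    if clicked:
        alpha[ad] += 1.0
    else:
        beta[ad] += 1.0

# Toy loop with fixed true CTRs that are unknown to the learner.
true_ctr = np.array([0.02, 0.05, 0.01])
for _ in range(1000):
    ad = choose_ad()
    update(ad, rng.random() < true_ctr[ad])
```

Sampling (rather than always taking the posterior mean) naturally allocates some impressions to ads whose CTR is still uncertain, which is the exploration behavior the entry describes.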
This list is automatically generated from the titles and abstracts of the papers in this site.