Offline Reinforcement Learning for Optimizing Production Bidding Policies
- URL: http://arxiv.org/abs/2310.09426v1
- Date: Fri, 13 Oct 2023 22:14:51 GMT
- Title: Offline Reinforcement Learning for Optimizing Production Bidding Policies
- Authors: Dmytro Korenkevych, Frank Cheng, Artsiom Balakir, Alex Nikulkov,
Lingnan Gao, Zhihao Cen, Zuobing Xu, Zheqing Zhu
- Abstract summary: We propose a generalizable approach to optimizing bidding policies in production environments.
We use a hybrid agent architecture that combines arbitrary base policies with deep neural networks.
We demonstrate that such an architecture achieves statistically significant performance gains in both simulated and at-scale production bidding environments.
- Score: 1.8689461238197953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The online advertising market, with its thousands of auctions run per second,
presents a daunting challenge for advertisers who wish to optimize their spend
under a budget constraint. Thus, advertising platforms typically provide
automated agents to their customers, which act on their behalf to bid for
impression opportunities in real time at scale. Because these proxy agents are
owned by the platform but use advertiser funds to operate, there is a strong
practical need to balance reliability and explainability of the agent with
optimizing power. We propose a generalizable approach to optimizing bidding
policies in production environments by learning from real data using offline
reinforcement learning. This approach can be used to optimize any
differentiable base policy (practically, a heuristic policy based on principles
which the advertiser can easily understand), and only requires data generated
by the base policy itself. We use a hybrid agent architecture that combines
arbitrary base policies with deep neural networks, where only the optimized
base policy parameters are eventually deployed, and the neural network part is
discarded after training. We demonstrate that such an architecture achieves
statistically significant performance gains in both simulated and at-scale
production bidding environments. Our approach does not incur additional
infrastructure, safety, or explainability costs, as it directly optimizes
parameters of existing production routines without replacing them with black
box-style models like neural networks.
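The hybrid architecture described in the abstract lends itself to a compact sketch. The snippet below is a minimal illustration under assumed details, not the paper's implementation: a two-parameter, human-readable base bidding policy is paired with a neural critic that exists only during offline training on logged data, and only the base-policy parameters would be exported to production. All names (BaseBidPolicy, ValueCritic, offline_update, alpha, beta) are hypothetical.

```python
# Minimal sketch (not the paper's code) of a hybrid agent: an interpretable base
# bidding policy with a few learnable parameters, plus a neural network used only
# during offline training. All names and the exact losses are assumptions.
import torch
import torch.nn as nn

class BaseBidPolicy(nn.Module):
    """Heuristic, differentiable base policy: bid = alpha * predicted_value + beta * pacing_signal."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(1.0))  # value multiplier (deployed)
        self.beta = nn.Parameter(torch.tensor(0.0))   # pacing correction (deployed)

    def forward(self, state):
        # state[:, 0] = predicted conversion value, state[:, 1] = budget-pacing signal
        return self.alpha * state[:, 0] + self.beta * state[:, 1]

class ValueCritic(nn.Module):
    """Neural network used only at training time to score (state, bid) pairs; discarded afterwards."""
    def __init__(self, state_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, bid):
        return self.net(torch.cat([state, bid.unsqueeze(-1)], dim=-1)).squeeze(-1)

def offline_update(base, critic, batch, base_opt, critic_opt, gamma=0.99):
    """One offline step on logged (state, bid, reward, next_state) tuples generated by the base policy."""
    state, bid, reward, next_state = batch
    # Fit the critic to a one-step bootstrapped target.
    with torch.no_grad():
        target = reward + gamma * critic(next_state, base(next_state))
    critic_loss = ((critic(state, bid) - target) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Improve only the base-policy parameters by ascending the critic's value estimate.
    base_loss = -critic(state, base(state)).mean()
    base_opt.zero_grad(); base_loss.backward(); base_opt.step()

# At deployment only base.alpha and base.beta are exported to the production bidder;
# the critic network is thrown away, matching the "discarded after training" design.
```

The design point the abstract emphasizes is that the neural network never reaches the serving path: it only shapes the gradient signal used to tune the interpretable base parameters, which is why no additional infrastructure, safety, or explainability costs are incurred.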
Related papers
- Large Language Model driven Policy Exploration for Recommender Systems [50.70228564385797]
Offline RL policies trained on static user data are vulnerable to distribution shift when deployed in dynamic online environments.
Online RL-based recommender systems also face challenges in production deployment due to the risks of exposing users to untrained or unstable policies.
Large Language Models (LLMs) offer a promising solution to mimic user objectives and preferences for pre-training policies offline.
We propose an Interaction-Augmented Learned Policy (iALP) that utilizes user preferences distilled from an LLM.
arXiv Detail & Related papers (2025-01-23T16:37:44Z) - Hierarchical Multi-agent Meta-Reinforcement Learning for Cross-channel Bidding [4.741091524027138]
Real-time bidding (RTB) plays a pivotal role in online advertising ecosystems.
Traditional approaches cannot effectively manage the dynamic budget allocation problem.
We propose a hierarchical multi-agent reinforcement learning framework for multi-channel bidding optimization.
arXiv Detail & Related papers (2024-12-26T05:26:30Z) - GAS: Generative Auto-bidding with Post-training Search [26.229396732360787]
We propose a flexible and practical Generative Auto-bidding scheme using post-training Search, termed GAS, to refine a base policy model's output.
Experiments conducted on the real-world dataset and online A/B test on the Kuaishou advertising platform demonstrate the effectiveness of GAS.
arXiv Detail & Related papers (2024-12-22T13:47:46Z) - Bayesian Design Principles for Offline-to-Online Reinforcement Learning [50.97583504192167]
Offline-to-online fine-tuning is crucial for real-world applications where exploration can be costly or unsafe.
In this paper, we tackle the dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop.
We show that Bayesian design principles are crucial in solving such a dilemma.
arXiv Detail & Related papers (2024-05-31T16:31:07Z) - Maximizing the Success Probability of Policy Allocations in Online Systems [5.485872703839928]
In this paper we consider the problem at the level of user timelines instead of individual bid requests.
In order to optimally allocate policies to users, typical multiple treatments allocation methods solve knapsack-like problems.
We introduce the SuccessProMax algorithm that aims at finding the policy allocation which is the most likely to outperform a fixed policy.
arXiv Detail & Related papers (2023-12-26T10:55:33Z) - Insurance pricing on price comparison websites via reinforcement learning [7.023335262537794]
This paper introduces a reinforcement learning framework that learns an optimal pricing policy by integrating model-based and model-free methods.
The paper also highlights the importance of evaluating pricing policies using an offline dataset in a consistent fashion.
arXiv Detail & Related papers (2023-08-14T04:44:56Z) - Supported Policy Optimization for Offline Reinforcement Learning [74.1011309005488]
Policy constraint methods for offline reinforcement learning (RL) typically utilize parameterization or regularization.
Regularization methods reduce the divergence between the learned policy and the behavior policy.
This paper presents Supported Policy OpTimization (SPOT), which is directly derived from the theoretical formalization of the density-based support constraint; a generic sketch of the regularization idea appears after this list.
arXiv Detail & Related papers (2022-02-13T07:38:36Z) - Building a Foundation for Data-Driven, Interpretable, and Robust Policy Design using the AI Economist [67.08543240320756]
We show that the AI Economist framework enables effective, flexible, and interpretable policy design using two-level reinforcement learning and data-driven simulations.
We find that log-linear policies trained using RL significantly improve social welfare, based on both public health and economic outcomes, compared to past outcomes.
arXiv Detail & Related papers (2021-08-06T01:30:41Z) - OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z) - Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization [46.017212565714175]
We propose a novel concept of deployment efficiency, measuring the number of distinct data-collection policies that are used during policy learning.
We propose a novel model-based algorithm, Behavior-Regularized Model-ENsemble (BREMEN) that can effectively optimize a policy offline using 10-20 times fewer data than prior works.
arXiv Detail & Related papers (2020-06-05T19:33:19Z) - Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
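To make the contrast drawn in the Supported Policy Optimization entry concrete, the sketch below shows the generic divergence-penalty (regularization) idea that SPOT is contrasted with, not SPOT's own density-based support constraint. GaussianPolicy, regularized_actor_loss, and the pre-fit, frozen behavior_policy are hypothetical illustrations; the penalty simply discourages actions that the logged behavior policy would consider unlikely.

```python
# Generic behavior-regularization sketch (not SPOT's algorithm): maximize the critic's
# value estimate while penalizing actions that are unlikely under the behavior policy
# fit to the logged data. All names are hypothetical.
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Diagonal-Gaussian policy over continuous actions."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def _dist(self, state):
        return torch.distributions.Normal(self.mean(self.body(state)), self.log_std.exp())

    def log_prob(self, state, action):
        return self._dist(state).log_prob(action).sum(-1)

    def sample(self, state):
        return self._dist(state).rsample()  # reparameterized, so gradients reach the policy

def regularized_actor_loss(policy, behavior_policy, critic, state, alpha=1.0):
    """Behavior-regularized actor loss; behavior_policy is assumed pre-fit on logged data and frozen."""
    action = policy.sample(state)
    # A low behavior-policy log-likelihood means the action strays from the logged data.
    penalty = -behavior_policy.log_prob(state, action)
    return (-critic(state, action) + alpha * penalty).mean()
```

As the entry notes, SPOT itself is derived from a density-based support constraint rather than such a divergence penalty; the sketch only illustrates the regularization pattern it is contrasted with.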
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.