Techniques Toward Optimizing Viewability in RTB Ad Campaigns Using
Reinforcement Learning
- URL: http://arxiv.org/abs/2105.10587v1
- Date: Fri, 21 May 2021 21:56:12 GMT
- Title: Techniques Toward Optimizing Viewability in RTB Ad Campaigns Using
Reinforcement Learning
- Authors: Michael Tashman, John Hoffman, Jiayi Xie, Fengdan Ye, Atefeh Morsali,
Lee Winikor, Rouzbeh Gerami
- Abstract summary: Reinforcement learning (RL) is an effective technique for training decision-making agents through interactions with their environment.
In digital advertising, real-time bidding (RTB) is a common method of allocating advertising inventory through real-time auctions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) is an effective technique for training
decision-making agents through interactions with their environment. The advent
of deep learning has brought notable successes in sequential decision-making problems, such as defeating some of the highest-ranked human players at Go. In digital advertising, real-time bidding
(RTB) is a common method of allocating advertising inventory through real-time
auctions. Bidding strategies need to incorporate logic for dynamically
adjusting parameters in order to deliver pre-assigned campaign goals. Here we
discuss techniques for using RL to train bidding agents. As a campaign metric we focus in particular on viewability: the percentage of inventory that goes on to be viewed by an end user.
This paper is presented as a survey of techniques and experiments which we
developed through the course of this research. We discuss expanding our
training data to include edge cases by training on simulated interactions. We
discuss the experimental results comparing the performance of several promising
RL algorithms, and an approach to hyperparameter optimization of an
actor/critic training pipeline through Bayesian optimization. Finally, we
present live-traffic tests of some of our RL agents against a rule-based
feedback-control approach, demonstrating the potential for this method as well
as areas for further improvement. This paper therefore presents a summary of our findings in this quickly developing field, and of the ways these methods can be applied to an RTB use case.
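
The bidding setting described in the abstract can be made concrete with a small simulation. The sketch below is not the authors' implementation: it is a toy second-price auction stream in which an agent scales its bids and is rewarded for winning impressions that end up being viewed, together with a rule-based feedback-control baseline of the kind the abstract uses as a comparison in live tests. All names (SimulatedAuctionEnv, FeedbackControlBidder), distributions, and constants are hypothetical illustrations of the ideas only.

```python
import numpy as np

class SimulatedAuctionEnv:
    """Toy second-price auction stream with per-impression viewability.

    Hypothetical stand-in for a simulator used to expose a bidding agent to
    many interactions (including rare edge cases) before live traffic.
    """

    def __init__(self, n_steps=1000, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_steps = n_steps

    def reset(self):
        self.t = 0
        return self._observe()

    def _observe(self):
        # Predicted viewability of the next impression (e.g. from an upstream model).
        self.p_view = self.rng.beta(2, 2)
        # Highest competing bid, loosely correlated with predicted viewability.
        self.market_price = np.clip(self.rng.normal(1.0 + self.p_view, 0.3), 0.05, None)
        return np.array([self.p_view])

    def step(self, bid):
        won = bid >= self.market_price
        viewed = won and (self.rng.random() < self.p_view)
        cost = self.market_price if won else 0.0  # second-price payment
        # Reward favours winning impressions that are actually viewed.
        reward = (1.0 if viewed else 0.0) - 0.1 * cost
        self.t += 1
        done = self.t >= self.n_steps
        return self._observe(), reward, done, {"won": won, "viewed": viewed, "cost": cost}


class FeedbackControlBidder:
    """Rule-based baseline: proportional feedback on the delivered viewability rate."""

    def __init__(self, base_bid=1.0, view_goal=0.7, gain=0.5):
        self.base_bid, self.view_goal, self.gain = base_bid, view_goal, gain
        self.multiplier = 1.0
        self.wins, self.views = 0, 0

    def act(self, obs):
        return self.base_bid * self.multiplier

    def update(self, info):
        if info["won"]:
            self.wins += 1
            self.views += int(info["viewed"])
            observed = self.views / self.wins
            # Nudge the bid multiplier up when delivered viewability lags the goal.
            self.multiplier *= 1.0 + self.gain * (self.view_goal - observed)
            self.multiplier = float(np.clip(self.multiplier, 0.1, 10.0))


if __name__ == "__main__":
    env = SimulatedAuctionEnv()
    bidder = FeedbackControlBidder()
    obs, done, total = env.reset(), False, 0.0
    while not done:
        obs, reward, done, info = env.step(bidder.act(obs))
        bidder.update(info)
        total += reward
    print(f"episode reward: {total:.1f}, final multiplier: {bidder.multiplier:.2f}")
```

An RL agent would replace FeedbackControlBidder with a learned policy trained against the same environment interface, which is what makes the two approaches directly comparable in live-traffic tests.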
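The abstract also mentions Bayesian optimization of an actor/critic training pipeline's hyperparameters. The sketch below shows the general shape of such a search using scikit-optimize's gp_minimize as one possible tool (the paper does not specify a library); the search space, the train_and_evaluate stub, and all constants are hypothetical, and the stub stands in for a full actor/critic training run on a simulated bidding environment.

```python
from skopt import gp_minimize
from skopt.space import Real

# Hypothetical search space for an actor/critic pipeline: learning rate,
# discount factor, and entropy-regularization weight.
space = [
    Real(1e-5, 1e-2, prior="log-uniform", name="learning_rate"),
    Real(0.90, 0.999, name="gamma"),
    Real(1e-4, 1e-1, prior="log-uniform", name="entropy_coef"),
]

def train_and_evaluate(learning_rate, gamma, entropy_coef):
    """Stand-in for training an actor/critic bidding agent and returning its
    mean episode reward. A real objective would run the full training
    pipeline here; this synthetic surface only keeps the sketch runnable."""
    import math
    return -((math.log10(learning_rate) + 3.5) ** 2
             + 50 * (gamma - 0.97) ** 2
             + (math.log10(entropy_coef) + 2.0) ** 2)

def objective(params):
    learning_rate, gamma, entropy_coef = params
    # gp_minimize minimizes, so negate the reward.
    return -train_and_evaluate(learning_rate, gamma, entropy_coef)

result = gp_minimize(objective, space, n_calls=25, random_state=0)
print("best hyperparameters:", result.x)
print("best (negated) objective:", result.fun)
```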
Related papers
- An Extremely Data-efficient and Generative LLM-based Reinforcement Learning Agent for Recommenders [1.0154385852423122]
Reinforcement learning (RL) algorithms have been instrumental in maximizing long-term customer satisfaction and avoiding short-term, myopic goals in industrial recommender systems.
The goal is to train an RL agent to maximize the purchase reward given a detailed human instruction describing a desired product.
This report also evaluates the RL agents trained using generative trajectories.
arXiv Detail & Related papers (2024-08-28T10:31:50Z)
- Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents [49.85633804913796]
We present an exploration-based trajectory optimization approach, referred to as ETO.
This learning method is designed to enhance the performance of open LLM agents.
Our experiments on three complex tasks demonstrate that ETO consistently surpasses baseline performance by a large margin.
arXiv Detail & Related papers (2024-03-04T21:50:29Z)
- Trajectory-wise Iterative Reinforcement Learning Framework for Auto-bidding [16.556934508295456]
In online advertising, advertisers participate in ad auctions to acquire ad opportunities, often by utilizing auto-bidding tools provided by demand-side platforms (DSPs).
Due to safety concerns, most RL-based auto-bidding policies are trained in simulation, leading to a performance degradation when deployed in online environments.
We propose Trajectory-wise Exploration and Exploitation (TEE), which introduces a novel data collecting and data utilization method for iterative offline RL.
arXiv Detail & Related papers (2024-02-23T05:20:23Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potential suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Learning and reusing primitive behaviours to improve Hindsight Experience Replay sample efficiency [7.806014635635933]
We propose a method that uses primitive behaviours that have been previously learned to solve simple tasks.
This guidance is not executed by a manually designed curriculum, but rather using a critic network to decide at each timestep whether or not to use the actions proposed.
We demonstrate the agents can learn a successful policy faster when using our proposed method, both in terms of sample efficiency and computation time.
arXiv Detail & Related papers (2023-10-03T06:49:57Z)
- Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL [62.824464372594576]
We aim to enhance arithmetic reasoning ability of Large Language Models (LLMs) through zero-shot prompt optimization.
We identify a previously overlooked objective of query dependency in such optimization.
We introduce Prompt-OIRL, which harnesses offline inverse reinforcement learning to draw insights from offline prompting demonstration data.
arXiv Detail & Related papers (2023-09-13T01:12:52Z)
- Adversarial Constrained Bidding via Minimax Regret Optimization with Causality-Aware Reinforcement Learning [18.408964908248855]
Existing approaches on constrained bidding typically rely on i.i.d. train and test conditions.
We propose a practical Minimax Regret Optimization (MiRO) approach that interleaves between a teacher finding adversarial environments for tutoring and a learner meta-learning its policy over the given distribution of environments.
Our method, MiRO with Causality-aware reinforcement Learning (MiROCL), outperforms prior methods by over 30%.
arXiv Detail & Related papers (2023-06-12T13:31:58Z)
- Prompt-Tuning Decision Transformer with Preference Ranking [83.76329715043205]
We propose the Prompt-Tuning DT algorithm to address challenges by using trajectory segments as prompts to guide RL agents in acquiring environmental information.
Our approach involves randomly sampling a Gaussian distribution to fine-tune the elements of the prompt trajectory and using preference ranking function to find the optimization direction.
Our work contributes to the advancement of prompt-tuning approaches in RL, providing a promising direction for optimizing large RL agents for specific preference tasks.
arXiv Detail & Related papers (2023-05-16T17:49:04Z)
- Improving Real-Time Bidding in Online Advertising Using Markov Decision Processes and Machine Learning Techniques [0.0]
This paper proposes a novel method for real-time bidding that combines deep learning and reinforcement learning techniques.
The proposed method employs a deep neural network to predict auction details and market prices and a reinforcement learning algorithm to determine the optimal bid price.
The outcomes demonstrate that the proposed method is preferable regarding cost-effectiveness and precision.
arXiv Detail & Related papers (2023-05-05T14:34:20Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment.
We propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)