Functional Optimization Reinforcement Learning for Real-Time Bidding
- URL: http://arxiv.org/abs/2206.13939v1
- Date: Sat, 25 Jun 2022 06:12:17 GMT
- Title: Functional Optimization Reinforcement Learning for Real-Time Bidding
- Authors: Yining Lu, Changjie Lu, Naina Bandyopadhyay, Manoj Kumar, Gaurav Gupta
- Abstract summary: Real-time bidding is the new paradigm of programmatic advertising.
Existing approaches are struggling to provide a satisfactory solution for bidding optimization.
This paper proposes a multi-agent reinforcement learning architecture for RTB with functional optimization.
- Score: 14.5826735379053
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Real-time bidding (RTB) is the new paradigm of programmatic
advertising. An advertiser can choose to use a Demand-Side Platform to improve
the performance of its ad campaigns. Existing approaches struggle to provide a
satisfactory solution for bidding optimization because of stochastic bidding
behavior. In this paper, we propose a multi-agent reinforcement learning
architecture for RTB with functional optimization. We designed a four-agent
bidding environment: three Lagrange-multiplier-based functional optimization
agents and one baseline agent (without any functional-optimization attribute).
Each agent is assigned several attributes, including biased or unbiased win
probability, a Lagrange multiplier, and click-through rate. To evaluate the
proposed RTB strategy, we report results on ten sequential simulated auction
campaigns. The results show that the agents with functional actions and rewards
achieved the highest average winning rate and winning surplus, given biased and
unbiased winning information, respectively. The experimental evaluations show
that our approach significantly improves the campaign's efficacy and
profitability.
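The abstract does not spell out the agents' bidding rule. As a rough, non-authoritative illustration, the Python sketch below assumes a common Lagrangian relaxation of budget-constrained surplus maximization in a second-price auction. Under that assumption, a functional-optimization agent bids ctr × value / (1 + λ), and λ is adjusted between sequential campaigns by a projected subgradient step on spend versus budget. The names `Agent`, `run_campaign` and `update_multipliers`, the baseline's fixed bid, the noise model and all numeric parameters are hypothetical. Biased versus unbiased win-probability information is not modelled.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    """A bidder in a simulated second-price auction (hypothetical setup)."""
    name: str
    ctr: float                   # this agent's click-through-rate estimate
    value_per_click: float       # what one click is worth to the advertiser
    lam: Optional[float] = None  # Lagrange multiplier; None marks the baseline
    fixed_bid: float = 0.05      # baseline bid, used only when lam is None
    wins: int = 0
    surplus: float = 0.0
    spent: float = 0.0           # spend in the current campaign

    def bid(self) -> float:
        if self.lam is None:
            return self.fixed_bid
        # Relaxing "maximize surplus s.t. spend <= budget" with multiplier lam
        # gives per-auction utility ctr * v - (1 + lam) * price, so the agent
        # bids the price at which that utility drops to zero.
        return self.ctr * self.value_per_click / (1.0 + self.lam)


def run_campaign(agents: List[Agent], n_auctions: int, rng: random.Random) -> None:
    """Run n_auctions second-price auctions; needs at least two agents."""
    for _ in range(n_auctions):
        # Multiplicative noise stands in for stochastic bidding behavior.
        bids = [(a.bid() * rng.uniform(0.8, 1.2), a) for a in agents]
        bids.sort(key=lambda pair: pair[0], reverse=True)
        winner, price = bids[0][1], bids[1][0]
        winner.wins += 1
        winner.spent += price  # second-price rule: pay the runner-up's bid
        # Expected value of the impression minus the price paid.
        winner.surplus += winner.ctr * winner.value_per_click - price


def update_multipliers(agents: List[Agent], budget: float, step: float) -> None:
    """Projected subgradient step on each multiplier after a campaign."""
    for a in agents:
        if a.lam is not None:
            # Overspending raises lam (bid lower); underspending lowers it.
            a.lam = max(0.0, a.lam + step * (a.spent - budget) / budget)
        a.spent = 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    agents = [
        Agent("lagrange-1", ctr=0.04, value_per_click=2.0, lam=0.0),
        Agent("lagrange-2", ctr=0.05, value_per_click=2.0, lam=0.2),
        Agent("lagrange-3", ctr=0.06, value_per_click=2.0, lam=0.5),
        Agent("baseline", ctr=0.05, value_per_click=2.0),
    ]
    n_auctions, n_campaigns = 1000, 10
    for _ in range(n_campaigns):
        run_campaign(agents, n_auctions, rng)
        update_multipliers(agents, budget=5.0, step=0.5)
    for a in agents:
        rate = a.wins / (n_auctions * n_campaigns)
        print(f"{a.name}: win rate {rate:.3f}, surplus {a.surplus:.2f}")
```

Setting λ = 0 recovers truthful bidding on expected click value. A larger λ shades the bid, which trades win rate for per-win surplus and lower spend.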
Related papers
- Fair Allocation in Dynamic Mechanism Design [57.66441610380448]
We consider a problem where an auctioneer sells an indivisible good to groups of buyers in every round, for a total of $T$ rounds.
The auctioneer aims to maximize their discounted overall revenue while adhering to a fairness constraint that guarantees a minimum average allocation for each group.
Date: 2024-05-31T19:26:05Z
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
Date: 2024-05-01T11:10:24Z
- Enhanced Bayesian Optimization via Preferential Modeling of Abstract Properties [49.351577714596544]
We propose a human-AI collaborative Bayesian framework to incorporate expert preferences about unmeasured abstract properties into surrogate modeling.
We provide an efficient strategy that can also handle any incorrect/misleading expert bias in preferential judgments.
Date: 2024-02-27T09:23:13Z
- Maximizing the Success Probability of Policy Allocations in Online Systems [5.485872703839928]
In this paper we consider the problem at the level of user timelines instead of individual bid requests.
In order to optimally allocate policies to users, typical multiple treatments allocation methods solve knapsack-like problems.
We introduce the SuccessProMax algorithm that aims at finding the policy allocation which is the most likely to outperform a fixed policy.
Date: 2023-12-26T10:55:33Z
- DeepHive: A multi-agent reinforcement learning approach for automated discovery of swarm-based optimization policies [0.0]
The state of each agent within the swarm is defined as its current position and function value within a design space.
The proposed approach is tested on various benchmark optimization functions and compared to the performance of other global optimization strategies.
Date: 2023-03-29T18:08:08Z
- Non-Myopic Multifidelity Bayesian Optimization [0.0]
This paper proposes a non-myopic multifidelity Bayesian framework to grasp the long-term reward from future steps of the optimization.
We demonstrate that the proposed algorithm outperforms a standard multifidelity Bayesian framework on popular benchmark optimization problems.
Date: 2022-07-13T16:25:35Z
- A Unified Framework for Campaign Performance Forecasting in Online Display Advertising [9.005665883444902]
Interpretable and accurate results could enable advertisers to manage and optimize their campaign criteria.
The new framework reproduces campaign performance on historical logs under various bidding types with a unified replay algorithm.
The method captures mixture calibration patterns among related forecast indicators to map the estimated results to the true ones.
Date: 2022-02-24T03:04:29Z
- Bid Optimization using Maximum Entropy Reinforcement Learning [0.3149883354098941]
This paper focuses on optimizing a single advertiser's bidding strategy using reinforcement learning (RL) in real-time bidding (RTB).
We first utilize a widely accepted linear bidding function to compute every impression's base price and optimize it by a mutable adjustment factor derived from the RTB auction environment.
Finally, the empirical study on a public dataset demonstrates that the proposed bidding strategy has superior performance compared with the baselines.
Date: 2021-10-11T06:53:53Z
- A Cooperative-Competitive Multi-Agent Framework for Auto-bidding in Online Advertising [53.636153252400945]
We propose a general Multi-Agent reinforcement learning framework for Auto-Bidding, namely MAAB, to learn the auto-bidding strategies.
Our approach outperforms several baseline methods in terms of social welfare and guarantees the ad platform's revenue.
Date: 2021-06-11T08:07:14Z
- Are we Forgetting about Compositional Optimisers in Bayesian Optimisation? [66.39551991177542]
Bayesian optimisation offers a sample-efficient methodology for global optimisation.
Within this framework, a crucial performance-determining subroutine is maximising the acquisition function.
We highlight the empirical advantages of the compositional approach to acquisition-function maximisation across 3958 individual experiments.
Date: 2020-12-15T12:18:38Z
- Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising [52.3825928886714]
We formulate the sequential advertising strategy optimization as a dynamic knapsack problem.
We propose a theoretically guaranteed bilevel optimization framework, which significantly reduces the solution space of the original optimization problem.
To improve the exploration efficiency of reinforcement learning, we also devise an effective action space reduction approach.
Date: 2020-06-29T18:50:35Z
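The Bid Optimization using Maximum Entropy Reinforcement Learning summary mentions a widely accepted linear bidding function that computes each impression's base price, which an adjustment factor derived from the RTB auction environment then modifies. The Python sketch below is minimal and assumes the common form base_bid × pCTR / avgCTR for the linear function. The function names, the clipping range [0.5, 2.0] and the example numbers are hypothetical. The RL policy that would produce the factor is not shown.

```python
def linear_bid(pctr: float, avg_ctr: float, base_bid: float) -> float:
    """Linear bidding: scale a base bid by predicted CTR relative to the average."""
    if avg_ctr <= 0:
        raise ValueError("avg_ctr must be positive")
    return base_bid * pctr / avg_ctr


def adjusted_bid(pctr: float, avg_ctr: float, base_bid: float,
                 factor: float, lo: float = 0.5, hi: float = 2.0) -> float:
    """Multiply the linear base price by an adjustment factor.

    The factor would come from the auction environment (e.g. an RL policy);
    here it is simply an input, clipped to a hypothetical [lo, hi] range.
    """
    clipped = min(max(factor, lo), hi)
    return linear_bid(pctr, avg_ctr, base_bid) * clipped


if __name__ == "__main__":
    # An impression predicted to be clicked twice as often as average.
    print(linear_bid(0.002, 0.001, 50.0))
    print(adjusted_bid(0.002, 0.001, 50.0, factor=1.3))
```

Keeping the factor multiplicative and clipped means a poorly trained policy can only scale the base price within a bounded band, rather than replace it outright.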