Nash Equilibrium Constrained Auto-bidding With Bi-level Reinforcement Learning
- URL: http://arxiv.org/abs/2503.10304v1
- Date: Thu, 13 Mar 2025 12:25:36 GMT
- Title: Nash Equilibrium Constrained Auto-bidding With Bi-level Reinforcement Learning
- Authors: Zhiyu Mou, Miao Xu, Rongquan Bai, Zhuoran Yang, Chuan Yu, Jian Xu, Bo Zheng,
- Abstract summary: We propose a new formulation of the auto-bidding problem from the platform's perspective.<n>It aims to maximize the social welfare of all advertisers under the $epsilon$-NE constraint.<n>The NCB problem presents significant challenges due to its constrained bi-level structure and the typically large number of advertisers involved.
- Score: 64.2367385090879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many online advertising platforms provide advertisers with auto-bidding services to enhance their advertising performance. However, most existing auto-bidding algorithms fail to accurately capture the auto-bidding problem formulation that the platform truly faces, let alone solve it. Actually, we argue that the platform should try to help optimize each advertiser's performance to the greatest extent -- which makes $\epsilon$-Nash Equilibrium ($\epsilon$-NE) a necessary solution concept -- while maximizing the social welfare of all the advertisers for the platform's long-term value. Based on this, we introduce the \emph{Nash-Equilibrium Constrained Bidding} (NCB), a new formulation of the auto-bidding problem from the platform's perspective. Specifically, it aims to maximize the social welfare of all advertisers under the $\epsilon$-NE constraint. However, the NCB problem presents significant challenges due to its constrained bi-level structure and the typically large number of advertisers involved. To address these challenges, we propose a \emph{Bi-level Policy Gradient} (BPG) framework with theoretical guarantees. Notably, its computational complexity is independent of the number of advertisers, and the associated gradients are straightforward to compute. Extensive simulated and real-world experiments validate the effectiveness of the BPG framework.
Related papers
- An Adaptable Budget Planner for Enhancing Budget-Constrained Auto-Bidding in Online Advertising [28.4314408199823]
ABPlanner is a few-shot adaptable budget planner designed to improve budget-constrained auto-bidding.<n>ABPlanner allocates the budget across all stages, allowing a low-level auto-bidder to bids based on the budget allocation plan.<n>The adaptability of ABPlanner is achieved through a sequential decision-making approach, inspired by in-context reinforcement learning.
arXiv Detail & Related papers (2025-01-26T08:00:23Z) - A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints [54.46126953873298]
We address the problem of dynamically pricing complementary items that are sequentially displayed to customers.
Coherent pricing policies for complementary items are essential because optimizing the pricing of each item individually is ineffective.
We empirically evaluate our approach using synthetic settings randomly generated from real-world data, and compare its performance in terms of constraints violation and regret.
arXiv Detail & Related papers (2024-07-08T09:55:31Z) - Online Ad Procurement in Non-stationary Autobidding Worlds [10.871587311621974]
We introduce a primal-dual algorithm for online decision making with multi-dimension decision variables, bandit feedback and long-term uncertain constraints.
We show that our algorithm achieves low regret in many worlds when procurement outcomes are generated through procedures that are adversarial, adversarially corrupted, periodic, and ergodic.
arXiv Detail & Related papers (2023-07-10T00:41:08Z) - Online Learning under Budget and ROI Constraints via Weak Adaptivity [57.097119428915796]
Existing primal-dual algorithms for constrained online learning problems rely on two fundamental assumptions.
We show how such assumptions can be circumvented by endowing standard primal-dual templates with weakly adaptive regret minimizers.
We prove the first best-of-both-worlds no-regret guarantees which hold in absence of the two aforementioned assumptions.
arXiv Detail & Related papers (2023-02-02T16:30:33Z) - Autobidders with Budget and ROI Constraints: Efficiency, Regret, and Pacing Dynamics [53.62091043347035]
We study a game between autobidding algorithms that compete in an online advertising platform.<n>We propose a gradient-based learning algorithm that is guaranteed to satisfy all constraints and achieves vanishing individual regret.
arXiv Detail & Related papers (2023-01-30T21:59:30Z) - LBCF: A Large-Scale Budget-Constrained Causal Forest Algorithm [11.82503645248441]
How to select the right amount of incentives (i.e. treatment) to each user under budget constraints is an important research problem.
We propose a novel tree-based treatment selection technique under budget constraints, called Large-Scale Budget-Constrained Causal Forest (LBCF) algorithm.
We deploy our approach in a real-world scenario on a large-scale video platform, where the platform gives away bonuses in order to increase users' campaign engagement duration.
arXiv Detail & Related papers (2022-01-29T13:21:07Z) - Inducing Equilibria via Incentives: Simultaneous Design-and-Play Finds
Global Optima [114.31577038081026]
We propose an efficient method that tackles the designer's and agents' problems simultaneously in a single loop.
Although the designer does not solve the equilibrium problem repeatedly, it can anticipate the overall influence of the incentives on the agents.
We prove that the algorithm converges to the global optima at a sublinear rate for a broad class of games.
arXiv Detail & Related papers (2021-10-04T06:53:59Z) - Efficient Algorithms for Global Inference in Internet Marketplaces [6.2122699483618]
Matching demand to supply in internet marketplaces is a global inference problem.
Until recently, solving such problems on web-scale data with an LP formulation was intractable.
Recent work developed a dual decomposition-based approach to solve such problems when the polytope constraints are simple.
In this work, we motivate the need to go beyond these simple polytopes and show real-world internet marketplaces that require more complex structured polytope constraints.
arXiv Detail & Related papers (2021-03-09T08:00:58Z) - Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential
Advertising [52.3825928886714]
We formulate the sequential advertising strategy optimization as a dynamic knapsack problem.
We propose a theoretically guaranteed bilevel optimization framework, which significantly reduces the solution space of the original optimization space.
To improve the exploration efficiency of reinforcement learning, we also devise an effective action space reduction approach.
arXiv Detail & Related papers (2020-06-29T18:50:35Z) - Public Bayesian Persuasion: Being Almost Optimal and Almost Persuasive [57.47546090379434]
We study the public persuasion problem in the general setting with: (i) arbitrary state spaces; (ii) arbitrary action spaces; (iii) arbitrary sender's utility functions.
We provide a quasi-polynomial time bi-criteria approximation algorithm for arbitrary public persuasion problems that, in specific settings, yields a QPTAS.
arXiv Detail & Related papers (2020-02-12T18:59:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.