Adversarial Constrained Bidding via Minimax Regret Optimization with
Causality-Aware Reinforcement Learning
- URL: http://arxiv.org/abs/2306.07106v1
- Date: Mon, 12 Jun 2023 13:31:58 GMT
- Title: Adversarial Constrained Bidding via Minimax Regret Optimization with
Causality-Aware Reinforcement Learning
- Authors: Haozhe Wang, Chao Du, Panyan Fang, Li He, Liang Wang, Bo Zheng
- Abstract summary: Existing approaches to constrained bidding typically rely on i.i.d. train and test conditions.
We propose a practical Minimax Regret Optimization (MiRO) approach that interleaves between a teacher finding adversarial environments for tutoring and a learner meta-learning its policy over the given distribution of environments.
Our method, MiRO with Causality-aware reinforcement Learning (MiROCL), outperforms prior methods by over 30%.
- Score: 18.408964908248855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The proliferation of the Internet has led to the emergence of online
advertising, driven by the mechanics of online auctions. In these repeated
auctions, software agents participate on behalf of aggregated advertisers to
optimize for their long-term utility. To fulfill the diverse demands, bidding
strategies are employed to optimize advertising objectives subject to different
spending constraints. Existing approaches to constrained bidding typically rely
on i.i.d. train and test conditions, which contradicts the adversarial nature
of online ad markets where different parties possess potentially conflicting
objectives. In this regard, we explore the problem of constrained bidding in
adversarial bidding environments, which assumes no knowledge about the
adversarial factors. Instead of relying on the i.i.d. assumption, our insight
is to align the training distribution of environments with the potential test
distribution while minimizing policy regret. Based on this insight, we
propose a practical Minimax Regret Optimization (MiRO) approach that
interleaves between a teacher finding adversarial environments for tutoring and
a learner meta-learning its policy over the given distribution of environments.
In addition, we pioneer the incorporation of expert demonstrations for learning
bidding strategies. Through a causality-aware policy design, we improve upon
MiRO by distilling knowledge from the experts. Extensive experiments on both
industrial data and synthetic data show that our method, MiRO with
Causality-aware reinforcement Learning (MiROCL), outperforms prior methods by
over 30%.
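The abstract describes MiRO only at a high level; the following is a minimal, self-contained toy sketch of its interleaved teacher-learner scheme, assuming each environment is a scalar "market intensity" and the policy is a scalar bid multiplier. All names and dynamics here are illustrative assumptions, not the authors' implementation.

# Toy sketch of a MiRO-style minimax-regret loop (illustrative only).
# Environment: scalar market intensity m; policy: scalar bid multiplier w.
# Utility peaks when w matches m, so regret(w, m) = utility(m, m) - utility(w, m).
import random

def utility(w, m):
    return -(w - m) ** 2                   # toy utility, maximized at w == m

def regret(w, m):
    return utility(m, m) - utility(w, m)   # equals (w - m) ** 2

def miro(n_rounds=200, pool_size=16, lr=0.05, seed=0):
    rng = random.Random(seed)
    w = 0.0                                                   # learner's policy
    pool = [rng.uniform(0.0, 2.0) for _ in range(pool_size)]  # teacher's envs
    for _ in range(n_rounds):
        # Teacher step: keep the highest-regret (most adversarial)
        # environments and refill the pool with fresh proposals.
        pool.sort(key=lambda m: regret(w, m), reverse=True)
        pool = pool[:pool_size // 2] + [rng.uniform(0.0, 2.0)
                                        for _ in range(pool_size // 2)]
        # Learner step: gradient step reducing mean regret over the
        # teacher's current environment distribution (meta-learning stand-in).
        grad = sum(2.0 * (w - m) for m in pool) / len(pool)
        w -= lr * grad
    return w

if __name__ == "__main__":
    print("learned bid multiplier:", round(miro(), 3))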
Related papers
- Large Language Model driven Policy Exploration for Recommender Systems [50.70228564385797]
Offline RL policies trained on static user data are vulnerable to distribution shift when deployed in dynamic online environments.
Online RL-based recommender systems also face challenges in production deployment due to the risks of exposing users to untrained or unstable policies.
Large Language Models (LLMs) offer a promising solution to mimic user objectives and preferences for pre-training policies offline.
We propose an Interaction-Augmented Learned Policy (iALP) that utilizes user preferences distilled from an LLM.
arXiv Detail & Related papers (2025-01-23T16:37:44Z)
- Hierarchical Multi-agent Meta-Reinforcement Learning for Cross-channel Bidding [4.741091524027138]
Real-time bidding (RTB) plays a pivotal role in online advertising ecosystems.
Traditional approaches cannot effectively manage the dynamic budget allocation problem.
We propose a hierarchical multi-agent reinforcement learning framework for multi-channel bidding optimization.
arXiv Detail & Related papers (2024-12-26T05:26:30Z)
- Auto-bidding in real-time auctions via Oracle Imitation Learning (OIL) [9.19703820485146]
We propose a framework for training auto-bidding agents in multi-slot second-price auctions.
We exploit the insight that, after an advertisement campaign concludes, determining the optimal bids for each impression opportunity can be framed as a multiple-choice knapsack problem.
We propose an "oracle" algorithm that identifies a near-optimal combination of impression opportunities and advertisement slots.
arXiv Detail & Related papers (2024-12-16T04:21:35Z)
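The multiple-choice knapsack framing in the entry above can be made concrete with a small dynamic program; the groups (impression opportunities), option costs (spend), values (e.g., clicks), and budget below are hypothetical, and this is a generic MCKP solver rather than the paper's oracle algorithm.

# Toy multiple-choice knapsack: each impression opportunity is a group of
# (cost, value) options (slot/bid choices); pick at most one option per
# group to maximize total value within the budget.
def mckp(groups, budget):
    dp = [0.0] * (budget + 1)          # dp[b] = best value with spend <= b
    for options in groups:
        new = dp[:]                    # skipping this group is allowed
        for cost, value in options:
            for b in range(cost, budget + 1):
                cand = dp[b - cost] + value   # dp excludes this group,
                if cand > new[b]:             # so at most one option is taken
                    new[b] = cand
        dp = new
    return dp[budget]

if __name__ == "__main__":
    # Two opportunities, each with (spend, clicks) options; budget of 7.
    opportunities = [[(3, 5.0), (5, 8.0)], [(2, 3.0), (4, 7.0)]]
    print("best achievable value:", mckp(opportunities, budget=7))  # 12.0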
- Robust Representation Learning for Unified Online Top-K Recommendation [39.12191494863331]
We propose a robust representation learning approach for unified online top-k recommendation.
Our approach constructs unified modeling in entity space to ensure data fairness.
The proposed method has been successfully deployed online to serve real business scenarios.
arXiv Detail & Related papers (2023-10-24T03:42:20Z)
- Online Ad Procurement in Non-stationary Autobidding Worlds [10.871587311621974]
We introduce a primal-dual algorithm for online decision making with multi-dimension decision variables, bandit feedback and long-term uncertain constraints.
We show that our algorithm achieves low regret in many worlds when procurement outcomes are generated through procedures that are adversarial, adversarially corrupted, periodic, and ergodic.
arXiv Detail & Related papers (2023-07-10T00:41:08Z)
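A generic dual-price scheme of the kind the entry above describes can be sketched as follows; the bid grid, reward model, and step sizes are assumptions, and the paper's actual primal-dual algorithm and regret guarantees differ.

# Primal step: pick the bid with the best dual-penalized reward estimate.
# Dual step: move a price on spend toward the long-term budget rate.
import random

def primal_dual(n_rounds=2000, rho=0.6, eta=0.01, seed=0):
    rng = random.Random(seed)
    bids = [0.0, 0.4, 0.8, 1.2]
    est = {b: 1.0 for b in bids}   # optimistic init so every bid gets tried
    cnt = {b: 0 for b in bids}
    lam, reward_sum, spend_sum = 0.0, 0.0, 0.0
    for _ in range(n_rounds):
        b = max(bids, key=lambda x: est[x] - lam * x)   # primal step
        reward = (b ** 0.5) * rng.uniform(0.5, 1.5)     # bandit feedback
        cnt[b] += 1
        est[b] += (reward - est[b]) / cnt[b]            # running mean reward
        reward_sum += reward
        spend_sum += b
        lam = max(0.0, lam + eta * (b - rho))           # dual step
    return reward_sum, spend_sum / n_rounds

if __name__ == "__main__":
    total, avg_spend = primal_dual()
    print(f"total reward {total:.1f}, average spend per round {avg_spend:.2f}")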
- Semantic Information Marketing in The Metaverse: A Learning-Based Contract Theory Framework [68.8725783112254]
We address the problem of a virtual service provider (VSP) designing incentive mechanisms to hire sensing IoT devices to sell their sensing data.
Due to limited bandwidth, we propose using semantic extraction algorithms to reduce the data delivered by the sensing IoT devices.
We propose a novel iterative contract design and use a new variant of multi-agent reinforcement learning (MARL) to solve the modelled multi-dimensional contract problem.
arXiv Detail & Related papers (2023-02-22T15:52:37Z)
- Adaptive Risk-Aware Bidding with Budget Constraint in Display Advertising [47.14651340748015]
We propose a novel adaptive risk-aware bidding algorithm with budget constraint via reinforcement learning.
We theoretically unveil the intrinsic relation between the uncertainty and the risk tendency based on value at risk (VaR).
arXiv Detail & Related papers (2022-12-06T18:50:09Z)
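For reference, value at risk at confidence level \alpha, as used in the entry above, is the standard quantile-based risk measure (a textbook definition, independent of the paper):

\mathrm{VaR}_{\alpha}(X) = \inf\{\, x \in \mathbb{R} : \Pr(X \le x) \ge \alpha \,\}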
- Data augmentation for efficient learning from parametric experts [88.33380893179697]
We focus on what we call the policy cloning setting, in which we use online or offline queries of an expert to inform the behavior of a student policy.
Our approach, augmented policy cloning (APC), uses synthetic states to induce feedback-sensitivity in a region around sampled trajectories.
We achieve highly data-efficient transfer of behavior from an expert to a student policy for high-degrees-of-freedom control problems.
arXiv Detail & Related papers (2022-05-23T16:37:16Z)
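The synthetic-state idea in the entry above can be rendered as a simple augmentation loop; the Gaussian perturbation model and all names are hypothetical, not the authors' code.

# Sketch of policy-cloning data augmentation: perturb states sampled from
# expert trajectories and label the synthetic states by querying the
# expert, so the student sees locally corrective feedback.
import random

def augment(trajectory, expert, n_aug=4, noise=0.05, rng=None):
    rng = rng or random.Random(0)
    dataset = []
    for state in trajectory:                    # state: list of floats
        dataset.append((state, expert(state)))  # original pair
        for _ in range(n_aug):
            synthetic = [s + rng.gauss(0.0, noise) for s in state]
            dataset.append((synthetic, expert(synthetic)))  # synthetic pair
    return dataset

if __name__ == "__main__":
    expert = lambda s: [-x for x in s]          # toy expert controller
    print(len(augment([[0.1, 0.2], [0.3, -0.1]], expert)), "pairs")  # 10 pairs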
- Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z)
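In sketch form, the optimistic-exploration idea in the entry above amounts to choosing actions by an upper confidence bound over a critic ensemble; the toy critics and the beta coefficient below are assumptions, and the DICE distribution correction is omitted.

# Exploration action = argmax of (mean critic value + beta * critic
# disagreement), treating ensemble spread as epistemic uncertainty.
import statistics

def ucb_action(state, actions, critics, beta=1.0):
    def ucb(a):
        qs = [q(state, a) for q in critics]
        return statistics.mean(qs) + beta * statistics.pstdev(qs)
    return max(actions, key=ucb)

if __name__ == "__main__":
    # Two toy critics that disagree more as the action grows.
    critics = [lambda s, a: 0.5 * a, lambda s, a: -0.1 * a]
    print(ucb_action(None, [0.0, 0.5, 1.0], critics))  # optimism picks 1.0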
- Techniques Toward Optimizing Viewability in RTB Ad Campaigns Using Reinforcement Learning [0.0]
Reinforcement learning (RL) is an effective technique for training decision-making agents through interactions with their environment.
In digital advertising, real-time bidding (RTB) is a common method of allocating advertising inventory through real-time auctions.
arXiv Detail & Related papers (2021-05-21T21:56:12Z)
- Learning to Infer User Hidden States for Online Sequential Advertising [52.169666997331724]
We propose our Deep Intents Sequential Advertising (DISA) method to address these issues.
The key to interpretability is understanding a consumer's purchase intent, which is, however, unobservable (referred to as hidden states).
arXiv Detail & Related papers (2020-09-03T05:12:26Z)