Adversarial Constrained Bidding via Minimax Regret Optimization with
Causality-Aware Reinforcement Learning
- URL: http://arxiv.org/abs/2306.07106v1
- Date: Mon, 12 Jun 2023 13:31:58 GMT
- Title: Adversarial Constrained Bidding via Minimax Regret Optimization with
Causality-Aware Reinforcement Learning
- Authors: Haozhe Wang, Chao Du, Panyan Fang, Li He, Liang Wang, Bo Zheng
- Abstract summary: Existing approaches to constrained bidding typically rely on i.i.d. train and test conditions.
We propose a practical Minimax Regret Optimization (MiRO) approach that alternates between a teacher finding adversarial environments for tutoring and a learner meta-learning its policy over the given distribution of environments.
Our method, MiRO with Causality-aware reinforcement Learning (MiROCL), outperforms prior methods by over 30%.
- Score: 18.408964908248855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The proliferation of the Internet has led to the emergence of online
advertising, driven by the mechanics of online auctions. In these repeated
auctions, software agents participate on behalf of aggregated advertisers to
optimize for their long-term utility. To fulfill the diverse demands, bidding
strategies are employed to optimize advertising objectives subject to different
spending constraints. Existing approaches to constrained bidding typically rely
on i.i.d. train and test conditions, which contradicts the adversarial nature
of online ad markets where different parties possess potentially conflicting
objectives. In this regard, we explore the problem of constrained bidding in
adversarial bidding environments, which assumes no knowledge about the
adversarial factors. Instead of relying on the i.i.d. assumption, our insight
is to align the train distribution of environments with the potential test
distribution while minimizing policy regret. Based on this insight, we
propose a practical Minimax Regret Optimization (MiRO) approach that
alternates between a teacher finding adversarial environments for tutoring and
a learner meta-learning its policy over the given distribution of environments.
In addition, we are the first to incorporate expert demonstrations for learning
bidding strategies. Through a causality-aware policy design, we improve upon
MiRO by distilling knowledge from the experts. Extensive experiments on both
industrial data and synthetic data show that our method, MiRO with
Causality-aware reinforcement Learning (MiROCL), outperforms prior methods by
over 30%.
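The abstract describes the teacher-learner interleaving only at a high level. Below is a minimal, self-contained sketch of how such a minimax-regret loop could be organized; it is an illustration under toy assumptions (a one-parameter bid-shading policy, a synthetic market-price environment characterized by its mean, and grid search standing in for meta-learning/RL updates), not the authors' MiROCL implementation.

```python
"""Minimal sketch of a minimax-regret (MiRO-style) teacher/learner loop.
NOT the authors' implementation: the environment, the one-parameter policy,
and the grid-search update below are toy stand-ins used only to illustrate
how a teacher proposing adversarial environments can be interleaved with a
learner optimizing its policy over those environments."""
import random

GRID = [b / 10 for b in range(1, 31)]  # candidate bid scales for the toy policy


def evaluate(bid_scale, market_mean, budget=100.0, value=1.0, n=300, seed=0):
    """Toy constrained-bidding return: total value won under a spend budget
    when market prices are drawn around `market_mean` (the adversarial factor)."""
    rng = random.Random(seed)
    spend, won = 0.0, 0.0
    for _ in range(n):
        price = max(0.0, rng.gauss(market_mean, 0.3))
        if bid_scale * value >= price and spend + price <= budget:
            spend += price
            won += value
    return won


def regret(bid_scale, market_mean):
    """Regret = best return achievable on this environment minus the learner's return."""
    best = max(evaluate(b, market_mean) for b in GRID)
    return best - evaluate(bid_scale, market_mean)


def miro_loop(n_iters=20, pool_size=16, top_k=4):
    bid_scale = 1.0                                              # learner's policy
    pool = [random.uniform(0.2, 2.0) for _ in range(pool_size)]  # candidate market means
    for _ in range(n_iters):
        # Teacher step: keep the environments on which the current policy has
        # the largest regret, i.e. the most adversarial ones for tutoring.
        pool.sort(key=lambda m: regret(bid_scale, m), reverse=True)
        adversarial = pool[:top_k]
        # Learner step: improve the policy over the selected distribution of
        # environments (grid search stands in for meta-learning / RL updates).
        bid_scale = max(GRID, key=lambda b: sum(evaluate(b, m) for m in adversarial))
        # Refresh part of the pool so new adversarial factors keep being explored.
        pool = adversarial + [random.uniform(0.2, 2.0) for _ in range(pool_size - top_k)]
    return bid_scale


if __name__ == "__main__":
    print("learned bid scale:", miro_loop())
```

In the full method, the learner would be an RL bidding policy meta-learned over the teacher's environment distribution, and the causality-aware design would additionally distill knowledge from expert demonstrations; both pieces are omitted from this toy sketch.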
Related papers
- Maximizing the Success Probability of Policy Allocations in Online
Systems [5.485872703839928]
In this paper we consider the problem at the level of user timelines instead of individual bid requests.
In order to optimally allocate policies to users, typical multiple treatments allocation methods solve knapsack-like problems.
We introduce the SuccessProMax algorithm that aims at finding the policy allocation which is the most likely to outperform a fixed policy.
arXiv Detail & Related papers (2023-12-26T10:55:33Z) - Robust Representation Learning for Unified Online Top-K Recommendation [39.12191494863331]
We propose a robust representation learning framework for unified online top-k recommendation.
Our approach constructs unified modeling in entity space to ensure data fairness.
The proposed method has been successfully deployed online to serve real business scenarios.
arXiv Detail & Related papers (2023-10-24T03:42:20Z) - Online Ad Procurement in Non-stationary Autobidding Worlds [10.871587311621974]
We introduce a primal-dual algorithm for online decision making with multi-dimensional decision variables, bandit feedback, and long-term uncertain constraints (a generic dual-pacing sketch of this family of methods appears after this list).
We show that our algorithm achieves low regret in many worlds when procurement outcomes are generated through procedures that are adversarial, adversarially corrupted, periodic, and ergodic.
arXiv Detail & Related papers (2023-07-10T00:41:08Z) - Semantic Information Marketing in The Metaverse: A Learning-Based
Contract Theory Framework [68.8725783112254]
We address the problem in which a virtual service provider (VSP) designs incentive mechanisms to hire sensing IoT devices to sell their sensing data.
Due to the limited bandwidth, we propose to use semantic extraction algorithms to reduce the delivered data by the sensing IoT devices.
We propose a novel iterative contract design and use a new variant of multi-agent reinforcement learning (MARL) to solve the modelled multi-dimensional contract problem.
arXiv Detail & Related papers (2023-02-22T15:52:37Z) - Adaptive Risk-Aware Bidding with Budget Constraint in Display
Advertising [47.14651340748015]
We propose a novel adaptive risk-aware bidding algorithm with budget constraint via reinforcement learning.
We theoretically unveil the intrinsic relation between uncertainty and risk tendency based on value at risk (VaR).
arXiv Detail & Related papers (2022-12-06T18:50:09Z) - Data augmentation for efficient learning from parametric experts [88.33380893179697]
We focus on what we call the policy cloning setting, in which we use online or offline queries of an expert to inform the behavior of a student policy.
Our approach, augmented policy cloning (APC), uses synthetic states to induce feedback-sensitivity in a region around sampled trajectories.
We achieve highly data-efficient transfer of behavior from an expert to a student policy for high-degree-of-freedom control problems.
arXiv Detail & Related papers (2022-05-23T16:37:16Z) - Off-policy Reinforcement Learning with Optimistic Exploration and
Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate off-policyness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z) - Techniques Toward Optimizing Viewability in RTB Ad Campaigns Using
Reinforcement Learning [0.0]
Reinforcement learning (RL) is an effective technique for training decision-making agents through interactions with their environment.
In digital advertising, real-time bidding (RTB) is a common method of allocating advertising inventory through real-time auctions.
arXiv Detail & Related papers (2021-05-21T21:56:12Z) - Decision Rule Elicitation for Domain Adaptation [93.02675868486932]
Human-in-the-loop machine learning is widely used in artificial intelligence (AI) to elicit labels from experts.
In this work, we allow experts to additionally produce decision rules describing their decision-making.
We show that decision rule elicitation improves domain adaptation of the algorithm and helps to propagate expert's knowledge to the AI model.
arXiv Detail & Related papers (2021-02-23T08:07:22Z) - Learning to Infer User Hidden States for Online Sequential Advertising [52.169666997331724]
We propose our Deep Intents Sequential Advertising (DISA) method to address these issues.
The key to interpretability is understanding a consumer's purchase intent, which is, however, unobservable (referred to as hidden states).
arXiv Detail & Related papers (2020-09-03T05:12:26Z)
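The "Online Ad Procurement in Non-stationary Autobidding Worlds" entry above names a primal-dual algorithm with bandit feedback and long-term constraints but does not spell it out. The sketch below is a generic dual-pacing illustration of that family of methods (bid shading by a Lagrange multiplier on spend, updated online from observed spend versus a per-round target), written from general knowledge rather than from that paper.

```python
"""Generic primal-dual (dual-pacing) sketch for bidding under a long-term
budget constraint. A textbook-style illustration, not the cited paper's
algorithm: bids are shaded by a dual variable on spend, and the dual variable
is updated online from observed spend versus the per-round spend target."""
import random


def dual_pacing(values, budget, horizon, lr=0.05):
    """values: per-round impression values; returns (total_value, total_spend)."""
    lam = 0.0                      # dual variable (shadow price of the budget)
    target = budget / horizon      # per-round spend target implied by the constraint
    total_value, total_spend = 0.0, 0.0
    rng = random.Random(0)
    for t, value in enumerate(values):
        if t >= horizon or total_spend >= budget:
            break
        bid = value / (1.0 + lam)               # primal step: shade bid by the dual variable
        price = rng.uniform(0.0, 1.0)           # unobserved competing price (bandit feedback)
        spend = price if bid >= price else 0.0  # second-price-style payment if we win
        total_spend += spend
        total_value += value if spend > 0 else 0.0
        # Dual step: raise lam when overspending the per-round target, lower it otherwise.
        lam = max(0.0, lam + lr * (spend - target))
    return total_value, total_spend


if __name__ == "__main__":
    rng = random.Random(1)
    vals = [rng.uniform(0.0, 1.0) for _ in range(1000)]
    print(dual_pacing(vals, budget=50.0, horizon=1000))
```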