Adversarial Constrained Bidding via Minimax Regret Optimization with
Causality-Aware Reinforcement Learning
- URL: http://arxiv.org/abs/2306.07106v1
- Date: Mon, 12 Jun 2023 13:31:58 GMT
- Title: Adversarial Constrained Bidding via Minimax Regret Optimization with
Causality-Aware Reinforcement Learning
- Authors: Haozhe Wang, Chao Du, Panyan Fang, Li He, Liang Wang, Bo Zheng
- Abstract summary: Existing approaches to constrained bidding typically rely on i.i.d. train and test conditions.
We propose a practical Minimax Regret Optimization (MiRO) approach that alternates between a teacher finding adversarial environments for tutoring and a learner meta-learning its policy over the given distribution of environments.
Our method, MiRO with Causality-aware reinforcement Learning (MiROCL), outperforms prior methods by over 30%.
- Score: 18.408964908248855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The proliferation of the Internet has led to the emergence of online
advertising, driven by the mechanics of online auctions. In these repeated
auctions, software agents participate on behalf of aggregated advertisers to
optimize for their long-term utility. To fulfill the diverse demands, bidding
strategies are employed to optimize advertising objectives subject to different
spending constraints. Existing approaches to constrained bidding typically rely
on i.i.d. train and test conditions, which contradicts the adversarial nature
of online ad markets where different parties possess potentially conflicting
objectives. In this regard, we explore the problem of constrained bidding in
adversarial bidding environments, which assumes no knowledge about the
adversarial factors. Instead of relying on the i.i.d. assumption, our insight
is to align the train distribution of environments with the potential test
distribution while minimizing policy regret. Based on this insight, we
propose a practical Minimax Regret Optimization (MiRO) approach that
alternates between a teacher finding adversarial environments for tutoring and
a learner meta-learning its policy over the given distribution of environments.
In addition, we are the first to incorporate expert demonstrations for learning
bidding strategies. Through a causality-aware policy design, we improve upon
MiRO by distilling knowledge from the experts. Extensive experiments on both
industrial data and synthetic data show that our method, MiRO with
Causality-aware reinforcement Learning (MiROCL), outperforms prior methods by
over 30%.
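The abstract describes the teacher-learner interleaving only at a high level. Below is a minimal, self-contained sketch of how such a minimax-regret loop could be organized; it is an illustration under toy assumptions (a one-parameter bid-shading policy, a synthetic market-price environment characterized by its mean, and grid search standing in for meta-learning/RL updates), not the authors' MiROCL implementation.

```python
"""Minimal sketch of a minimax-regret (MiRO-style) teacher/learner loop.
NOT the authors' implementation: the environment, the one-parameter policy,
and the grid-search update below are toy stand-ins used only to illustrate
how a teacher proposing adversarial environments can be interleaved with a
learner optimizing its policy over those environments."""
import random

GRID = [b / 10 for b in range(1, 31)]  # candidate bid scales for the toy policy


def evaluate(bid_scale, market_mean, budget=100.0, value=1.0, n=300, seed=0):
    """Toy constrained-bidding return: total value won under a spend budget
    when market prices are drawn around `market_mean` (the adversarial factor)."""
    rng = random.Random(seed)
    spend, won = 0.0, 0.0
    for _ in range(n):
        price = max(0.0, rng.gauss(market_mean, 0.3))
        if bid_scale * value >= price and spend + price <= budget:
            spend += price
            won += value
    return won


def regret(bid_scale, market_mean):
    """Regret = best return achievable on this environment minus the learner's return."""
    best = max(evaluate(b, market_mean) for b in GRID)
    return best - evaluate(bid_scale, market_mean)


def miro_loop(n_iters=20, pool_size=16, top_k=4):
    bid_scale = 1.0                                              # learner's policy
    pool = [random.uniform(0.2, 2.0) for _ in range(pool_size)]  # candidate market means
    for _ in range(n_iters):
        # Teacher step: keep the environments on which the current policy has
        # the largest regret, i.e. the most adversarial ones for tutoring.
        pool.sort(key=lambda m: regret(bid_scale, m), reverse=True)
        adversarial = pool[:top_k]
        # Learner step: improve the policy over the selected distribution of
        # environments (grid search stands in for meta-learning / RL updates).
        bid_scale = max(GRID, key=lambda b: sum(evaluate(b, m) for m in adversarial))
        # Refresh part of the pool so new adversarial factors keep being explored.
        pool = adversarial + [random.uniform(0.2, 2.0) for _ in range(pool_size - top_k)]
    return bid_scale


if __name__ == "__main__":
    print("learned bid scale:", miro_loop())
```

In the full method, the learner would be an RL bidding policy meta-learned over the teacher's environment distribution, and the causality-aware design would additionally distill knowledge from expert demonstrations; both pieces are omitted from this toy sketch.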
Related papers
- Maximizing the Success Probability of Policy Allocations in Online
Systems [5.485872703839928]
In this paper we consider the problem at the level of user timelines instead of individual bid requests.
In order to optimally allocate policies to users, typical multiple treatments allocation methods solve knapsack-like problems.
We introduce the SuccessProMax algorithm that aims at finding the policy allocation which is the most likely to outperform a fixed policy.
arXiv Detail & Related papers (2023-12-26T10:55:33Z) - Robust Representation Learning for Unified Online Top-K Recommendation [39.12191494863331]
We propose a robust representation learning framework for unified online top-k recommendation.
Our approach constructs unified modeling in entity space to ensure data fairness.
The proposed method has been successfully deployed online to serve real business scenarios.
arXiv Detail & Related papers (2023-10-24T03:42:20Z) - Online Ad Procurement in Non-stationary Autobidding Worlds [10.871587311621974]
We introduce a primal-dual algorithm for online decision making with multi-dimensional decision variables, bandit feedback, and long-term uncertain constraints (a generic dual-pacing sketch of this family of methods appears after this list).
We show that our algorithm achieves low regret in many worlds when procurement outcomes are generated through procedures that are adversarial, adversarially corrupted, periodic, and ergodic.
arXiv Detail & Related papers (2023-07-10T00:41:08Z) - Semantic Information Marketing in The Metaverse: A Learning-Based
Contract Theory Framework [68.8725783112254]
We address the problem in which a virtual service provider (VSP) designs incentive mechanisms to hire sensing IoT devices to sell their sensing data.
Due to the limited bandwidth, we propose to use semantic extraction algorithms to reduce the delivered data by the sensing IoT devices.
We propose a novel iterative contract design and use a new variant of multi-agent reinforcement learning (MARL) to solve the modelled multi-dimensional contract problem.
arXiv Detail & Related papers (2023-02-22T15:52:37Z) - Adaptive Risk-Aware Bidding with Budget Constraint in Display
Advertising [47.14651340748015]
We propose a novel adaptive risk-aware bidding algorithm with budget constraint via reinforcement learning.
We theoretically unveil the intrinsic relation between uncertainty and risk tendency based on value at risk (VaR).
arXiv Detail & Related papers (2022-12-06T18:50:09Z) - Data augmentation for efficient learning from parametric experts [88.33380893179697]
We focus on what we call the policy cloning setting, in which we use online or offline queries of an expert to inform the behavior of a student policy.
Our approach, augmented policy cloning (APC), uses synthetic states to induce feedback-sensitivity in a region around sampled trajectories.
We achieve highly data-efficient transfer of behavior from an expert to a student policy for high-degree-of-freedom control problems.
arXiv Detail & Related papers (2022-05-23T16:37:16Z) - Off-policy Reinforcement Learning with Optimistic Exploration and
Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate off-policyness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z) - Techniques Toward Optimizing Viewability in RTB Ad Campaigns Using
Reinforcement Learning [0.0]
Reinforcement learning (RL) is an effective technique for training decision-making agents through interactions with their environment.
In digital advertising, real-time bidding (RTB) is a common method of allocating advertising inventory through real-time auctions.
arXiv Detail & Related papers (2021-05-21T21:56:12Z) - Decision Rule Elicitation for Domain Adaptation [93.02675868486932]
Human-in-the-loop machine learning is widely used in artificial intelligence (AI) to elicit labels from experts.
In this work, we allow experts to additionally produce decision rules describing their decision-making.
We show that decision rule elicitation improves domain adaptation of the algorithm and helps to propagate expert's knowledge to the AI model.
arXiv Detail & Related papers (2021-02-23T08:07:22Z) - Learning to Infer User Hidden States for Online Sequential Advertising [52.169666997331724]
We propose our Deep Intents Sequential Advertising (DISA) method to address these issues.
The key to interpretability is understanding a consumer's purchase intent, which is, however, unobservable (referred to as hidden states).
arXiv Detail & Related papers (2020-09-03T05:12:26Z)
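The "Online Ad Procurement in Non-stationary Autobidding Worlds" entry above names a primal-dual algorithm with bandit feedback and long-term constraints but does not spell it out. The sketch below is a generic dual-pacing illustration of that family of methods (bid shading by a Lagrange multiplier on spend, updated online from observed spend versus a per-round target), written from general knowledge rather than from that paper.

```python
"""Generic primal-dual (dual-pacing) sketch for bidding under a long-term
budget constraint. A textbook-style illustration, not the cited paper's
algorithm: bids are shaded by a dual variable on spend, and the dual variable
is updated online from observed spend versus the per-round spend target."""
import random


def dual_pacing(values, budget, horizon, lr=0.05):
    """values: per-round impression values; returns (total_value, total_spend)."""
    lam = 0.0                      # dual variable (shadow price of the budget)
    target = budget / horizon      # per-round spend target implied by the constraint
    total_value, total_spend = 0.0, 0.0
    rng = random.Random(0)
    for t, value in enumerate(values):
        if t >= horizon or total_spend >= budget:
            break
        bid = value / (1.0 + lam)               # primal step: shade bid by the dual variable
        price = rng.uniform(0.0, 1.0)           # unobserved competing price (bandit feedback)
        spend = price if bid >= price else 0.0  # second-price-style payment if we win
        total_spend += spend
        total_value += value if spend > 0 else 0.0
        # Dual step: raise lam when overspending the per-round target, lower it otherwise.
        lam = max(0.0, lam + lr * (spend - target))
    return total_value, total_spend


if __name__ == "__main__":
    rng = random.Random(1)
    vals = [rng.uniform(0.0, 1.0) for _ in range(1000)]
    print(dual_pacing(vals, budget=50.0, horizon=1000))
```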