Related papers: Maximizing the Success Probability of Policy Allocations in Online Systems

Maximizing the Success Probability of Policy Allocations in Online Systems

URL: http://arxiv.org/abs/2312.16267v1
Date: Tue, 26 Dec 2023 10:55:33 GMT
Title: Maximizing the Success Probability of Policy Allocations in Online Systems
Authors: Artem Betlei, Mariia Vladimirova, Mehdi Sebbar, Nicolas Urien, Thibaud Rahier, Benjamin Heymann
Abstract summary: In this paper we consider the problem at the level of user timelines instead of individual bid requests. In order to optimally allocate policies to users, typical multiple treatments allocation methods solve knapsack-like problems. We introduce the SuccessProMax algorithm that aims at finding the policy allocation which is the most likely to outperform a fixed policy.
Score: 5.485872703839928
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The effectiveness of advertising in e-commerce largely depends on the ability of merchants to bid on and win impressions for their targeted users. The bidding procedure is highly complex due to various factors such as market competition, user behavior, and the diverse objectives of advertisers. In this paper we consider the problem at the level of user timelines instead of individual bid requests, manipulating full policies (i.e. pre-defined bidding strategies) and not bid values. In order to optimally allocate policies to users, typical multiple treatments allocation methods solve knapsack-like problems which aim at maximizing an expected value under constraints. In the industrial contexts such as online advertising, we argue that optimizing for the probability of success is a more suited objective than expected value maximization, and we introduce the SuccessProbaMax algorithm that aims at finding the policy allocation which is the most likely to outperform a fixed reference policy. Finally, we conduct comprehensive experiments both on synthetic and real-world data to evaluate its performance. The results demonstrate that our proposed algorithm outperforms conventional expected-value maximization algorithms in terms of success rate.

Related papers

A Principled Approach to Randomized Selection under Uncertainty: Applications to Peer Review and Grant Funding [68.43987626137512]
We propose a principled framework for randomized decision-making based on interval estimates of the quality of each item.<n>We introduce MERIT, an optimization-based method that maximizes the worst-case expected number of top candidates selected.<n>We prove that MERIT satisfies desirable axiomatic properties not guaranteed by existing approaches.
arXiv Detail & Related papers (2025-06-23T19:59:30Z)
Learning to Bid in Non-Stationary Repeated First-Price Auctions [20.933903777434086]
First-price auctions have gained significant traction in digital advertising markets.<n> determining an optimal bidding strategy in first-price auctions is more complex.<n>We provide a minimax-optimal characterization of the dynamic regret for the class of sequences of opponents' highest bids.
arXiv Detail & Related papers (2025-01-23T03:53:27Z)
Targeted Advertising on Social Networks Using Online Variational Tensor Regression [19.586412285513962]
We propose what we believe is the first contextual bandit framework for online targeted advertising. The proposed framework is designed to accommodate any number of feature vectors in the form of multi-mode tensor. We empirically confirm that the proposedUCB algorithm achieves a significant improvement in influence tasks over the benchmarks.
arXiv Detail & Related papers (2022-08-22T22:10:45Z)
Functional Optimization Reinforcement Learning for Real-Time Bidding [14.5826735379053]
Real-time bidding is the new paradigm of programmatic advertising. Existing approaches are struggling to provide a satisfactory solution for bidding optimization. This paper proposes a multi-agent reinforcement learning architecture for RTB with functional optimization.
arXiv Detail & Related papers (2022-06-25T06:12:17Z)
A Unified Framework for Campaign Performance Forecasting in Online Display Advertising [9.005665883444902]
Interpretable and accurate results could enable advertisers to manage and optimize their campaign criteria. New framework reproduces campaign performance on historical logs under various bidding types with a unified replay algorithm. Method captures mixture calibration patterns among related forecast indicators to map the estimated results to the true ones.
arXiv Detail & Related papers (2022-02-24T03:04:29Z)
Understanding the Effect of Stochasticity in Policy Optimization [86.7574122154668]
We show that the preferability of optimization methods depends critically on whether exact gradients are used. Second, to explain these findings we introduce the concept of committal rate for policy optimization. Third, we show that in the absence of external oracle information, there is an inherent trade-off between exploiting geometry to accelerate convergence versus achieving optimality almost surely.
arXiv Detail & Related papers (2021-10-29T06:35:44Z)
Policy Gradient Bayesian Robust Optimization for Imitation Learning [49.881386773269746]
We derive a novel policy gradient-style robust optimization approach, PG-BROIL, to balance expected performance and risk. Results suggest PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse.
arXiv Detail & Related papers (2021-06-11T16:49:15Z)
Universal Trading for Order Execution with Oracle Policy Distillation [99.57416828489568]
We propose a novel universal trading policy optimization framework to bridge the gap between the noisy yet imperfect market states and the optimal action sequences for order execution. We show that our framework can better guide the learning of the common policy towards practically optimal execution by an oracle teacher with perfect information.
arXiv Detail & Related papers (2021-01-28T05:52:18Z)
Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy [95.98698822755227]
We make the first attempt to study risk-sensitive deep reinforcement learning under the average reward setting with the variance risk criteria. We propose an actor-critic algorithm that iteratively and efficiently updates the policy, the Lagrange multiplier, and the Fenchel dual variable.
arXiv Detail & Related papers (2020-12-28T05:02:26Z)
Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising [52.3825928886714]
We formulate the sequential advertising strategy optimization as a dynamic knapsack problem. We propose a theoretically guaranteed bilevel optimization framework, which significantly reduces the solution space of the original optimization space. To improve the exploration efficiency of reinforcement learning, we also devise an effective action space reduction approach.
arXiv Detail & Related papers (2020-06-29T18:50:35Z)
Optimal Bidding Strategy without Exploration in Real-time Bidding [14.035270361462576]
maximizing utility with a budget constraint is the primary goal for advertisers in real-time bidding (RTB) systems. Previous works ignore the losing auctions to alleviate the difficulty with censored states. We propose a novel practical framework using the maximum entropy principle to imitate the behavior of the true distribution observed in real-time traffic.
arXiv Detail & Related papers (2020-03-31T20:43:28Z)
Toward Optimal Adversarial Policies in the Multiplicative Learning System with a Malicious Expert [87.12201611818698]
We consider a learning system that combines experts' advice to predict a sequence of true outcomes. It is assumed that one of the experts is malicious and aims to impose the maximum loss on the system. We show that a simple greedy policy of always reporting false prediction is optimal with an approximation ratio of $1+O(sqrtfracln NN)$. For the online setting where the malicious expert can adaptively make its decisions, we show that the optimal online policy can be efficiently computed by solving a dynamic program in $O(N3)$.
arXiv Detail & Related papers (2020-01-02T18:04:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.