Improved Online Learning Algorithms for CTR Prediction in Ad Auctions
- URL: http://arxiv.org/abs/2403.00845v1
- Date: Thu, 29 Feb 2024 14:10:26 GMT
- Title: Improved Online Learning Algorithms for CTR Prediction in Ad Auctions
- Authors: Zhe Feng, Christopher Liaw, Zixin Zhou
- Abstract summary: We investigate the online learning problem of revenue maximization in ad auctions.
We focus on two models of the advertisers' strategic behaviors.
We develop an online mechanism based on upper-confidence bounds that achieves a tight $O(\sqrt{T})$ regret.
- Score: 8.2536631346421
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we investigate the online learning problem of revenue
maximization in ad auctions, where the seller needs to learn the click-through
rate (CTR) of each ad candidate and charges the winner in a
pay-per-click manner. We focus on two models of the advertisers' strategic
behaviors. First, we assume that the advertiser is completely myopic; i.e., in
each round, they aim to maximize their utility only for the current round. In
this setting, we develop an online mechanism based on upper-confidence bounds
that achieves a tight $O(\sqrt{T})$ regret in the worst case, and negative
regret when the values are static across all the auctions and there is a gap
between the highest expected value (i.e., value multiplied by CTR) and the
second-highest expected value. Next, we assume that the advertiser is
non-myopic and cares about their long-term utility. This setting is much more
complex, since an advertiser is incentivized to influence the mechanism by
bidding strategically in earlier rounds. In this setting, we provide an
algorithm that achieves negative regret in the static-valuation setting (with a
positive gap), in sharp contrast with prior work showing
$O(T^{2/3})$ regret when the valuations are generated by an adversary.
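The abstract names the mechanism's key ingredient (upper-confidence bounds on CTRs) but not its exact form. Below is a minimal Python sketch of one natural reading: score each ad by bid times a UCB estimate of its CTR, allocate to the top score, and charge a second-price-style per-click payment. The scoring rule, confidence bonus, and payment formula are illustrative assumptions, not the paper's construction.

```python
import math

def ucb_ppc_round(bids, clicks, impressions, t):
    """One round of a stylized UCB-based pay-per-click auction.

    bids[i]        : advertiser i's bid this round
    clicks[i]      : clicks observed on ad i so far
    impressions[i] : times ad i has been shown so far
    t              : current round index (1-based)

    Hypothetical sketch (assumes >= 2 ads); not the paper's mechanism.
    """
    def ctr_ucb(i):
        if impressions[i] == 0:
            return 1.0  # optimism: unexplored ads get the maximal CTR bound
        mean = clicks[i] / impressions[i]
        bonus = math.sqrt(2.0 * math.log(t) / impressions[i])
        return min(1.0, mean + bonus)

    # Allocate to the highest bid-weighted CTR upper confidence bound.
    scores = [bids[i] * ctr_ucb(i) for i in range(len(bids))]
    winner = max(range(len(bids)), key=scores.__getitem__)

    # Second-price-style per-click payment: runner-up score divided by the
    # winner's CTR bound, charged only if the shown ad is clicked.
    runner_up = max(s for i, s in enumerate(scores) if i != winner)
    return winner, runner_up / ctr_ucb(winner)
```

Since the runner-up score is at most the winner's bid times the winner's CTR bound, the per-click price in this sketch never exceeds the winner's bid, which keeps the charge individually rational for a myopic bidder.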
Related papers
- A pragmatic policy learning approach to account for users' fatigue in repeated auctions [47.75983850930121]
ML models can predict how previously won auctions affect the current opportunity value.
A policy that uses this prediction to maximize the expected payoff of the current auction could be dubbed impatient.
We provide two empirical arguments for the importance of this cost of impatience.
arXiv Detail & Related papers (2024-07-15T07:53:29Z) - No-Regret Algorithms in non-Truthful Auctions with Budget and ROI Constraints [0.9694940903078658]
We study the problem of designing online autobidding algorithms to optimize value subject to ROI and budget constraints.
Our main result is an algorithm with full information feedback that guarantees a near-optimal $\tilde{O}(\sqrt{T})$ regret with respect to the best Lipschitz function.
arXiv Detail & Related papers (2024-04-15T14:31:53Z) - Online Learning in Contextual Second-Price Pay-Per-Click Auctions [47.06746975822902]
We study online learning in contextual pay-per-click auctions where at each of the $T$ rounds, the learner receives some context along with a set of ads.
The learner's goal is to minimize her regret, defined as the gap between her total revenue and that of an oracle strategy.
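Written out, that regret notion is the following (in assumed notation; the paper's symbols may differ):

```latex
% Regret after T rounds: oracle revenue minus learner revenue, where
% rev_t(.) is the revenue a strategy earns in round t (assumed notation).
\[
  \mathrm{Reg}(T) \;=\; \sum_{t=1}^{T} \mathrm{rev}_t(\pi^{\star})
  \;-\; \sum_{t=1}^{T} \mathrm{rev}_t(\pi^{\mathrm{learner}}),
\]
```

where $\pi^{\star}$ denotes the oracle strategy.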
arXiv Detail & Related papers (2023-10-08T07:04:22Z) - Autobidders with Budget and ROI Constraints: Efficiency, Regret, and Pacing Dynamics [53.62091043347035]
We study a game between autobidding algorithms that compete in an online advertising platform.
We propose a gradient-based learning algorithm that is guaranteed to satisfy all constraints and achieves vanishing individual regret.
arXiv Detail & Related papers (2023-01-30T21:59:30Z) - Leveraging the Hints: Adaptive Bidding in Repeated First-Price Auctions [42.002983450368134]
We study the question of how to bid in first-price auctions.
Unlike in second-price auctions, bidding one's private value truthfully is no longer optimal.
We consider two types of hints: one where a single point-prediction is available, and the other where a hint interval is available.
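For intuition on why even a point-prediction hint changes first-price bidding (a hypothetical toy policy, not the paper's adaptive algorithm): if a reliable hint of the highest competing bid sits below your value, bidding just above the hint wins at close to the lowest winning price.

```python
def bid_with_point_hint(value, hint, margin=0.01):
    """Stylized first-price bid given a point prediction `hint` of the
    highest competing bid. Hypothetical helper for illustration only.
    """
    if hint + margin < value:
        return hint + margin  # outbid the predicted rival by a hair
    return 0.0  # winning would mean paying >= value, so prefer to lose
```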
arXiv Detail & Related papers (2022-11-05T19:20:53Z) - A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design [158.0041488194202]
We study reserve price optimization in multi-phase second price auctions.
From the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders.
Moreover, the seller's per-step revenue is unknown, nonlinear, and cannot even be directly observed from the environment.
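For context, the single-auction building block being optimized is the textbook second-price rule with a reserve, sketched below; the paper's difficulty is that bidders may be nontruthful and revenue feedback is only indirect.

```python
def second_price_revenue(bids, reserve):
    """Seller revenue in one second-price auction with reserve price:
    the item sells only if the top bid clears the reserve, at price
    max(second-highest bid, reserve). Standard rule, shown for context.
    """
    if not bids or max(bids) < reserve:
        return 0.0
    top_two = sorted(bids)[-2:]  # highest one or two bids
    second = top_two[0] if len(top_two) == 2 else 0.0
    return max(second, reserve)
```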
arXiv Detail & Related papers (2022-10-19T03:49:05Z) - Fast Rate Learning in Stochastic First Price Bidding [0.0]
First-price auctions have largely replaced traditional bidding approaches based on Vickrey auctions in programmatic advertising.
We show how to achieve significantly lower regret when the opponents' maximal bid distribution is known.
Our algorithms converge much faster than alternatives proposed in the literature for various bid distributions.
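To see why knowing the distribution of the opponents' maximal bid is powerful (a toy calculation, not the paper's algorithm): with value $v$ and known CDF $F$ of the highest rival bid, bidding $b$ earns expected utility $(v - b)F(b)$, which can simply be maximized.

```python
import numpy as np

def optimal_first_price_bid(value, cdf, grid_size=10_000):
    """Grid-search the bid maximizing (value - b) * F(b), where F is the
    known CDF of the opponents' maximal bid. Toy sketch of the
    known-distribution case; the paper's algorithms must also learn F.
    """
    bids = np.linspace(0.0, value, grid_size)
    utility = (value - bids) * cdf(bids)
    return float(bids[np.argmax(utility)])

# Example: top rival bid uniform on [0, 1] and value v = 0.8;
# (v - b) * b peaks at b = v / 2 = 0.4.
b_star = optimal_first_price_bid(0.8, lambda b: np.clip(b, 0.0, 1.0))
```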
arXiv Detail & Related papers (2021-07-05T07:48:52Z) - A novel auction system for selecting advertisements in Real-Time bidding [68.8204255655161]
Real-Time Bidding is a new Internet advertising system that has become very popular in recent years.
We propose an alternative bidding system that considers not only the economic aspect but also other factors relevant to the functioning of the advertising system.
arXiv Detail & Related papers (2020-10-22T18:36:41Z) - Adversarial Attacks on Linear Contextual Bandits [87.08004581867537]
Malicious agents may have incentives to attack the bandit algorithm to induce it to perform a desired behavior.
We show that a malicious agent can force a linear contextual bandit algorithm to pull any desired arm $T - o(T)$ times over a horizon of $T$ steps.
We also investigate the case when a malicious agent is interested in affecting the behavior of the bandit algorithm in a single context.
arXiv Detail & Related papers (2020-02-10T15:04:09Z)
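The flavor of such an attack can be conveyed with a simple reward-poisoning rule, ignoring contexts for simplicity (a hypothetical sketch, not the paper's construction or bounds): leave the target arm's rewards untouched and clamp every other arm's observed reward below zero, so any reasonable learner converges to the target.

```python
def poison_reward(pulled_arm, target_arm, true_reward, floor=-0.1):
    """Reward-poisoning sketch: the attacker intercepts the reward signal.
    Target-arm rewards pass through unmodified; all other arms are clamped
    to a negative floor so they appear strictly suboptimal. Hypothetical
    illustration of the attack surface described in the abstract.
    """
    if pulled_arm == target_arm:
        return true_reward
    return min(true_reward, floor)
```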