Online Learning in Contextual Second-Price Pay-Per-Click Auctions
- URL: http://arxiv.org/abs/2310.05047v1
- Date: Sun, 8 Oct 2023 07:04:22 GMT
- Title: Online Learning in Contextual Second-Price Pay-Per-Click Auctions
- Authors: Mengxiao Zhang, Haipeng Luo
- Abstract summary: We study online learning in contextual pay-per-click auctions where at each of the $T$ rounds, the learner receives some context along with a set of ads.
The learner's goal is to minimize her regret, defined as the gap between her total revenue and that of an oracle strategy.
- Score: 47.06746975822902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study online learning in contextual pay-per-click auctions where at each
of the $T$ rounds, the learner receives some context along with a set of ads
and needs to make an estimate on their click-through rate (CTR) in order to run
a second-price pay-per-click auction. The learner's goal is to minimize her
regret, defined as the gap between her total revenue and that of an oracle
strategy that always makes perfect CTR predictions. We first show that
$\sqrt{T}$-regret is obtainable via a computationally inefficient algorithm and
that it is unavoidable since our algorithm is no easier than the classical
multi-armed bandit problem. A by-product of our results is a $\sqrt{T}$-regret
bound for the simpler non-contextual setting, improving upon a recent work of
[Feng et al., 2023] by removing the inverse CTR dependency that could be
arbitrarily large. Then, borrowing ideas from recent advances on efficient
contextual bandit algorithms, we develop two practically efficient contextual
auction algorithms: the first one uses the exponential weight scheme with
optimistic square errors and maintains the same $\sqrt{T}$-regret bound, while
the second one reduces the problem to online regression via a simple
epsilon-greedy strategy, albeit with a worse regret bound. Finally, we conduct
experiments on a synthetic dataset to showcase the effectiveness and superior
performance of our algorithms.
Related papers
- Strategically-Robust Learning Algorithms for Bidding in First-Price Auctions [11.988955088595858]
Learning to bid in repeated first-price auctions is a fundamental problem at the interface of game theory and machine learning.
We propose a novel concave formulation for pure-strategy bidding in first-price auctions, and use it to analyze natural Gradient-Ascent-based algorithms for this problem.
arXiv Detail & Related papers (2024-02-12T01:33:33Z) - Contextual Bandits and Imitation Learning via Preference-Based Active
Queries [17.73844193143454]
We consider the problem of contextual bandits and imitation learning, where the learner lacks direct knowledge of the executed action's reward.
Instead, the learner can actively query an expert at each round to compare two actions and receive noisy preference feedback.
The learner's objective is two-fold: to minimize the regret associated with the executed actions, while simultaneously, minimizing the number of comparison queries made to the expert.
arXiv Detail & Related papers (2023-07-24T16:36:04Z) - The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret
and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL)
We focus on learning with general function classes and general model classes.
We show that a logarithmic regret bound is realizable by algorithms with $O(log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z) - Efficient and Optimal Algorithms for Contextual Dueling Bandits under
Realizability [59.81339109121384]
We study the $K$ contextual dueling bandit problem, a sequential decision making setting in which the learner uses contextual information to make two decisions, but only observes emphpreference-based feedback suggesting that one decision was better than the other.
We provide a new algorithm that achieves the optimal regret rate for a new notion of best response regret, which is a strictly stronger performance measure than those considered in prior works.
arXiv Detail & Related papers (2021-11-24T07:14:57Z) - Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z) - Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(sqrtK)$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z) - Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $widetildeO(TsqrtkN + sqrtdkNT)$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z) - Regret Minimization in Stochastic Contextual Dueling Bandits [40.17224226373741]
We consider the problem of $K$-armed dueling bandit in the contextual setting.
We present two algorithms for the setup with respective regret guarantees.
arXiv Detail & Related papers (2020-02-20T06:36:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.