Learning to Bid in Contextual First Price Auctions
- URL: http://arxiv.org/abs/2109.03173v1
- Date: Tue, 7 Sep 2021 16:11:18 GMT
- Title: Learning to Bid in Contextual First Price Auctions
- Authors: Ashwinkumar Badanidiyuru and Zhe Feng and Guru Guruganesh
- Abstract summary: We consider a single bidder (learner) who repeatedly bids in first price auctions.
For binary feedback, we propose a bidding algorithm based on the maximum likelihood estimation (MLE) method, achieving at most $\widetilde{O}(\sqrt{\log(d) T})$ regret.
We also provide a lower bound showing that any bidding policy in a broad class must incur regret at least $\Omega(\sqrt{T})$.
- Score: 6.482311591019749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we investigate the problem of how to bid in repeated
contextual first price auctions. We consider a single bidder (learner) who
repeatedly bids in first price auctions: at each time $t$, the learner
observes a context $x_t\in \mathbb{R}^d$ and decides the bid based on
historical information and $x_t$. We assume a structured linear model of the
maximum bid of all the others $m_t = \alpha_0\cdot x_t + z_t$, where
$\alpha_0\in \mathbb{R}^d$ is unknown to the learner and $z_t$ is randomly
sampled from a noise distribution $\mathcal{F}$ with log-concave density
function $f$. We consider both \emph{binary feedback} (the learner can only
observe whether she wins or not) and \emph{full information feedback} (the
learner can observe $m_t$) at the end of each time $t$. For binary feedback,
when the noise distribution $\mathcal{F}$ is known, we propose a bidding
algorithm based on the maximum likelihood estimation (MLE) method that achieves
regret at most $\widetilde{O}(\sqrt{\log(d) T})$. Moreover, we generalize this
algorithm to the setting where the feedback is binary and the noise
distribution is unknown but belongs to a parametrized family of distributions.
For the full
information feedback with \emph{unknown} noise distribution, we provide an
algorithm that achieves regret at most $\widetilde{O}(\sqrt{dT})$. Our approach
combines an estimator for log-concave density functions with the MLE method to
learn the noise distribution $\mathcal{F}$ and the linear weight $\alpha_0$
simultaneously. We also prove a lower bound showing that any bidding policy in
a broad class must incur regret at least $\Omega(\sqrt{T})$, even when the
learner receives full information feedback and $\mathcal{F}$ is known.
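To make the binary-feedback approach concrete, below is a minimal sketch of the MLE step and the induced bidding rule. It is an illustration rather than the paper's algorithm: it assumes standard Gaussian noise as a stand-in for the generic log-concave $\mathcal{F}$, a known private value, and a simple grid search over bids; the helper names (`neg_log_likelihood`, `estimate_alpha`, `best_response_bid`) are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_likelihood(alpha, X, bids, wins):
    # The learner wins at time t iff b_t >= m_t = alpha . x_t + z_t, so with
    # z_t ~ N(0, 1) (an assumed stand-in for the log-concave noise F),
    # P(win at t) = F(b_t - alpha . x_t) = Phi(b_t - alpha . x_t).
    p_win = norm.cdf(bids - X @ alpha)
    p_win = np.clip(p_win, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -np.sum(wins * np.log(p_win) + (1 - wins) * np.log1p(-p_win))

def estimate_alpha(X, bids, wins):
    # MLE over the observed (context, bid, win-indicator) history. Log-concavity
    # of the noise density makes this negative log-likelihood convex in alpha.
    d = X.shape[1]
    res = minimize(neg_log_likelihood, x0=np.zeros(d), args=(X, bids, wins))
    return res.x

def best_response_bid(x, alpha_hat, value, n_grid=200):
    # Under the fitted model, a bid b wins with probability Phi(b - alpha_hat . x);
    # choose b on a grid to maximize the expected utility (value - b) * P(win).
    grid = np.linspace(0.0, value, n_grid)
    utility = (value - grid) * norm.cdf(grid - x @ alpha_hat)
    return grid[np.argmax(utility)]
```

In a bandit loop, one would refit `alpha_hat` on the growing history each round and bid `best_response_bid(x_t, alpha_hat, v_t)`; the paper's actual algorithm adds the care needed to turn this plug-in idea into an $\widetilde{O}(\sqrt{\log(d) T})$ regret guarantee.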
Related papers
- Corruption-Robust Lipschitz Contextual Search [2.296475290901356]
I study the problem of learning a Lipschitz function with corrupted binary signals.
This work introduces the new algorithmic technique \emph{agnostic checking}, as well as new analysis techniques.
arXiv Detail & Related papers (2023-07-26T02:02:19Z)
- A Spectral Approach to Item Response Theory [6.5268245109828005]
We propose a \emph{new} item estimation algorithm for the Rasch model.
The core of our algorithm is the computation of the stationary distribution of a Markov chain defined on an item-item graph.
Experiments on synthetic and real-life datasets show that our algorithm is scalable, accurate, and competitive with the most commonly used methods in the literature.
arXiv Detail & Related papers (2022-10-09T18:57:08Z)
- Cryptographic Hardness of Learning Halfspaces with Massart Noise [59.8587499110224]
We study the complexity of PAC learning halfspaces in the presence of Massart noise.
We show that no polynomial-time Massart halfspace learner can achieve error better than $\Omega(\eta)$, even if the optimal 0-1 error is small.
arXiv Detail & Related papers (2022-07-28T17:50:53Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $\boldsymbol{\mu} \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$-accurate estimate of $f(\boldsymbol{\mu})$.
arXiv Detail & Related papers (2022-03-18T18:50:52Z)
- Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $\mathbb{R}^d$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z)
- Gap-Dependent Unsupervised Exploration for Reinforcement Learning [40.990467706237396]
We present an efficient algorithm for task-agnostic reinforcement learning.
The algorithm takes only $\widetilde{\mathcal{O}}(1/\epsilon \cdot (H^3 S A / \rho + H^4 S^2 A))$ episodes of exploration.
We show that, information-theoretically, this bound is nearly tight for $\rho \geq \Theta(1/(HS))$ and $H > 1$.
arXiv Detail & Related papers (2021-08-11T20:42:46Z)
- Randomized Exploration for Reinforcement Learning with General Value Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises (see the sketch after this list).
We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z)
- Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity [59.34067736545355]
Given an MDP with $S$ states, $A$ actions, the discount factor $\gamma \in (0,1)$, and an approximation threshold $\epsilon > 0$, we provide a model-free algorithm to learn an $\epsilon$-optimal policy.
For small enough $\epsilon$, we give an improved algorithm with better sample complexity.
arXiv Detail & Related papers (2020-06-06T13:34:41Z)
- Optimal Contextual Pricing and Extensions [32.152826900874075]
We give a poly-time algorithm for contextual pricing with $O(d \log \log T)$ regret, which matches the $\Omega(d \log \log T)$ lower bound up to the $d \log d$ additive factor.
These algorithms are based on a novel technique of bounding the value of the Steiner polynomial of a convex region at various scales.
arXiv Detail & Related papers (2020-03-03T18:46:29Z)
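As referenced in the randomized-exploration entry above, the following is a minimal sketch of the data-perturbation idea described there: one least-squares value-iteration step on targets perturbed by i.i.d. scalar noise. The function name and all parameters are hypothetical, and linear value features are an added assumption rather than the cited paper's general value function approximation.

```python
import numpy as np

def perturbed_lsvi_step(Phi, rewards, next_values, gamma=0.99,
                        noise_std=1.0, reg=1.0, rng=None):
    # Regress perturbed targets r + gamma * V(s') + noise onto features Phi.
    # The i.i.d. scalar noise injected into the training data is what
    # drives exploration in RLSVI-style algorithms.
    rng = np.random.default_rng() if rng is None else rng
    targets = rewards + gamma * next_values
    targets = targets + rng.normal(0.0, noise_std, size=targets.shape)
    A = Phi.T @ Phi + reg * np.eye(Phi.shape[1])  # ridge-regularized normal equations
    return np.linalg.solve(A, Phi.T @ targets)    # weights of the randomized value estimate
```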