No-Regret Learning in Partially-Informed Auctions
- URL: http://arxiv.org/abs/2202.10606v1
- Date: Tue, 22 Feb 2022 01:15:51 GMT
- Title: No-Regret Learning in Partially-Informed Auctions
- Authors: Wenshuo Guo, Michael I. Jordan, Ellen Vitercik
- Abstract summary: We study a machine learning formulation of auctions with partially-revealed information.
In each round, a new item is drawn from an unknown distribution and the platform publishes a price together with incomplete, "masked" information about the item.
We show that when the distribution over items is known to the buyer and the mask is a SimHash function mapping $mathbbRd$ to $0,1ell$, our algorithm has regret $tilde mathcalO((Tdell)frac12)$.
- Score: 85.67897346422122
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Auctions with partially-revealed information about items are broadly employed
in real-world applications, but the underlying mechanisms have limited
theoretical support. In this work, we study a machine learning formulation of
these types of mechanisms, presenting algorithms that are no-regret from the
buyer's perspective. Specifically, a buyer who wishes to maximize his utility
interacts repeatedly with a platform over a series of $T$ rounds. In each
round, a new item is drawn from an unknown distribution and the platform
publishes a price together with incomplete, "masked" information about the
item. The buyer then decides whether to purchase the item. We formalize this
problem as an online learning task where the goal is to have low regret with
respect to a myopic oracle that has perfect knowledge of the distribution over
items and the seller's masking function. When the distribution over items is
known to the buyer and the mask is a SimHash function mapping $\mathbb{R}^d$ to
$\{0,1\}^{\ell}$, our algorithm has regret $\tilde
{\mathcal{O}}((Td\ell)^{\frac{1}{2}})$. In a fully agnostic setting when the
mask is an arbitrary function mapping to a set of size $n$, our algorithm has
regret $\tilde {\mathcal{O}}(T^{\frac{3}{4}}n^{\frac{1}{2}})$. Finally, when
the prices are stochastic, the algorithm has regret $\tilde
{\mathcal{O}}((Tn)^{\frac{1}{2}})$.
Related papers
- On learning Whittle index policy for restless bandits with scalable
regret [1.0152838128195465]
Reinforcement learning is an attractive approach to learn good resource allocation and scheduling policies.
The cumulative regret of most RL algorithms scales as $tilde O(mathsfS sqrtmathsfA T)$, where $mathsfS$ is the size of the state space.
We present a model-based RL algorithm for such problems which has scalable regret.
arXiv Detail & Related papers (2022-02-07T19:07:02Z) - Near-Optimal Learning of Extensive-Form Games with Imperfect Information [54.55092907312749]
We present the first line of algorithms that require only $widetildemathcalO((XA+YB)/varepsilon2)$ episodes of play to find an $varepsilon$-approximate Nash equilibrium in two-player zero-sum games.
This improves upon the best known sample complexity of $widetildemathcalO((X2A+Y2B)/varepsilon2)$ by a factor of $widetildemathcalO(maxX,
arXiv Detail & Related papers (2022-02-03T18:18:28Z) - On Submodular Contextual Bandits [92.45432756301231]
We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function.
We show that our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy.
arXiv Detail & Related papers (2021-12-03T21:42:33Z) - Learning to Bid in Contextual First Price Auctions [6.482311591019749]
We consider a single bidder (learner) who repeatedly bids in the first price auctions.
For binary feedback, we propose a bidding algorithm, by using maximum likelihood estimation (MLE) method to achieve at most $widetildeO(sqrtlog(d) T)$ regret.
We also provide a lower bound result such that any bidding policy in a broad class must achieve regret at least $Omega(sqrtT)$.
arXiv Detail & Related papers (2021-09-07T16:11:18Z) - Threshold Phenomena in Learning Halfspaces with Massart Noise [56.01192577666607]
We study the problem of PAC learning halfspaces on $mathbbRd$ with Massart noise under Gaussian marginals.
Our results qualitatively characterize the complexity of learning halfspaces in the Massart model.
arXiv Detail & Related papers (2021-08-19T16:16:48Z) - Contextual Recommendations and Low-Regret Cutting-Plane Algorithms [49.91214213074933]
We consider the following variant of contextual linear bandits motivated by routing applications in navigational engines and recommendation systems.
We design novel cutting-plane algorithms with low "regret" -- the total distance between the true point $w*$ and the hyperplanes the separation oracle returns.
arXiv Detail & Related papers (2021-06-09T05:39:05Z) - Hardness of Learning Halfspaces with Massart Noise [56.98280399449707]
We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise.
We show that there an exponential gap between the information-theoretically optimal error and the best error that can be achieved by a SQ algorithm.
arXiv Detail & Related papers (2020-12-17T16:43:11Z) - Fully Gap-Dependent Bounds for Multinomial Logit Bandit [5.132017939561661]
We study the multinomial logit (MNL) bandit problem, where at each time step, the seller offers an assortment of size at most $K$ from a pool of $N$ items.
We present (i) an algorithm that identifies the optimal assortment $S*$ within $widetildeO(sum_i = 1N Delta_i-2)$ time steps with high probability, and (ii) an algorithm that incurs $O(sum_i notin S* KDelta_i
arXiv Detail & Related papers (2020-11-19T17:52:12Z) - Optimal Contextual Pricing and Extensions [32.152826900874075]
We give a poly-time algorithm for contextual pricing with $O(d log log T)$ regret which matches the $Omega(d log T)$ lower bound up to the $d log d$ additive factor.
These algorithms are based on a novel technique of bounding the value of the Steiner intrinsic of a convex region at various scales.
arXiv Detail & Related papers (2020-03-03T18:46:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.