A Reinforcement Learning Approach in Multi-Phase Second-Price Auction
Design
- URL: http://arxiv.org/abs/2210.10278v1
- Date: Wed, 19 Oct 2022 03:49:05 GMT
- Title: A Reinforcement Learning Approach in Multi-Phase Second-Price Auction
Design
- Authors: Rui Ai, Boxiang Lyu, Zhaoran Wang, Zhuoran Yang and Michael I. Jordan
- Abstract summary: We study reserve price optimization in multi-phase second price auctions.
From the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders.
Third, the seller's per-step revenue is unknown, nonlinear, and cannot even be directly observed from the environment.
- Score: 158.0041488194202
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study reserve price optimization in multi-phase second price auctions,
where the seller's prior actions affect the bidders' later valuations through a
Markov Decision Process (MDP). Compared to the bandit setting in existing
works, our setting involves three challenges. First, from the seller's
perspective, we need to efficiently explore the environment in the presence of
potentially nontruthful bidders who aim to manipulate the seller's policy. Second,
we want to minimize the seller's revenue regret when the market noise
distribution is unknown. Third, the seller's per-step revenue is unknown,
nonlinear, and cannot even be directly observed from the environment.
We propose a mechanism addressing all three challenges. To address the first
challenge, we use a combination of a new technique named "buffer periods" and
inspirations from Reinforcement Learning (RL) with low switching cost to limit
bidders' surplus from untruthful bidding, thereby incentivizing approximately
truthful bidding. The second one is tackled by a novel algorithm that removes
the need for pure exploration when the market noise distribution is unknown.
The third challenge is resolved by an extension of LSVI-UCB, where we use the
auction's underlying structure to control the uncertainty of the revenue
function. The three techniques culminate in the $\underline{\rm
C}$ontextual-$\underline{\rm L}$SVI-$\underline{\rm U}$CB-$\underline{\rm
B}$uffer (CLUB) algorithm which achieves $\tilde{
\mathcal{O}}(H^{5/2}\sqrt{K})$ revenue regret when the market noise is known
and $\tilde{ \mathcal{O}}(H^{3}\sqrt{K})$ revenue regret when the noise is
unknown with no assumptions on bidders' truthfulness.
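As a concrete illustration of the setting, the sketch below simulates one H-step episode of second-price auctions with a reserve price, where the reserve is the seller's action and the per-step revenue is the quantity whose regret is bounded above. It is a minimal toy model: the valuation dynamics, noise, and fixed reserve policy are illustrative assumptions, not the paper's CLUB construction.

```python
import numpy as np

def second_price_revenue(bids, reserve):
    """Seller's revenue from one second-price auction with a reserve price.

    The highest bidder wins only if their bid clears the reserve, and then
    pays the maximum of the reserve and the second-highest bid.
    """
    bids = np.sort(np.asarray(bids, dtype=float))[::-1]
    if bids.size == 0 or bids[0] < reserve:
        return 0.0                      # no sale: top bid below the reserve
    second = bids[1] if bids.size > 1 else 0.0
    return float(max(reserve, second))  # winner pays max(reserve, 2nd-highest bid)

# Toy H-step episode: bidder valuations drift over steps, standing in for the
# MDP dynamics, and bids are (approximately) truthful valuations plus noise.
rng = np.random.default_rng(0)
H, n_bidders = 5, 4
values = rng.uniform(0.0, 1.0, size=n_bidders)
episode_revenue = 0.0
for h in range(H):
    reserve = 0.5                       # placeholder policy; CLUB would learn this
    bids = values + rng.normal(0.0, 0.05, size=n_bidders)
    episode_revenue += second_price_revenue(bids, reserve)
    values = np.clip(values + rng.normal(0.0, 0.1, size=n_bidders), 0.0, 1.0)
print(f"episode revenue: {episode_revenue:.3f}")
```

In the toy above the realized payment is computed directly; in the paper's setting the per-step revenue is unknown, nonlinear, and not directly observable, which is the third challenge the LSVI-UCB extension is designed to handle.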
Related papers
- Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs [63.47351876442425]
We study episodic linear mixture MDPs with unknown transitions and adversarial rewards under full-information feedback.
We propose a novel algorithm that combines the benefits of two popular methods: occupancy-measure-based and policy-based.
Our algorithm enjoys an $\widetilde{\mathcal{O}}(d\sqrt{H^{3}K} + \sqrt{HK(H + \bar{P}_K)})$ dynamic regret, where $d$ is the feature dimension.
arXiv Detail & Related papers (2024-11-05T13:55:52Z) - Improved Algorithms for Contextual Dynamic Pricing [24.530341596901476]
In contextual dynamic pricing, a seller sequentially prices goods based on contextual information.
Our algorithm achieves an optimal regret bound of $\tilde{\mathcal{O}}(T^{2/3})$, improving upon existing results.
For this model, our algorithm obtains a regret of $\tilde{\mathcal{O}}(T^{(d+2\beta)/(d+3\beta)})$, where $d$ is the dimension of the context space.
arXiv Detail & Related papers (2024-06-17T08:26:51Z) - Learning in Repeated Multi-Unit Pay-As-Bid Auctions [3.6294895527930504]
We study the problem of bidding strategies in pay-as-bid (PAB) auctions from the perspective of a single bidder.
We show that a utility trick yields an efficient algorithm for the offline problem, where competing bids are known in advance.
We also present additional findings on the characterization of PAB equilibria.
arXiv Detail & Related papers (2023-07-27T20:49:28Z) - Dynamic Pricing and Learning with Bayesian Persuasion [18.59029578133633]
We consider a novel dynamic pricing and learning setting where, in addition to setting product prices, the seller also commits ex ante to 'advertising schemes'.
We use the popular Bayesian persuasion framework to model the effect of these signals on the buyers' valuation and purchase responses.
We design an online algorithm that can use past purchase responses to adaptively learn the optimal pricing and advertising strategy.
arXiv Detail & Related papers (2023-04-27T17:52:06Z) - Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z) - Fast Rate Learning in Stochastic First Price Bidding [0.0]
First-price auctions have largely replaced traditional bidding approaches based on Vickrey auctions in programmatic advertising.
We show how to achieve significantly lower regret when the opponents' maximal bid distribution is known.
Our algorithms converge much faster than alternatives proposed in the literature for various bid distributions.
arXiv Detail & Related papers (2021-07-05T07:48:52Z) - Certifiably Robust Interpretation via Renyi Differential Privacy [77.04377192920741]
We study the problem of interpretation robustness from a new perspective of Rényi differential privacy (RDP).
First, it can offer provable and certifiable top-$k$ robustness.
Second, our proposed method offers $\sim 10\%$ better experimental robustness than existing approaches.
Third, our method can provide a smooth tradeoff between robustness and computational efficiency.
arXiv Detail & Related papers (2021-07-04T06:58:01Z) - Efficient Algorithms for Stochastic Repeated Second-price Auctions [0.0]
We develop efficient sequential bidding strategies for repeated auctions.
We provide the first parametric lower bound for this problem.
We propose more explainable strategies reminiscent of the Explore-Then-Commit bandit algorithm (a generic sketch of Explore-Then-Commit appears after this list).
arXiv Detail & Related papers (2020-11-10T12:45:02Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive terms that depend on the corruption level.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z) - Upper Confidence Primal-Dual Reinforcement Learning for CMDP with
Adversarial Loss [145.54544979467872]
We consider online learning for episodically constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)
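The "Efficient Algorithms for Stochastic Repeated Second-price Auctions" entry above mentions strategies reminiscent of the Explore-Then-Commit bandit algorithm; the generic sketch below illustrates only that template. The arm set (candidate bid levels), utility model, and exploration length are illustrative assumptions, not the strategies proposed in that paper.

```python
import numpy as np

def explore_then_commit(pull, n_arms, explore_per_arm):
    """Generic Explore-Then-Commit: sample every arm a fixed number of times,
    then commit to the arm with the highest empirical mean reward.

    `pull(arm)` is a callable returning one stochastic reward sample.
    """
    means = np.array([
        np.mean([pull(arm) for _ in range(explore_per_arm)])
        for arm in range(n_arms)
    ])
    return int(np.argmax(means))        # arm played for all remaining rounds

# Toy use: arms are candidate bid levels and rewards are noisy utilities
# that peak at a bid of 0.5 in this made-up environment.
rng = np.random.default_rng(1)
bid_levels = np.linspace(0.1, 0.9, 9)
true_utility = 0.6 - np.abs(bid_levels - 0.5)
pull = lambda arm: true_utility[arm] + rng.normal(0.0, 0.05)
best = explore_then_commit(pull, n_arms=len(bid_levels), explore_per_arm=30)
print("committed bid level:", round(float(bid_levels[best]), 2))
```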
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.