Stochastic Strategic Patient Buyers: Revenue maximization using posted
prices
- URL: http://arxiv.org/abs/2202.06143v1
- Date: Sat, 12 Feb 2022 21:11:31 GMT
- Title: Stochastic Strategic Patient Buyers: Revenue maximization using posted
prices
- Authors: Eitan-Hai Mashiah and Idan Attias and Yishay Mansour
- Abstract summary: We consider a seller faced with buyers which have the ability to delay their decision.
Each buyer's type is composed of value and patience, and it is sampled i.i.d. from a distribution.
We characterize both the optimal pure strategy of the seller and the buyer's best response strategy to any seller's mixed strategy.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a seller faced with buyers who have the ability to delay their
decision, an ability we call patience. Each buyer's type is composed of value and
patience, and is sampled i.i.d. from a distribution. The seller, using
posted prices, would like to maximize her revenue from selling to the buyer.
Our main results are the following.
$\bullet$ We formalize this setting and characterize the resulting
Stackelberg equilibrium, where the seller first commits to her strategy and
then the buyers best respond.
$\bullet$ We show a separation between the best fixed price, the best pure
strategy, which is a fixed sequence of prices, and the best mixed strategy,
which is a distribution over price sequences.
$\bullet$ We characterize both the optimal pure strategy of the seller and
the buyer's best response strategy to any seller's mixed strategy.
$\bullet$ We show how to efficiently compute the optimal pure strategy and
give an algorithm for the optimal mixed strategy (whose running time is
exponential in the maximum patience); a toy revenue-evaluation sketch follows this list.
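To make the pure-strategy objective concrete, here is a minimal Python sketch (ours, not the paper's): a buyer's best response to a fixed price sequence, and a Monte Carlo estimate of the seller's revenue against i.i.d. types. The buyer model (buy at the cheapest price within the patience window whenever it is at most the buyer's value), the tie-breaking, and all identifiers are illustrative assumptions.

```python
import random

def buyer_best_response(prices, value, patience):
    """Best response of a (value, patience) buyer to a posted price sequence.

    Assumed buyer model (illustrative, not the paper's exact rule): the buyer
    may wait up to `patience` steps and buys at the cheapest price in that
    window, provided it does not exceed their value. Returns the price paid,
    or None if the buyer walks away.
    """
    window = prices[: patience + 1]
    cheapest = min(window)
    return cheapest if cheapest <= value else None

def expected_revenue(prices, sample_type, n_samples=100_000):
    """Monte Carlo estimate of the seller's revenue from a pure strategy
    (a fixed price sequence) against i.i.d. buyer types."""
    total = 0.0
    for _ in range(n_samples):
        value, patience = sample_type()
        paid = buyer_best_response(prices, value, patience)
        if paid is not None:
            total += paid
    return total / n_samples

# Hypothetical type distribution: value ~ U[0, 1], patience uniform on {0, 1}.
sample_type = lambda: (random.random(), random.randint(0, 1))
print(expected_revenue([0.6, 0.4], sample_type))
```

Under this toy model a patient buyer effectively sees the minimum price in their window, which is the intuition for why a decreasing price sequence can discriminate between impatient high-value buyers and patient lower-value ones.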
We then consider a learning setting, where the seller does not have access to
the distribution over buyers' types. Our main results are the following.
$\bullet$ We derive a sample complexity bound for learning an approximately
optimal pure strategy, by computing the fat-shattering dimension of this
setting (a simplified sampling sketch follows this list).
$\bullet$ We give a general sample complexity bound for learning an
approximately optimal mixed strategy.
$\bullet$ We consider an online setting and derive a vanishing regret bound
with respect to both the optimal pure strategy and the optimal mixed strategy.
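As a toy illustration of the sample-based approach in the simplest special case (a single fixed price rather than a price sequence, so patience plays no role), the sketch below runs empirical revenue maximization over a price grid. This is not the paper's algorithm: the actual bounds for pure and mixed strategies go through the fat-shattering dimension, and all names here are hypothetical.

```python
import random

def erm_fixed_price(samples, price_grid):
    """Empirical revenue maximization over a grid of fixed prices.

    Deliberately simplified: with one fixed price, patience is irrelevant,
    so a sampled (value, patience) buyer purchases iff value >= price.
    """
    def empirical_revenue(p):
        return p * sum(v >= p for v, _ in samples) / len(samples)
    return max(price_grid, key=empirical_revenue)

# Example usage with synthetic buyer types.
samples = [(random.random(), random.randint(0, 2)) for _ in range(10_000)]
grid = [i / 100 for i in range(1, 100)]
print(erm_fixed_price(samples, grid))
```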
Related papers
- Mixing Any Cocktail with Limited Ingredients: On the Structure of Payoff Sets in Multi-Objective MDPs and its Impact on Randomised Strategies
We study the structure of the set of expected payoff vectors of all strategies given a multi-dimensional payoff function.
For any payoff function for which the expected payoff is finite under all strategies, any expected payoff vector can be obtained exactly by mixing finitely many strategies.
arXiv Detail & Related papers (2025-02-25T15:33:59Z) - Near-optimal Regret Using Policy Optimization in Online MDPs with Aggregate Bandit Feedback
We study online finite-horizon Markov Decision Processes with adversarially changing loss and aggregate bandit feedback (a.k.a. full-bandit). Under this type of feedback, the agent observes only the total loss incurred over the entire trajectory, rather than the individual losses at each intermediate step within the trajectory. We introduce the first Policy Optimization algorithms for this setting.
arXiv Detail & Related papers (2025-02-06T12:03:24Z) - Learning Safe Strategies for Value Maximizing Buyers in Uniform Price Auctions
We study the bidding problem in repeated uniform price multi-unit auctions from the perspective of a value-maximizing buyer. We introduce the notion of safe bidding strategies as those that satisfy the RoI constraints irrespective of competing bids. We show that these strategies satisfy a mild no-overbidding condition, depend only on the valuation curve of the bidder, and the bidder can focus on a finite subset without loss of generality.
arXiv Detail & Related papers (2024-06-06T01:29:47Z) - Strategically-Robust Learning Algorithms for Bidding in First-Price Auctions
Learning to bid in repeated first-price auctions is a fundamental problem at the interface of game theory and machine learning.
We propose a novel concave formulation for pure-strategy bidding in first-price auctions, and use it to analyze natural Gradient-Ascent-based algorithms for this problem.
arXiv Detail & Related papers (2024-02-12T01:33:33Z) - An Online Learning Theory of Brokerage
We investigate brokerage between traders from an online learning perspective.
Unlike other bilateral trade problems already studied, we focus on the case where there are no designated buyer and seller roles.
We show that the optimal rate degrades to $\sqrt{T}$ in the first case, and the problem becomes unlearnable in the second.
arXiv Detail & Related papers (2023-10-18T17:01:32Z) - Contextual Dynamic Pricing with Strategic Buyers
We study the contextual dynamic pricing problem with strategic buyers.
The seller does not observe the buyer's true feature, but a feature manipulated according to the buyers' strategic behavior.
We propose a strategic dynamic pricing policy that incorporates the buyers' strategic behavior into the online learning to maximize the seller's cumulative revenue.
arXiv Detail & Related papers (2023-07-08T23:06:42Z) - ApproxED: Approximate exploitability descent via learned best responses
We study the problem of finding an approximate Nash equilibrium of games with continuous action sets.
We propose two new methods that minimize an approximation of exploitability with respect to the strategy profile.
arXiv Detail & Related papers (2023-01-20T23:55:30Z) - Autoregressive Bandits
We propose a novel online learning setting, Autoregressive Bandits, in which the observed reward is governed by an autoregressive process of order $k$.
We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed.
We then devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $\widetilde{\mathcal{O}}\left(\frac{(k+1)^{3/2}\sqrt{nT}}{(1-\Gamma)^2}\right)$.
arXiv Detail & Related papers (2022-12-12T21:37:36Z) - Provably Efficient Offline Multi-agent Reinforcement Learning via
Strategy-wise Bonus
We propose a strategy-wise concentration principle which builds a confidence interval for the joint strategy.
For two-player zero-sum Markov games, by exploiting the convexity of the strategy-wise bonus, we propose an efficient algorithm.
All of our algorithms can naturally take a pre-specified strategy class $\Pi$ as input and output a strategy close to the best strategy in $\Pi$.
arXiv Detail & Related papers (2022-06-01T00:18:15Z) - Navigating to the Best Policy in Markov Decision Processes
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Universal Trading for Order Execution with Oracle Policy Distillation
We propose a novel universal trading policy optimization framework to bridge the gap between the noisy yet imperfect market states and the optimal action sequences for order execution.
We show that our framework can better guide the learning of the common policy towards practically optimal execution by an oracle teacher with perfect information.
arXiv Detail & Related papers (2021-01-28T05:52:18Z) - On the Impossibility of Convergence of Mixed Strategies with No Regret
Learning
We study convergence properties of the mixed strategies that result from a general class of optimal no regret learning strategies.
We consider the class of strategies whose information set at each step is the empirical average of the opponent's realized play.
arXiv Detail & Related papers (2020-12-03T18:02:40Z)