Leveraging Reviews: Learning to Price with Buyer and Seller Uncertainty
- URL: http://arxiv.org/abs/2302.09700v2
- Date: Mon, 11 Sep 2023 14:19:05 GMT
- Title: Leveraging Reviews: Learning to Price with Buyer and Seller Uncertainty
- Authors: Wenshuo Guo, Nika Haghtalab, Kirthevasan Kandasamy, Ellen Vitercik
- Abstract summary: We study a pricing problem in an online setting where the seller interacts with a set of buyers of finitely many types.
We provide a no-regret algorithm that the seller can use to obtain high revenue.
- Score: 30.60994780724878
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In online marketplaces, customers have access to hundreds of reviews for a
single product. Buyers often use reviews from other customers that share their
type -- such as height for clothing, skin type for skincare products, and
location for outdoor furniture -- to estimate their values, which they may not
know a priori. Customers with few relevant reviews may hesitate to make a
purchase except at a low price, so for the seller, there is a tension between
setting high prices and ensuring that there are enough reviews so that buyers
can confidently estimate their values. Simultaneously, sellers may use reviews
to gauge the demand for items they wish to sell.
In this work, we study this pricing problem in an online setting where the
seller interacts with a set of buyers of finitely many types, one by one, over
a series of $T$ rounds. At each round, the seller first sets a price. Then a
buyer arrives and examines the reviews of the previous buyers with the same
type, which reveal those buyers' ex-post values. Based on the reviews, the
buyer decides to purchase if they have good reason to believe that their
ex-ante utility is positive. Crucially, the seller does not know the buyer's
type when setting the price, nor even the distribution over types. We provide a
no-regret algorithm that the seller can use to obtain high revenue. When there
are $d$ types, after $T$ rounds, our algorithm achieves a problem-independent
$\tilde O(T^{2/3}d^{1/3})$ regret bound. However, when the smallest probability
$q_{\text{min}}$ that any given type appears is large, specifically when
$q_{\text{min}} \in \Omega(d^{-2/3}T^{-1/3})$, then the same algorithm achieves
a $\tilde O(T^{1/2}q_{\text{min}}^{-1/2})$ regret bound. We complement these
upper bounds with matching lower bounds in both regimes, showing that our
algorithm is minimax optimal up to lower-order terms.
Related papers
- Improved Algorithms for Contextual Dynamic Pricing [24.530341596901476]
In contextual dynamic pricing, a seller sequentially prices goods based on contextual information.
Our algorithm achieves an optimal regret bound of $tildemathcalO(T2/3)$, improving the existing results.
For this model, our algorithm obtains a regret $tildemathcalO(Td+2beta/d+3beta)$, where $d$ is the dimension of the context space.
arXiv Detail & Related papers (2024-06-17T08:26:51Z) - Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLM)
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose an algorithm namely robust contextual dueling bandit (algo), which is based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Bandit Sequential Posted Pricing via Half-Concavity [12.373936155910934]
We study sequential posted pricing in the bandit learning model.
In each round the seller posts $n$ prices for the $n$ buyers and the first buyer with a valuation higher than the price takes the item.
Our main results obtain nearly-optimal regret bounds for single-item sequential posted pricing.
arXiv Detail & Related papers (2023-12-20T06:34:15Z) - Dynamic Pricing and Learning with Bayesian Persuasion [18.59029578133633]
We consider a novel dynamic pricing and learning setting where in addition to setting prices of products, the seller also ex-ante commits to 'advertising schemes'
We use the popular Bayesian persuasion framework to model the effect of these signals on the buyers' valuation and purchase responses.
We design an online algorithm that can use past purchase responses to adaptively learn the optimal pricing and advertising strategy.
arXiv Detail & Related papers (2023-04-27T17:52:06Z) - Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $tildeO(d2/3 T2/3)$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z) - Repeated Bilateral Trade Against a Smoothed Adversary [5.939280057673226]
We study repeated bilateral trade where an adaptive $sigma$-smooth adversary generates the valuations of sellers and buyers.
We provide a complete characterization of the regret regimes for fixed-price mechanisms under different feedback models.
arXiv Detail & Related papers (2023-02-21T16:30:10Z) - A Reinforcement Learning Approach in Multi-Phase Second-Price Auction
Design [158.0041488194202]
We study reserve price optimization in multi-phase second price auctions.
From the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders.
Third, the seller's per-step revenue is unknown, nonlinear, and cannot even be directly observed from the environment.
arXiv Detail & Related papers (2022-10-19T03:49:05Z) - Double Auctions with Two-sided Bandit Feedback [11.334374665364214]
Double Auction enables decentralized transfer of goods between multiple buyers and sellers.
We show with confidence bound based bidding, and Average Pricing' there is an efficient price discovery among the participants.
Our paper is the first to provide decentralized learning algorithms in a two-sided market where emphboth sides have uncertain preference that need to be learned.
arXiv Detail & Related papers (2022-08-13T01:03:34Z) - No-Regret Learning in Partially-Informed Auctions [85.67897346422122]
We study a machine learning formulation of auctions with partially-revealed information.
In each round, a new item is drawn from an unknown distribution and the platform publishes a price together with incomplete, "masked" information about the item.
We show that when the distribution over items is known to the buyer and the mask is a SimHash function mapping $mathbbRd$ to $0,1ell$, our algorithm has regret $tilde mathcalO((Tdell)frac12)$.
arXiv Detail & Related papers (2022-02-22T01:15:51Z) - Adversarial Dueling Bandits [85.14061196945599]
We introduce the problem of regret in Adversarial Dueling Bandits.
The learner has to repeatedly choose a pair of items and observe only a relative binary win-loss' feedback for this pair.
Our main result is an algorithm whose $T$-round regret compared to the emphBorda-winner from a set of $K$ items.
arXiv Detail & Related papers (2020-10-27T19:09:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.