Related papers: Leveraging Reviews: Learning to Price with Buyer and Seller Uncertainty

Leveraging Reviews: Learning to Price with Buyer and Seller Uncertainty

URL: http://arxiv.org/abs/2302.09700v2
Date: Mon, 11 Sep 2023 14:19:05 GMT
Title: Leveraging Reviews: Learning to Price with Buyer and Seller Uncertainty
Authors: Wenshuo Guo, Nika Haghtalab, Kirthevasan Kandasamy, Ellen Vitercik
Abstract summary: We study a pricing problem in an online setting where the seller interacts with a set of buyers of finitely many types. We provide a no-regret algorithm that the seller can use to obtain high revenue.
Score: 30.60994780724878
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In online marketplaces, customers have access to hundreds of reviews for a single product. Buyers often use reviews from other customers that share their type -- such as height for clothing, skin type for skincare products, and location for outdoor furniture -- to estimate their values, which they may not know a priori. Customers with few relevant reviews may hesitate to make a purchase except at a low price, so for the seller, there is a tension between setting high prices and ensuring that there are enough reviews so that buyers can confidently estimate their values. Simultaneously, sellers may use reviews to gauge the demand for items they wish to sell. In this work, we study this pricing problem in an online setting where the seller interacts with a set of buyers of finitely many types, one by one, over a series of $T$ rounds. At each round, the seller first sets a price. Then a buyer arrives and examines the reviews of the previous buyers with the same type, which reveal those buyers' ex-post values. Based on the reviews, the buyer decides to purchase if they have good reason to believe that their ex-ante utility is positive. Crucially, the seller does not know the buyer's type when setting the price, nor even the distribution over types. We provide a no-regret algorithm that the seller can use to obtain high revenue. When there are $d$ types, after $T$ rounds, our algorithm achieves a problem-independent $\tilde O(T^{2/3}d^{1/3})$ regret bound. However, when the smallest probability $q_{\text{min}}$ that any given type appears is large, specifically when $q_{\text{min}} \in \Omega(d^{-2/3}T^{-1/3})$, then the same algorithm achieves a $\tilde O(T^{1/2}q_{\text{min}}^{-1/2})$ regret bound. We complement these upper bounds with matching lower bounds in both regimes, showing that our algorithm is minimax optimal up to lower-order terms.

Related papers

Online Assortment and Price Optimization Under Contextual Choice Models [13.578723345690582]
We consider an assortment selection and pricing problem in which a seller has $N$ different items available for sale. In each round, the seller observes a $d$-dimensional contextual preference information vector for the user, and offers to the user an assortment of $K$ items at prices chosen by the seller. The user selects at most one of the products from the offered assortment according to a multinomial logit choice model whose parameters are unknown. We propose an algorithm that learns from user feedback and achieves a revenue regret of order $widetildeO(d sqrtK
arXiv Detail & Related papers (2025-03-14T19:15:33Z)
Improved Algorithms for Contextual Dynamic Pricing [24.530341596901476]
In contextual dynamic pricing, a seller sequentially prices goods based on contextual information. Our algorithm achieves an optimal regret bound of $tildemathcalO(T2/3)$, improving the existing results. For this model, our algorithm obtains a regret $tildemathcalO(Td+2beta/d+3beta)$, where $d$ is the dimension of the context space.
arXiv Detail & Related papers (2024-06-17T08:26:51Z)
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLM) We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary. We propose an algorithm namely robust contextual dueling bandit (algo), which is based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z)
Bandit Sequential Posted Pricing via Half-Concavity [12.373936155910934]
We study sequential posted pricing in the bandit learning model. In each round the seller posts $n$ prices for the $n$ buyers and the first buyer with a valuation higher than the price takes the item. Our main results obtain nearly-optimal regret bounds for single-item sequential posted pricing.
arXiv Detail & Related papers (2023-12-20T06:34:15Z)
Dynamic Pricing and Learning with Bayesian Persuasion [18.59029578133633]
We consider a novel dynamic pricing and learning setting where in addition to setting prices of products, the seller also ex-ante commits to 'advertising schemes' We use the popular Bayesian persuasion framework to model the effect of these signals on the buyers' valuation and purchase responses. We design an online algorithm that can use past purchase responses to adaptively learn the optimal pricing and advertising strategy.
arXiv Detail & Related papers (2023-04-27T17:52:06Z)
Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score. We propose a rich class of generalized linear dueling bandit models, which cover many existing models. Our algorithm achieves an $tildeO(d2/3 T2/3)$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z)
Repeated Bilateral Trade Against a Smoothed Adversary [5.939280057673226]
We study repeated bilateral trade where an adaptive $sigma$-smooth adversary generates the valuations of sellers and buyers. We provide a complete characterization of the regret regimes for fixed-price mechanisms under different feedback models.
arXiv Detail & Related papers (2023-02-21T16:30:10Z)
A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design [158.0041488194202]
We study reserve price optimization in multi-phase second price auctions. From the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders. Third, the seller's per-step revenue is unknown, nonlinear, and cannot even be directly observed from the environment.
arXiv Detail & Related papers (2022-10-19T03:49:05Z)
Double Auctions with Two-sided Bandit Feedback [11.334374665364214]
Double Auction enables decentralized transfer of goods between multiple buyers and sellers. We show with confidence bound based bidding, and Average Pricing' there is an efficient price discovery among the participants. Our paper is the first to provide decentralized learning algorithms in a two-sided market where emphboth sides have uncertain preference that need to be learned.
arXiv Detail & Related papers (2022-08-13T01:03:34Z)
No-Regret Learning in Partially-Informed Auctions [85.67897346422122]
We study a machine learning formulation of auctions with partially-revealed information. In each round, a new item is drawn from an unknown distribution and the platform publishes a price together with incomplete, "masked" information about the item. We show that when the distribution over items is known to the buyer and the mask is a SimHash function mapping $mathbbRd$ to $0,1ell$, our algorithm has regret $tilde mathcalO((Tdell)frac12)$.
arXiv Detail & Related papers (2022-02-22T01:15:51Z)
Adversarial Dueling Bandits [85.14061196945599]
We introduce the problem of regret in Adversarial Dueling Bandits. The learner has to repeatedly choose a pair of items and observe only a relative binary win-loss' feedback for this pair. Our main result is an algorithm whose $T$-round regret compared to the emphBorda-winner from a set of $K$ items.
arXiv Detail & Related papers (2020-10-27T19:09:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.