Contextual Dynamic Pricing with Heterogeneous Buyers
- URL: http://arxiv.org/abs/2512.09513v1
- Date: Wed, 10 Dec 2025 10:36:21 GMT
- Title: Contextual Dynamic Pricing with Heterogeneous Buyers
- Authors: Thodoris Lykouris, Sloan Nietert, Princewill Okoroafor, Chara Podimata, Julian Zimmert,
- Abstract summary: We study contextual dynamic pricing with a heterogeneous population of buyers.<n>We develop a contextual pricing algorithm based on optimistic posterior sampling with regret.<n>We refine our analysis for the non-contextual pricing case, proposing a variance-aware zooming algorithm.
- Score: 21.608390985958483
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We initiate the study of contextual dynamic pricing with a heterogeneous population of buyers, where a seller repeatedly posts prices (over $T$ rounds) that depend on the observable $d$-dimensional context and receives binary purchase feedback. Unlike prior work assuming homogeneous buyer types, in our setting the buyer's valuation type is drawn from an unknown distribution with finite support size $K_{\star}$. We develop a contextual pricing algorithm based on optimistic posterior sampling with regret $\widetilde{O}(K_{\star}\sqrt{dT})$, which we prove to be tight in $d$ and $T$ up to logarithmic terms. Finally, we refine our analysis for the non-contextual pricing case, proposing a variance-aware zooming algorithm that achieves the optimal dependence on $K_{\star}$.
Related papers
- Pricing Query Complexity of Multiplicative Revenue Approximation [20.644559918113945]
We study the pricing query of revenue for a single buyer whose private valuation is drawn from an unknown distribution.<n>In this setting, the seller must learn the optimal monopoly price by posting prices and observing only binary purchase decisions.
arXiv Detail & Related papers (2026-02-11T03:42:42Z) - Dynamic Pricing with Adversarially-Censored Demands [25.566323930646178]
We study an online dynamic pricing problem where the potential demand at each time period $t=1,2,ldots, T$ is and dependent on the price.<n>A perishable inventory is imposed at the beginning of each time $t$, censoring the potential demand if it exceeds the inventory level.<n>We show that our algorithm achieves $tildeO(sqrtT)$ optimal regret even with adversarial inventory series.
arXiv Detail & Related papers (2025-02-10T05:37:39Z) - Near-optimal Regret Using Policy Optimization in Online MDPs with Aggregate Bandit Feedback [49.84060509296641]
We study online finite-horizon Markov Decision Processes with adversarially changing loss and aggregate bandit feedback (a.k.a full-bandit)<n>Under this type of feedback, the agent observes only the total loss incurred over the entire trajectory, rather than the individual losses at each intermediate step within the trajectory.<n>We introduce the first Policy Optimization algorithms for this setting.
arXiv Detail & Related papers (2025-02-06T12:03:24Z) - Learning to Price Homogeneous Data [6.288169915425957]
We develop novel discretization schemes to approximate any pricing curve.
Our online algorithms build on classical algorithms such as UCB and FTPL.
Using the improved discretization schemes, we are able to achieve $tildeO(msqrtT)$ regret in the setting and $tildeO(m3/2sqrtT)$ regret in the adversarial setting.
arXiv Detail & Related papers (2024-07-07T20:02:52Z) - Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct.
Our main contribution is in designing a novel learning algorithm for the agnostic $(varepsilon,delta)$PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z) - Improved Algorithms for Contextual Dynamic Pricing [24.530341596901476]
In contextual dynamic pricing, a seller sequentially prices goods based on contextual information.<n>Our algorithm achieves an optimal regret bound of $tildemathcalO(T2/3)$, improving the existing results.<n>For this model, our algorithm obtains a regret $tildemathcalO(Td+2beta/d+3beta)$, where $d$ is the dimension of the context space.
arXiv Detail & Related papers (2024-06-17T08:26:51Z) - Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLM)<n>We study a model within this domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.<n>We propose an algorithm namely robust contextual dueling bandits (RCDB), which is based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Pricing with Contextual Elasticity and Heteroscedastic Valuation [23.96777734246062]
We study an online contextual dynamic pricing problem, where customers decide whether to purchase a product based on its features and price.
We introduce a novel approach to modeling a customer's expected demand by incorporating feature-based price elasticity.
Our results shed light on the relationship between contextual elasticity and heteroscedastic valuation, providing insights for effective and practical pricing strategies.
arXiv Detail & Related papers (2023-12-26T11:07:37Z) - Autoregressive Bandits [58.46584210388307]
We propose a novel online learning setting, Autoregressive Bandits, in which the observed reward is governed by an autoregressive process of order $k$.
We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed.
We then devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $widetildemathcalO left( frac(k+1)3/2sqrtnT (1-G
arXiv Detail & Related papers (2022-12-12T21:37:36Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
Agent sequentially selects actions and, from the resulting system trajectory, aims at the best as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - A new regret analysis for Adam-type algorithms [78.825194932103]
In theory, regret guarantees for online convex optimization require a rapidly decaying $beta_1to0$ schedule.
We propose a novel framework that allows us to derive optimal, data-dependent regret bounds with a constant $beta_1$.
arXiv Detail & Related papers (2020-03-21T19:19:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.