On Dynamic Pricing with Covariates
- URL: http://arxiv.org/abs/2112.13254v3
- Date: Mon, 13 Nov 2023 13:55:32 GMT
- Title: On Dynamic Pricing with Covariates
- Authors: Hanzhao Wang, Kalyan Talluri, Xiaocheng Li
- Abstract summary: We show that UCB and Thompson sampling-based pricing algorithms can achieve an $O(d\sqrt{T}\log T)$ regret upper bound.
Our upper bound on the regret matches the lower bound up to logarithmic factors.
- Score: 6.6543199581017625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider dynamic pricing with covariates under a generalized linear demand
model: a seller can dynamically adjust the price of a product over a horizon of
$T$ time periods, and at each time period $t$, the demand of the product is
jointly determined by the price and an observable covariate vector
$x_t\in\mathbb{R}^d$ through a generalized linear model with unknown
coefficients. Most of the existing literature assumes the covariate vectors
$x_t$'s are independently and identically distributed (i.i.d.); the few papers
that relax this assumption either sacrifice model generality or yield
sub-optimal regret bounds. In this paper, we show that UCB and Thompson
sampling-based pricing algorithms can achieve an $O(d\sqrt{T}\log T)$ regret
upper bound without assuming any statistical structure on the covariates $x_t$.
Our upper bound on the regret matches the lower bound up to logarithmic
factors. We thus show that (i) the i.i.d. assumption is not necessary for
obtaining low regret, and (ii) the regret bound can be independent of the
(inverse) minimum eigenvalue of the covariance matrix of the $x_t$'s, a
quantity present in previous bounds. Moreover, we consider a constrained
setting of the dynamic pricing problem where there is a limited and
unreplenishable inventory and we develop theoretical results that relate the
best achievable algorithm performance to a variation measure with respect to
the temporal distribution shift of the covariates. We also discuss conditions
under which a better regret is achievable and demonstrate the proposed
algorithms' performance with numerical experiments.
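The abstract states the guarantee but not the algorithmic recipe, so below is a minimal sketch of the UCB-style pricing rule it alludes to, under assumptions made purely for illustration: a Bernoulli (logistic) demand model, a finite candidate price grid, and the feature map $z_t = (x_t, p_t)$. The class name `UCBPricer`, the confidence scale `beta`, and the crude gradient-ascent GLM fit are hypothetical placeholders, not the authors' exact algorithm or tuning.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class UCBPricer:
    """Illustrative UCB-style pricing rule for a logistic demand model.

    Assumed (not the paper's exact setup): a purchase at price p with
    covariate x occurs with probability sigmoid(theta^T [x, p]), so the
    expected revenue of price p is p * sigmoid(theta^T [x, p]).
    """

    def __init__(self, dim, price_grid, reg=1.0, beta=1.0):
        self.d = dim + 1                  # covariate dimension + 1 price feature
        self.prices = np.asarray(price_grid, dtype=float)
        self.V = reg * np.eye(self.d)     # regularized design (Gram) matrix
        self.Z, self.y = [], []           # observed features [x, p] and demands
        self.beta = beta                  # confidence-radius scale (tuning knob)
        self.theta = np.zeros(self.d)     # current GLM parameter estimate

    def _fit(self, n_iter=50, lr=0.1):
        # Crude gradient ascent on the ridge-regularized logistic likelihood;
        # a real implementation would call a proper GLM / MLE solver.
        Z, y = np.asarray(self.Z), np.asarray(self.y)
        for _ in range(n_iter):
            grad = Z.T @ (y - sigmoid(Z @ self.theta)) - self.theta
            self.theta += lr * grad / len(y)

    def choose_price(self, x):
        # Optimism: score each candidate price by revenue at an upper
        # confidence bound on the demand, then pick the best one.
        V_inv = np.linalg.inv(self.V)
        best_p, best_ucb = self.prices[0], -np.inf
        for p in self.prices:
            z = np.concatenate([x, [p]])
            bonus = self.beta * np.sqrt(z @ V_inv @ z)
            ucb = p * sigmoid(z @ self.theta + bonus)
            if ucb > best_ucb:
                best_p, best_ucb = p, ucb
        return best_p

    def update(self, x, p, demand):
        # Record the outcome (demand in {0, 1} here) and refit the estimate.
        z = np.concatenate([x, [p]])
        self.V += np.outer(z, z)
        self.Z.append(z)
        self.y.append(float(demand))
        self._fit()


# Hypothetical usage over T periods with arbitrary (non-i.i.d.) covariates:
#   pricer = UCBPricer(dim=5, price_grid=np.linspace(1.0, 10.0, 20))
#   for x_t in covariate_stream:            # any sequence of covariates
#       p_t = pricer.choose_price(x_t)
#       pricer.update(x_t, p_t, demand_t)   # observed demand at price p_t
```

A Thompson sampling variant would replace the confidence bonus with a posterior draw of the parameter vector; the paper's point is that regret of order $O(d\sqrt{T}\log T)$ for such rules holds without any i.i.d. or other statistical structure on the $x_t$'s.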
Related papers
- Minimax Optimality in Contextual Dynamic Pricing with General Valuation Models [4.156757591117864]
We propose a novel algorithm that achieves improved regret bounds while minimizing assumptions about the problem.
Our method extends beyond linear valuation models commonly used in dynamic pricing by considering general function spaces.
arXiv Detail & Related papers (2024-06-24T23:43:56Z) - Contextual Dynamic Pricing: Algorithms, Optimality, and Local Differential Privacy Constraints [10.057344315478709]
We study the contextual dynamic pricing problem where a firm sells products to $T$ sequentially arriving consumers.
We first show that the optimal regret upper bound is of order $\sqrt{dT}$, up to a logarithmic factor.
A key insight of our theoretical result is an intrinsic connection between dynamic pricing and the contextual multi-armed bandit problem.
arXiv Detail & Related papers (2024-06-04T15:44:10Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed by Lu et al. (2021) under the Generalized Linear Model (GLM) framework.
To overcome the computational infeasibility and theoretical restrictions of existing algorithms, we first propose the G-ESTT framework.
We show that G-ESTT can achieve a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)^{3/2}Mr^{3/2}T})$, while G-ESTS can achieve the $\tilde{O}$
arXiv Detail & Related papers (2024-01-14T14:14:19Z) - Adversarial Contextual Bandits Go Kernelized [21.007410990554522]
We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space.
We propose a new optimistically biased estimator for the loss functions and establish near-optimal regret guarantees.
arXiv Detail & Related papers (2023-10-02T19:59:39Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound of $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost.
We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement
Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K}\sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - Dynamic Pricing and Learning under the Bass Model [16.823029377470366]
We develop an algorithm that satisfies a high-probability regret guarantee of order $\tilde{O}(m^{2/3})$, where the market size $m$ is known a priori.
Unlike most regret analyses, in the present problem the market size $m$ is the fundamental driver of the complexity.
arXiv Detail & Related papers (2021-03-09T03:27:33Z) - Adaptive Online Estimation of Piecewise Polynomial Trends [23.91519151164528]
We consider the framework of non-stationary optimization with squared error losses and noisy gradient feedback.
Motivated from the theory of non-parametric regression, we introduce a new variational constraint.
We show that the same policy is minimax optimal for several other non-parametric families of interest.
arXiv Detail & Related papers (2020-09-30T19:30:28Z) - Dynamic Regret of Convex and Smooth Functions [93.71361250701075]
We investigate online convex optimization in non-stationary environments.
We choose the dynamic regret as the performance measure.
We show that it is possible to further enhance the dynamic regret by exploiting the smoothness condition.
arXiv Detail & Related papers (2020-07-07T14:10:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.