Contextual Dynamic Pricing: Algorithms, Optimality, and Local Differential Privacy Constraints
- URL: http://arxiv.org/abs/2406.02424v1
- Date: Tue, 4 Jun 2024 15:44:10 GMT
- Title: Contextual Dynamic Pricing: Algorithms, Optimality, and Local Differential Privacy Constraints
- Authors: Zifeng Zhao, Feiyu Jiang, Yi Yu,
- Abstract summary: We study the contextual dynamic pricing problem where a firm sells products to $T$ sequentially arriving consumers.
We first show that the optimal regret upper bound is of order $sqrtdT$, up to a logarithmic factor.
A key insight of our theoretical result is an intrinsic connection between dynamic pricing and the contextual multi-armed bandit problem.
- Score: 10.057344315478709
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the contextual dynamic pricing problem where a firm sells products to $T$ sequentially arriving consumers that behave according to an unknown demand model. The firm aims to maximize its revenue, i.e. minimize its regret over a clairvoyant that knows the model in advance. The demand model is a generalized linear model (GLM), allowing for a stochastic feature vector in $\mathbb R^d$ that encodes product and consumer information. We first show that the optimal regret upper bound is of order $\sqrt{dT}$, up to a logarithmic factor, improving upon existing upper bounds in the literature by a $\sqrt{d}$ factor. This sharper rate is materialised by two algorithms: a confidence bound-type (supCB) algorithm and an explore-then-commit (ETC) algorithm. A key insight of our theoretical result is an intrinsic connection between dynamic pricing and the contextual multi-armed bandit problem with many arms based on a careful discretization. We further study contextual dynamic pricing under the local differential privacy (LDP) constraints. In particular, we propose a stochastic gradient descent based ETC algorithm that achieves an optimal regret upper bound of order $d\sqrt{T}/\epsilon$, up to a logarithmic factor, where $\epsilon>0$ is the privacy parameter. The regret upper bounds with and without LDP constraints are accompanied by newly constructed minimax lower bounds, which further characterize the cost of privacy. Extensive numerical experiments and a real data application on online lending are conducted to illustrate the efficiency and practical value of the proposed algorithms in dynamic pricing.
Related papers
- Minimax Optimality in Contextual Dynamic Pricing with General Valuation Models [4.156757591117864]
We propose a novel algorithm that achieves improved regret bounds while minimizing assumptions about the problem.
Our method extends beyond linear valuation models commonly used in dynamic pricing by considering general function spaces.
arXiv Detail & Related papers (2024-06-24T23:43:56Z) - FLIPHAT: Joint Differential Privacy for High Dimensional Sparse Linear Bandits [8.908421753758475]
High dimensional sparse linear bandits serve as an efficient model for sequential decision-making problems.
Motivated by data privacy concerns, we study the joint differentially private high dimensional sparse linear bandits.
We show that FLIPHAT achieves optimal regret in terms of privacy parameters.
arXiv Detail & Related papers (2024-05-22T22:19:12Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with a environment and seeks to reach certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM)
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $tilde Obig(dsqrtsum_t=1Tsigma_t2 + dbig)$.
Our regret bound naturally aligns with the intuitive expectation in scenarios where the comparison is deterministic, the algorithm only suffers from an $tilde O(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost.
We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z) - Autoregressive Bandits [58.46584210388307]
We propose a novel online learning setting, Autoregressive Bandits, in which the observed reward is governed by an autoregressive process of order $k$.
We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed.
We then devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $widetildemathcalO left( frac(k+1)3/2sqrtnT (1-G
arXiv Detail & Related papers (2022-12-12T21:37:36Z) - Minimax Optimal Quantization of Linear Models: Information-Theoretic
Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements.
We derive an information-theoretic lower bound for the minimax risk under this setting.
We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z) - Online Allocation with Two-sided Resource Constraints [44.5635910908944]
We consider an online allocation problem subject to lower and upper resource constraints, where the requests arrive sequentially.
We propose a new algorithm that obtains $1-O(fracepsilonalpha-epsilon)$ -competitive ratio for the offline problems that know the entire requests ahead of time.
arXiv Detail & Related papers (2021-12-28T02:21:06Z) - On Dynamic Pricing with Covariates [6.6543199581017625]
We show that UCB and Thompson sampling-based pricing algorithms can achieve an $O(dsqrtTlog T)$ regret upper bound.
Our upper bound on the regret matches the lower bound up to logarithmic factors.
arXiv Detail & Related papers (2021-12-25T16:30:13Z) - Dynamic Pricing and Learning under the Bass Model [16.823029377470366]
We develop an algorithm that satisfies a high probability regret guarantee of order $tilde O(m2/3)$; where the market size $m$ is known a priori.
Unlike most regret analysis results, in the present problem the market size $m$ is the fundamental driver of the complexity.
arXiv Detail & Related papers (2021-03-09T03:27:33Z) - Nearly Dimension-Independent Sparse Linear Bandit over Small Action
Spaces via Best Subset Selection [71.9765117768556]
We consider the contextual bandit problem under the high dimensional linear model.
This setting finds essential applications such as personalized recommendation, online advertisement, and personalized medicine.
We propose doubly growing epochs and estimating the parameter using the best subset selection method.
arXiv Detail & Related papers (2020-09-04T04:10:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.