Dynamic Pricing and Learning under the Bass Model
- URL: http://arxiv.org/abs/2103.05199v1
- Date: Tue, 9 Mar 2021 03:27:33 GMT
- Title: Dynamic Pricing and Learning under the Bass Model
- Authors: Shipra Agrawal, Steven Yin, Assaf Zeevi
- Abstract summary: We develop an algorithm that satisfies a high probability regret guarantee of order $\tilde O(m^{2/3})$; where the market size $m$ is known a priori.
Unlike most regret analysis results, in the present problem the market size $m$ is the fundamental driver of the complexity.
- Score: 16.823029377470366
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider a novel formulation of the dynamic pricing and demand learning
problem, where the evolution of demand in response to posted prices is governed
by a stochastic variant of the popular Bass model with parameters $\alpha,
\beta$ that are linked to the so-called "innovation" and "imitation" effects.
Unlike the more commonly used i.i.d. and contextual demand models, in this
model the posted price not only affects the demand and the revenue in the
current round but also the future evolution of demand, and hence the fraction
of potential market size $m$ that can be ultimately captured. In this paper, we
consider the more challenging incomplete information problem where dynamic
pricing is applied in conjunction with learning the unknown parameters, with
the objective of optimizing the cumulative revenues over a given selling
horizon of length $T$. Equivalently, the goal is to minimize the regret which
measures the revenue loss of the algorithm relative to the optimal expected
revenue achievable under the stochastic Bass model with market size $m$ and
time horizon $T$. Our main contribution is the development of an algorithm that
satisfies a high probability regret guarantee of order $\tilde O(m^{2/3})$;
where the market size $m$ is known a priori. Moreover, we show that no
algorithm can incur smaller order of loss by deriving a matching lower bound.
Unlike most regret analysis results, in the present problem the market size $m$
is the fundamental driver of the complexity; our lower bound in fact, indicates
that for any fixed $\alpha, \beta$, most non-trivial instances of the problem
have constant $T$ and large $m$. We believe that this insight sets the problem
of dynamic pricing under the Bass model apart from the typical i.i.d. setting
and multi-armed bandit based models for dynamic pricing, which typically focus
only on the asymptotics with respect to time horizon $T$.
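The stochastic Bass dynamics described in the abstract can be sketched in a simple discrete-time form. This is a minimal illustration, not the paper's exact formulation: the exponential price response $e^{-p}$ and the per-customer Bernoulli adoption draw are assumptions made here for concreteness, and `simulate_bass` is a hypothetical helper, not code from the paper.

```python
import math
import random


def simulate_bass(m, alpha, beta, prices, seed=0):
    """Discrete-time stochastic Bass diffusion under posted prices.

    At each round, every remaining potential customer adopts independently
    with probability (alpha + beta * adopted / m) * exp(-p), where alpha is
    the "innovation" effect, beta the "imitation" effect, and exp(-p) an
    assumed price-response term. Returns total adopters and revenue.
    """
    rng = random.Random(seed)
    adopted = 0
    revenue = 0.0
    for p in prices:
        # Imitation effect grows as the fraction adopted/m of the market adopts.
        rate = (alpha + beta * adopted / m) * math.exp(-p)
        rate = min(max(rate, 0.0), 1.0)  # clamp to a valid probability
        # Bernoulli draw for each of the m - adopted remaining customers.
        new_adopters = sum(rng.random() < rate for _ in range(m - adopted))
        adopted += new_adopters
        revenue += p * new_adopters
    return adopted, revenue
```

Note how the posted price affects not only the current round's revenue but, through `adopted`, the strength of the imitation term in every future round; that coupling is what separates this setting from i.i.d. demand models.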
Related papers
- Minimax Optimality in Contextual Dynamic Pricing with General Valuation Models [4.156757591117864]
We propose a novel algorithm that achieves improved regret bounds while minimizing assumptions about the problem.
Our method extends beyond linear valuation models commonly used in dynamic pricing by considering general function spaces.
arXiv Detail & Related papers (2024-06-24T23:43:56Z)
- Contextual Dynamic Pricing: Algorithms, Optimality, and Local Differential Privacy Constraints [10.057344315478709]
We study the contextual dynamic pricing problem where a firm sells products to $T$ sequentially arriving consumers.
We first show that the optimal regret upper bound is of order $\sqrt{dT}$, up to a logarithmic factor.
A key insight of our theoretical result is an intrinsic connection between dynamic pricing and the contextual multi-armed bandit problem.
arXiv Detail & Related papers (2024-06-04T15:44:10Z)
- Dynamic Pricing and Learning with Long-term Reference Effects [16.07344044662994]
We study a simple and novel reference price mechanism where the reference price is the average of the past prices offered by the seller.
We show that under this mechanism, a markdown policy is near-optimal irrespective of the parameters of the model.
We then consider a more challenging dynamic pricing and learning problem, where the demand model parameters are a priori unknown.
arXiv Detail & Related papers (2024-02-19T21:36:54Z)
- Smoothness-Adaptive Dynamic Pricing with Nonparametric Demand Learning [0.0]
We study the dynamic pricing problem where the demand function is nonparametric and Hölder smooth.
We focus on adaptivity to the unknown Hölder smoothness parameter $\beta$ of the demand function.
arXiv Detail & Related papers (2023-10-11T15:02:13Z)
- Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost.
We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z)
- Autoregressive Bandits [58.46584210388307]
We propose a novel online learning setting, Autoregressive Bandits, in which the observed reward is governed by an autoregressive process of order $k$.
We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed.
We then devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $\widetilde{\mathcal{O}}\left(\frac{(k+1)^{3/2}\sqrt{nT}}{(1-\Gamma)^{2}}\right)$.
arXiv Detail & Related papers (2022-12-12T21:37:36Z)
- Quantum computational finance: martingale asset pricing for incomplete markets [69.73491758935712]
We show that a variety of quantum techniques can be applied to the pricing problem in finance.
We discuss three different methods that are distinct from previous works.
arXiv Detail & Related papers (2022-09-19T09:22:01Z)
- Causal Bandits for Linear Structural Equation Models [58.2875460517691]
This paper studies the problem of designing an optimal sequence of interventions in a causal graphical model.
It is assumed that the graph's structure is known and has $N$ nodes.
Two algorithms are proposed for the frequentist (UCB-based) and Bayesian settings.
arXiv Detail & Related papers (2022-08-26T16:21:31Z)
- Policy Optimization Using Semiparametric Models for Dynamic Pricing [1.3428344011390776]
We study the contextual dynamic pricing problem where the market value of a product is linear in its observed features plus some market noise.
We propose a dynamic statistical learning and decision-making policy that combines semiparametric estimation from a generalized linear model with an unknown link and online decision-making.
arXiv Detail & Related papers (2021-09-13T23:50:01Z)
- Exploring Sparse Expert Models and Beyond [51.90860155810848]
Mixture-of-Experts (MoE) models can achieve promising results with an outrageously large number of parameters but constant computation cost.
We propose a simple method called expert prototyping that splits experts into different prototypes and applies $k$ top-$1$ routing.
This strategy improves model quality while maintaining constant computational cost, and our further exploration of extremely large-scale models shows that it is more effective for training larger models.
arXiv Detail & Related papers (2021-05-31T16:12:44Z)
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.