Towards Agnostic Feature-based Dynamic Pricing: Linear Policies vs
Linear Valuation with Unknown Noise
- URL: http://arxiv.org/abs/2201.11341v1
- Date: Thu, 27 Jan 2022 06:40:03 GMT
- Title: Towards Agnostic Feature-based Dynamic Pricing: Linear Policies vs
Linear Valuation with Unknown Noise
- Authors: Jianyu Xu and Yu-Xiang Wang
- Abstract summary: We show an algorithm that achieves an $\tilde{O}(T^{\frac34})$ regret, and improve the best-known lower bound from $\Omega(T^{\frac35})$ to $\tilde{\Omega}(T^{\frac23})$.
Results demonstrate that no-regret learning is possible for feature-based dynamic pricing under weak assumptions.
- Score: 16.871660060209674
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In feature-based dynamic pricing, a seller sets appropriate prices for a
sequence of products (described by feature vectors) on the fly by learning from
the binary outcomes of previous sales sessions ("Sold" if valuation $\geq$
price, and "Not Sold" otherwise). Existing works either assume noiseless linear
valuation or precisely-known noise distribution, which limits the applicability
of those algorithms in practice when these assumptions are hard to verify. In
this work, we study two more agnostic models: (a) a "linear policy" problem
where we aim at competing with the best linear pricing policy while making no
assumptions on the data, and (b) a "linear noisy valuation" problem where the
random valuation is linear plus an unknown and assumption-free noise. For the
former model, we show a $\tilde{\Theta}(d^{\frac13}T^{\frac23})$ minimax regret
up to logarithmic factors. For the latter model, we present an algorithm that
achieves an $\tilde{O}(T^{\frac34})$ regret, and improve the best-known lower
bound from $\Omega(T^{\frac35})$ to $\tilde{\Omega}(T^{\frac23})$. These
results demonstrate that no-regret learning is possible for feature-based
dynamic pricing under weak assumptions, but also reveal a disappointing fact
that the seemingly richer pricing feedback is not significantly more useful
than the bandit-feedback in regret reduction.
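To make the interaction protocol concrete, below is a minimal simulation sketch of the "linear noisy valuation" setting under illustrative assumptions (dimension, horizon, uniform noise, and a naive explore-then-commit pricing rule); it is not the paper's algorithm, only the feedback model it studies.

```python
# A minimal sketch of the protocol described in the abstract: each round a
# feature vector arrives, the seller posts a price, and only the binary outcome
# "Sold" (valuation >= price) is observed. The valuation follows the "linear
# noisy valuation" model; the pricing rule below (random exploration followed by
# a crude least-squares plug-in) is a placeholder for illustration only, NOT the
# paper's algorithm. Dimensions, horizon, and the noise law are assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, T = 5, 5000
theta_star = rng.uniform(size=d)
theta_star /= theta_star.sum()                 # keeps x @ theta_star within [0, 1]

def sale_outcome(x, price):
    """Binary pricing feedback: 1.0 iff (linear valuation + unknown noise) >= price."""
    v = x @ theta_star + rng.uniform(-0.1, 0.1)  # noise distribution unknown to the seller
    return float(v >= price), v                  # v is returned only for regret bookkeeping

n_explore = int(T ** 0.75)                     # T^(3/4)-style exploration budget (illustrative)
X_hist, p_hist, y_hist = [], [], []
revenue, clairvoyant = 0.0, 0.0
theta_hat = np.zeros(d)

for t in range(T):
    x = rng.uniform(size=d)                    # nature reveals the product's feature vector
    if t < n_explore:
        price = rng.uniform()                  # explore: post a random price
    else:
        price = max(x @ theta_hat - 0.1, 0.0)  # exploit: undercut the estimated valuation
    sold, v = sale_outcome(x, price)
    revenue += price * sold
    clairvoyant += max(v, 0.0)                 # revenue of a seller who knew v exactly
    X_hist.append(x); p_hist.append(price); y_hist.append(sold)
    if t == n_explore - 1:
        # Crude estimate: regress accepted prices on their features. A real
        # algorithm would handle the censored, binary feedback more carefully.
        mask = np.array(y_hist) == 1.0
        A, b = np.array(X_hist)[mask], np.array(p_hist)[mask]
        theta_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

print(f"regret vs. clairvoyant benchmark: {clairvoyant - revenue:.1f} over T={T} rounds")
```

The clairvoyant benchmark above charges the realized valuation each round, which is stronger than the distribution-optimal price used in the paper's regret definition, so the printed gap is only a rough, pessimistic proxy for regret.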
Related papers
- Minimax Optimality in Contextual Dynamic Pricing with General Valuation Models [4.156757591117864]
We propose a novel algorithm that achieves improved regret bounds while minimizing assumptions about the problem.
Our method extends beyond linear valuation models commonly used in dynamic pricing by considering general function spaces.
arXiv Detail & Related papers (2024-06-24T23:43:56Z) - Improved Algorithms for Contextual Dynamic Pricing [24.530341596901476]
In contextual dynamic pricing, a seller sequentially prices goods based on contextual information.
Our algorithm achieves an optimal regret bound of $\tilde{\mathcal{O}}(T^{2/3})$, improving the existing results.
For this model, our algorithm obtains a regret of $\tilde{\mathcal{O}}(T^{(d+2\beta)/(d+3\beta)})$, where $d$ is the dimension of the context space.
arXiv Detail & Related papers (2024-06-17T08:26:51Z) - Contextual Dynamic Pricing: Algorithms, Optimality, and Local Differential Privacy Constraints [10.057344315478709]
We study the contextual dynamic pricing problem where a firm sells products to $T$ sequentially arriving consumers.
We first show that the optimal regret upper bound is of order $\sqrt{dT}$, up to a logarithmic factor.
A key insight of our theoretical result is an intrinsic connection between dynamic pricing and the contextual multi-armed bandit problem.
arXiv Detail & Related papers (2024-06-04T15:44:10Z) - High-probability Convergence Bounds for Nonlinear Stochastic Gradient Descent Under Heavy-tailed Noise [59.25598762373543]
We establish high-probability convergence guarantees for learning on streaming data in the presence of heavy-tailed noise.
We also demonstrate analytically which parameter settings are preferable for a given problem.
arXiv Detail & Related papers (2023-10-28T18:53:41Z) - Dynamic Pricing and Learning with Bayesian Persuasion [18.59029578133633]
We consider a novel dynamic pricing and learning setting where in addition to setting prices of products, the seller also ex-ante commits to 'advertising schemes'
We use the popular Bayesian persuasion framework to model the effect of these signals on the buyers' valuation and purchase responses.
We design an online algorithm that can use past purchase responses to adaptively learn the optimal pricing and advertising strategy.
arXiv Detail & Related papers (2023-04-27T17:52:06Z) - Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z) - A Reinforcement Learning Approach in Multi-Phase Second-Price Auction
Design [158.0041488194202]
We study reserve price optimization in multi-phase second price auctions.
From the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders.
Third, the seller's per-step revenue is unknown, nonlinear, and cannot even be directly observed from the environment.
arXiv Detail & Related papers (2022-10-19T03:49:05Z) - Optimal Online Generalized Linear Regression with Stochastic Noise and
Its Application to Heteroscedastic Bandits [88.6139446295537]
We study the problem of online generalized linear regression in the setting of a generalized linear model with possibly unbounded additive noise.
We provide a sharp analysis of the classical follow-the-regularized-leader (FTRL) algorithm to cope with the label noise.
We propose an algorithm based on FTRL to achieve the first variance-aware regret bound.
arXiv Detail & Related papers (2022-02-28T08:25:26Z) - Logarithmic Regret in Feature-based Dynamic Pricing [0.0]
Feature-based dynamic pricing is an increasingly popular model of setting prices for highly differentiated products.
We provide two algorithms for stochastic and adversarial feature settings, respectively, and prove the optimal $O(d\log T)$ regret bounds for both.
We also prove an $\Omega(\sqrt{T})$ information-theoretic lower bound for a slightly more general setting, which demonstrates that "knowing-the-demand-curve" leads to an exponential improvement in feature-based dynamic pricing.
arXiv Detail & Related papers (2021-02-20T00:45:33Z) - Stochastic Linear Bandits Robust to Adversarial Attacks [117.665995707568]
We provide two variants of a Robust Phased Elimination algorithm, one that knows $C$ and one that does not.
We show that both variants attain near-optimal regret in the non-corrupted case $C = 0$, while incurring additional additive terms respectively.
In a contextual setting, we show that a simple greedy algorithm is provably robust with a near-optimal additive regret term, despite performing no explicit exploration and not knowing $C$.
arXiv Detail & Related papers (2020-07-07T09:00:57Z) - Contextual Linear Bandits under Noisy Features: Towards Bayesian Oracles [65.9694455739978]
We study contextual linear bandit problems under feature uncertainty, where the features are noisy and have missing entries.
Our analysis reveals that the optimal hypothesis can significantly deviate from the underlying realizability function, depending on the noise characteristics.
This implies that classical approaches cannot guarantee a non-trivial regret bound.
arXiv Detail & Related papers (2017-03-03T21:39:56Z)