Improved Algorithms for Contextual Dynamic Pricing
- URL: http://arxiv.org/abs/2406.11316v1
- Date: Mon, 17 Jun 2024 08:26:51 GMT
- Title: Improved Algorithms for Contextual Dynamic Pricing
- Authors: Matilde Tullii, Solenne Gaucher, Nadav Merlis, Vianney Perchet
- Abstract summary: In contextual dynamic pricing, a seller sequentially prices goods based on contextual information.
Our algorithm achieves an optimal regret bound of $\tilde{\mathcal{O}}(T^{2/3})$, improving on existing results.
For this model, our algorithm obtains a regret of $\tilde{\mathcal{O}}(T^{(d+2\beta)/(d+3\beta)})$, where $d$ is the dimension of the context space.
- Score: 24.530341596901476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In contextual dynamic pricing, a seller sequentially prices goods based on contextual information. Buyers will purchase products only if the prices are below their valuations. The goal of the seller is to design a pricing strategy that collects as much revenue as possible. We focus on two different valuation models. The first assumes that valuations linearly depend on the context and are further distorted by noise. Under minor regularity assumptions, our algorithm achieves an optimal regret bound of $\tilde{\mathcal{O}}(T^{2/3})$, improving the existing results. The second model removes the linearity assumption, requiring only that the expected buyer valuation is $\beta$-H\"older in the context. For this model, our algorithm obtains a regret $\tilde{\mathcal{O}}(T^{(d+2\beta)/(d+3\beta)})$, where $d$ is the dimension of the context space.
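The interaction protocol from the abstract can be sketched as a short simulation; the pricing rule, parameter values, and noise scale below are illustrative placeholders, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 1000, 3
theta = rng.uniform(0, 1, size=d) / d           # hypothetical true linear parameter

revenue, oracle = 0.0, 0.0
for _ in range(T):
    x = rng.uniform(0, 1, size=d)               # observed context
    v = float(x @ theta) + 0.05 * rng.normal()  # valuation: linear in context + noise
    p = 0.8 * float(x @ theta)                  # placeholder pricing rule
    revenue += p * (v >= p)                     # buyer purchases iff price <= valuation
    oracle += max(v, 0.0)                       # first-best benchmark: collect the valuation
regret = oracle - revenue                       # crude first-best regret proxy
```

Since the seller never collects more than the buyer's (nonnegative part of the) valuation in a round, this regret proxy is always nonnegative.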
Related papers
- Minimax Optimality in Contextual Dynamic Pricing with General Valuation Models [4.156757591117864]
We propose a novel algorithm that achieves improved regret bounds while minimizing assumptions about the problem.
Our method extends beyond linear valuation models commonly used in dynamic pricing by considering general function spaces.
arXiv Detail & Related papers (2024-06-24T23:43:56Z) - Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation.
We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
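The margin complement described above can be illustrated numerically; the scores and threshold below are made-up values:

```python
import numpy as np

scores = np.array([0.2, 0.45, 0.55, 0.9])       # prediction scores S
decisions = (scores >= 0.5).astype(float)       # thresholded predictions
margin_complement = decisions - scores          # change in score due to thresholding
```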
arXiv Detail & Related papers (2024-05-24T11:22:19Z) - Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed in Lu et al. (2021) under the Generalized Linear Model (GLM) framework.
To overcome the computational infeasibility and theoretical limitations of existing algorithms, we first propose the G-ESTT framework.
We show that G-ESTT can achieve the $\tilde{O}\big(\sqrt{(d_1+d_2)^{3/2} M r^{3/2} T}\big)$ bound of regret, while G-ESTS can achieve the $\tilde{O}(\cdots)$ bound.
arXiv Detail & Related papers (2024-01-14T14:14:19Z) - Dynamic Pricing and Learning with Bayesian Persuasion [18.59029578133633]
We consider a novel dynamic pricing and learning setting where, in addition to setting prices of products, the seller also commits ex ante to 'advertising schemes'.
We use the popular Bayesian persuasion framework to model the effect of these signals on the buyers' valuation and purchase responses.
We design an online algorithm that can use past purchase responses to adaptively learn the optimal pricing and advertising strategy.
arXiv Detail & Related papers (2023-04-27T17:52:06Z) - Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal.
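The Borda score referenced above is an item's average probability of beating a uniformly random opponent in a duel; a minimal computation on a made-up preference matrix:

```python
import numpy as np

# P[i, j] = probability that item i beats item j (hypothetical values; P[i, i] = 0.5)
P = np.array([[0.5, 0.7, 0.6],
              [0.3, 0.5, 0.8],
              [0.4, 0.2, 0.5]])
K = P.shape[0]
borda = (P.sum(axis=1) - 0.5) / (K - 1)   # average win probability over opponents j != i
best_item = int(np.argmax(borda))         # item with the highest Borda score
```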
arXiv Detail & Related papers (2023-03-15T17:59:27Z) - Autoregressive Bandits [58.46584210388307]
We propose a novel online learning setting, Autoregressive Bandits, in which the observed reward is governed by an autoregressive process of order $k$.
We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed.
We then devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $\widetilde{\mathcal{O}}\!\left(\frac{(k+1)^{3/2}\sqrt{nT}}{(1-\Gamma)\cdots}\right)$.
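An order-$k$ autoregressive reward process, as in the setting above, makes the current reward depend linearly on the previous $k$ rewards; the coefficients and the constant action effect below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 2
gamma = [0.4, 0.3]                  # AR coefficients; their sum < 1 gives stability
history = [0.0] * k                 # last k rewards, oldest first
rewards = []
for _ in range(200):
    action_effect = 0.5             # placeholder effect of the chosen arm
    r = action_effect + sum(g * h for g, h in zip(gamma, reversed(history)))
    r += 0.01 * rng.normal()        # small observation noise
    rewards.append(r)
    history = history[1:] + [r]
steady_state = 0.5 / (1 - sum(gamma))   # long-run mean under a fixed action
```

With the coefficients summing to less than one, the process mixes and the rewards settle near the long-run mean.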
arXiv Detail & Related papers (2022-12-12T21:37:36Z) - A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design [158.0041488194202]
We study reserve price optimization in multi-phase second price auctions.
From the seller's perspective, we need to efficiently explore the environment in the presence of potentially nontruthful bidders.
Third, the seller's per-step revenue is unknown, nonlinear, and cannot even be directly observed from the environment.
arXiv Detail & Related papers (2022-10-19T03:49:05Z) - Towards Agnostic Feature-based Dynamic Pricing: Linear Policies vs Linear Valuation with Unknown Noise [16.871660060209674]
We show an algorithm that achieves an $\tilde{O}(T^{3/4})$ regret, and improve the best-known lower bound from $\Omega(T^{3/5})$ to $\tilde{\Omega}(T^{2/3})$.
Results demonstrate that no-regret learning is possible for feature-based dynamic pricing under weak assumptions.
arXiv Detail & Related papers (2022-01-27T06:40:03Z) - Dynamic Pricing and Learning under the Bass Model [16.823029377470366]
We develop an algorithm that satisfies a high-probability regret guarantee of order $\tilde{O}(m^{2/3})$, where the market size $m$ is known a priori.
Unlike most regret analysis results, in the present problem the market size $m$ is the fundamental driver of the complexity.
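The Bass model underlying this setting grows cumulative adoptions through innovation (rate $p$) and imitation (rate $q$) within a market of size $m$; a discrete-time sketch with illustrative parameter values:

```python
# Discrete-time Bass diffusion: new adopters per step follow
#   dN = (p + q * N / m) * (m - N)
p, q, m = 0.03, 0.38, 1000.0   # innovation rate, imitation rate, market size (illustrative)
N, path = 0.0, []
for _ in range(50):
    N += (p + q * N / m) * (m - N)   # innovators plus imitators among remaining buyers
    path.append(N)
```

Adoptions rise monotonically and saturate at the market size $m$, which is why $m$, rather than the horizon, governs the difficulty of the problem.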
arXiv Detail & Related papers (2021-03-09T03:27:33Z) - Logarithmic Regret in Feature-based Dynamic Pricing [0.0]
Feature-based dynamic pricing is an increasingly popular model of setting prices for highly differentiated products.
We provide two algorithms for the stochastic and adversarial feature settings, and prove the optimal $O(d\log T)$ regret bounds for both.
We also prove an $\Omega(\sqrt{T})$ information-theoretic lower bound for a slightly more general setting, which demonstrates that "knowing-the-demand-curve" leads to an exponential improvement in feature-based dynamic pricing.
arXiv Detail & Related papers (2021-02-20T00:45:33Z) - Revisiting Smoothed Online Learning [70.09792747315323]
We investigate the problem of smoothed online learning, in which the online learner suffers both a hitting cost and a switching cost.
To bound the competitive ratio, we assume the hitting cost is known to the learner in each round, and investigate the greedy algorithm which simply minimizes the weighted sum of the hitting cost and the switching cost.
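For quadratic hitting and switching costs, the greedy step described above has a closed form; the weight and target sequence below are made-up values:

```python
# Greedy smoothed online learning step for quadratic costs:
#   x_t = argmin_x (x - theta_t)**2 + lam * (x - x_prev)**2
#       = (theta_t + lam * x_prev) / (1 + lam)
lam = 1.0                          # weight on the switching cost (illustrative)
targets = [0.0, 1.0, 1.0, -1.0]    # per-round hitting-cost minimizers (illustrative)
x, trajectory = 0.0, []
for theta in targets:
    x = (theta + lam * x) / (1 + lam)   # closed-form greedy minimizer
    trajectory.append(x)
```

The switching cost pulls each decision toward the previous one, so the trajectory lags behind the targets rather than jumping to them.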
arXiv Detail & Related papers (2021-02-13T14:15:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.