Taming Wild Price Fluctuations: Monotone Stochastic Convex Optimization
with Bandit Feedback
- URL: http://arxiv.org/abs/2103.09287v1
- Date: Tue, 16 Mar 2021 19:06:28 GMT
- Title: Taming Wild Price Fluctuations: Monotone Stochastic Convex Optimization
with Bandit Feedback
- Authors: Jad Salem, Swati Gupta, Vijay Kamble
- Abstract summary: Prices generated by automated price experimentation algorithms often display wild fluctuations.
We propose demand learning under a monotonicity constraint on the sequence of prices.
We show that our algorithms guarantee the same regret rates as the best achievable rates of regret without the monotonicity requirement.
- Score: 5.586191108738564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prices generated by automated price experimentation algorithms often display
wild fluctuations, leading to unfavorable customer perceptions and violations
of individual fairness: e.g., the price seen by a customer can be significantly
higher than what was seen by her predecessors, only to fall once again later.
To address this concern, we propose demand learning under a monotonicity
constraint on the sequence of prices, within the framework of stochastic convex
optimization with bandit feedback.
Our main contribution is the design of the first sublinear-regret algorithms
for monotonic price experimentation for smooth and strongly concave revenue
functions under noisy as well as noiseless bandit feedback. The monotonicity
constraint presents a unique challenge: since any increase (or decrease) in the
decision-levels is final, an algorithm needs to be cautious in its exploration
to avoid over-shooting the optimum. At the same time, minimizing regret
requires that progress be made towards the optimum at a sufficient pace.
Balancing these two goals is particularly challenging under noisy feedback,
where obtaining sufficiently accurate gradient estimates is expensive. Our key
innovation is to utilize conservative gradient estimates to adaptively tailor
the degree of caution to local gradient information, being aggressive far from
the optimum and being increasingly cautious as the prices approach the optimum.
Importantly, we show that our algorithms guarantee the same regret rates (up to
logarithmic factors) as the best achievable rates of regret without the
monotonicity requirement.
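The caution-versus-progress tradeoff described in the abstract can be illustrated with a toy sketch (our own construction, not the authors' algorithm): prices only move upward, a step is taken only when a noisy finite-difference gradient estimate is conservatively positive, and the probe width shrinks as the estimated gradient flattens near the optimum. The revenue curve and all parameter names here are assumptions for illustration.

```python
import random

def revenue(p):
    # Toy smooth, strongly concave revenue curve with its optimum at p = 5.0.
    return 10.0 - (p - 5.0) ** 2

def noisy_revenue(p, sigma=0.05):
    # Bandit feedback: only noisy evaluations of revenue are observable.
    return revenue(p) + random.gauss(0.0, sigma)

def monotone_price_search(p=1.0, h=0.5, n_samples=50, min_h=1e-3):
    """Monotone ascent sketch: the price never decreases, and the probe
    width h shrinks as the estimated gradient flattens, trading
    exploration speed for caution near the optimum."""
    while h > min_h:
        # Average several noisy forward differences to estimate the gradient.
        g = sum(noisy_revenue(p + h) - noisy_revenue(p)
                for _ in range(n_samples)) / (n_samples * h)
        if g > h:        # conservatively positive: step up (aggressive far out)
            p += h
        else:            # inconclusive: shrink the probe width instead
            h /= 2.0
    return p
```

Because every price increase is final, a step is taken only when the estimate clears the margin `h`; the search therefore slows down automatically as it nears the revenue peak, mirroring the adaptive caution described above.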
Related papers
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z)
- Minimax Optimality in Contextual Dynamic Pricing with General Valuation Models [4.156757591117864]
We propose a novel algorithm that achieves improved regret bounds while minimizing assumptions about the problem.
Our method extends beyond linear valuation models commonly used in dynamic pricing by considering general function spaces.
arXiv Detail & Related papers (2024-06-24T23:43:56Z)
- Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback [75.29048190099523]
Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions.
In this paper, we design a fully adaptive OGD algorithm, AdaOGD, that does not require a priori knowledge of these parameters.
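For context, the non-adaptive baseline this entry improves on is online gradient descent with the classical $1/(\mu t)$ step size for $\mu$-strongly convex losses, which requires knowing $\mu$ in advance. A minimal sketch of that baseline (our own illustration, not the AdaOGD algorithm itself):

```python
def ogd_strongly_convex(grad, x0, mu, T):
    """Online gradient descent with step size 1/(mu*t), the classical
    schedule for mu-strongly convex losses. AdaOGD's contribution is
    removing the need to know mu a priori; this sketch assumes it."""
    x = x0
    iterates = [x]
    for t in range(1, T + 1):
        x -= grad(x) / (mu * t)
        iterates.append(x)
    return iterates

# Example: f(x) = (x - 3)^2 is 2-strongly convex with gradient 2(x - 3);
# the iterates converge to the minimizer x* = 3.
xs = ogd_strongly_convex(lambda x: 2.0 * (x - 3.0), x0=0.0, mu=2.0, T=50)
```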
arXiv Detail & Related papers (2023-10-21T18:38:13Z)
- Rate-Optimal Policy Optimization for Linear Markov Decision Processes [65.5958446762678]
We obtain a rate-optimal $\widetilde{O}(\sqrt{K})$ regret bound, where $K$ denotes the number of episodes.
Our work is the first to establish the optimal (w.r.t. $K$) rate of convergence in the setting with bandit feedback; previously, no algorithm with an optimal rate guarantee was known.
arXiv Detail & Related papers (2023-08-28T15:16:09Z)
- Restarts subject to approximate sharpness: A parameter-free and optimal scheme for first-order methods [0.6554326244334866]
Sharpness is an assumption in continuous optimization that bounds the distance from minima by objective function suboptimality.
Sharpness involves problem-specific constants that are typically unknown, and restart schemes typically reduce convergence rates.
We consider the assumption of approximate sharpness, a generalization of sharpness that incorporates an unknown constant perturbation to the objective function error.
arXiv Detail & Related papers (2023-01-05T19:01:41Z)
- Autoregressive Bandits [58.46584210388307]
We propose a novel online learning setting, Autoregressive Bandits, in which the observed reward is governed by an autoregressive process of order $k$.
We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed.
We then devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $\widetilde{\mathcal{O}}\big((k+1)^{3/2}\sqrt{nT}/(1-\Gamma)^2\big)$.
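The reward dynamics in this setting can be illustrated with a toy simulation of an order-$k$ autoregressive reward process (function and parameter names are ours, for illustration only):

```python
import random

def simulate_ar_rewards(base, gammas, T, sigma=0.1):
    """Simulate z_t = base + sum_i gammas[i] * z_{t-i} + noise, a toy
    AR(k) reward process: each reward depends linearly on the previous
    k rewards plus an action-dependent term (held fixed here)."""
    k = len(gammas)
    history = [0.0] * k          # z_{t-1}, ..., z_{t-k}
    rewards = []
    for _ in range(T):
        z = base + sum(g * zprev for g, zprev in zip(gammas, history))
        z += random.gauss(0.0, sigma)
        history = [z] + history[:-1]
        rewards.append(z)
    return rewards
```

When the coefficients sum to $\Gamma < 1$ the process is stable and the rewards settle around $\mathrm{base}/(1-\Gamma)$, which is why guarantees in this setting degrade as $\Gamma \to 1$.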
arXiv Detail & Related papers (2022-12-12T21:37:36Z)
- Smoothed Online Convex Optimization Based on Discounted-Normal-Predictor [68.17855675511602]
We investigate an online prediction strategy named Discounted-Normal-Predictor (Kapralov and Panigrahy, 2010) for smoothed online convex optimization (SOCO).
We show that the proposed algorithm can minimize the adaptive regret with switching cost in every interval.
arXiv Detail & Related papers (2022-05-02T08:48:22Z)
- High-probability Bounds for Non-Convex Stochastic Optimization with Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent converges to critical points with high probability, with the best-known iteration complexity for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z)
- Reward Biased Maximum Likelihood Estimation for Reinforcement Learning [13.820705458648233]
The Reward-Biased Maximum Likelihood Estimate (RBMLE) was originally proposed for the adaptive control of Markov chains.
We show that it has a regret of $\mathcal{O}(\log T)$ over a time horizon of $T$ steps, similar to state-of-the-art algorithms.
arXiv Detail & Related papers (2020-11-16T06:09:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.