Taming Wild Price Fluctuations: Monotone Stochastic Convex Optimization with Bandit Feedback
- URL: http://arxiv.org/abs/2103.09287v1
- Date: Tue, 16 Mar 2021 19:06:28 GMT
- Title: Taming Wild Price Fluctuations: Monotone Stochastic Convex Optimization with Bandit Feedback
- Authors: Jad Salem, Swati Gupta, Vijay Kamble
- Abstract summary: Prices generated by automated price experimentation algorithms often display wild fluctuations.
We propose demand learning under a monotonicity constraint on the sequence of prices.
We show that our algorithms guarantee the same regret rates (up to logarithmic factors) as the best achievable rates without the monotonicity requirement.
- Score: 5.586191108738564
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prices generated by automated price experimentation algorithms often display
wild fluctuations, leading to unfavorable customer perceptions and violations
of individual fairness: e.g., the price seen by a customer can be significantly
higher than what was seen by her predecessors, only to fall once again later.
To address this concern, we propose demand learning under a monotonicity
constraint on the sequence of prices, within the framework of stochastic convex
optimization with bandit feedback.
Our main contribution is the design of the first sublinear-regret algorithms
for monotonic price experimentation for smooth and strongly concave revenue
functions under noisy as well as noiseless bandit feedback. The monotonicity
constraint presents a unique challenge: since any increase (or decrease) in the
decision-levels is final, an algorithm needs to be cautious in its exploration
to avoid over-shooting the optimum. At the same time, minimizing regret
requires that progress be made towards the optimum at a sufficient pace.
Balancing these two goals is particularly challenging under noisy feedback,
where obtaining sufficiently accurate gradient estimates is expensive. Our key
innovation is to utilize conservative gradient estimates to adaptively tailor
the degree of caution to local gradient information, being aggressive far from
the optimum and being increasingly cautious as the prices approach the optimum.
Importantly, we show that our algorithms guarantee the same regret rates (up to
logarithmic factors) as the best achievable rates of regret without the
monotonicity requirement.
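To make the adaptive-caution idea concrete, below is a minimal Python sketch of a monotone pricing loop under noisy bandit feedback. It illustrates the general principle rather than the authors' algorithm: the revenue model, sample counts, confidence width, and all names are hypothetical assumptions. The loop averages repeated noisy forward differences into a conservative gradient estimate, raises the price only when that estimate confidently indicates the optimum still lies above the current price, and sizes each upward move by the estimated gradient, so the price sequence never decreases.

```python
import random

def noisy_revenue(price, noise_sd=0.05):
    # Hypothetical smooth, strongly concave revenue curve with its optimum
    # at price 7.0, observed through additive Gaussian noise (a stand-in
    # for noisy bandit feedback).
    return -(price - 7.0) ** 2 + 49.0 + random.gauss(0.0, noise_sd)

def monotone_price_ascent(p0=1.0, max_step=1.0, delta=0.1,
                          samples=200, min_step=1e-3):
    """Illustrative monotone ascent: the price only moves upward, and each
    upward move is sized by a conservative local gradient estimate."""
    p, step = p0, max_step
    prices = [p]
    while step > min_step:
        # Average many noisy forward differences into a gradient estimate;
        # more samples yield a tighter, more trustworthy estimate.
        g = sum(noisy_revenue(p + delta) - noisy_revenue(p)
                for _ in range(samples)) / (samples * delta)
        margin = 1.0 / samples ** 0.5  # crude confidence width (assumption)
        if g - margin > 0:
            # Confidently below the optimum: move up, aggressively while
            # the gradient is steep, gently as it flattens near the optimum.
            p += min(step, 0.5 * g)
            prices.append(p)
        else:
            step /= 2.0  # the optimum may be close: grow more cautious
    return prices

if __name__ == "__main__":
    print(monotone_price_ascent())  # a non-decreasing price path toward 7.0
```

In the paper's algorithms the degree of caution is tailored to local gradient information in a more principled way; the sketch above only captures the two qualitative ingredients, monotone exploration and conservative estimation.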
Related papers
- Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints [49.76332265680669]
This paper examines a crucial subset of problems where both the objective and constraint functions are weakly convex.
Existing methods often face limitations, including slow convergence rates or reliance on double-loop designs.
We introduce a novel single-loop penalty-based algorithm to overcome these challenges.
arXiv Detail & Related papers (2025-04-21T17:15:48Z) - Pareto Optimal Algorithmic Recourse in Multi-cost Function [0.44938884406455726]
Algorithmic recourse aims to identify minimal-cost actions that alter an individual's features, thereby obtaining a desired outcome.
Most current recourse mechanisms use gradient-based methods that assume cost functions are differentiable, an assumption that often fails in real-world scenarios.
This work proposes an algorithmic recourse framework that handles nondifferentiable and discrete multi-cost functions.
arXiv Detail & Related papers (2025-02-11T03:16:08Z) - Online POMDP Planning with Anytime Deterministic Optimality Guarantees [9.444784653236157]
We derive a deterministic relationship for discrete POMDPs between an approximated solution and the optimal solution.
We show that our derivations provide an avenue for a new set of algorithms and can be attached to existing algorithms.
arXiv Detail & Related papers (2023-10-03T04:40:38Z) - Optimizing Chance-Constrained Submodular Problems with Variable Uncertainties [12.095075636344536]
We study chance-constrained submodular optimization problems, which capture a wide range of problems with constraints.
We present greedy algorithms that can obtain a high-quality solution, i.e., a constant approximation ratio with respect to the optimal solution.
arXiv Detail & Related papers (2023-09-23T04:48:49Z) - Adaptive Decision-Making with Constraints and Dependent Losses: Performance Guarantees and Applications to Online and Nonlinear Identification [5.787117733071415]
We consider adaptive decision-making problems where an agent optimizes a cumulative performance objective by repeatedly choosing among a finite set of options.
Our algorithm and analysis are instance-dependent; that is, suboptimal choices of the environment are exploited and reflected in our regret bounds.
The performance of the resulting algorithms is highlighted in two numerical examples.
arXiv Detail & Related papers (2023-04-06T18:32:26Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K} \sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - Contextual Decision-Making with Knapsacks Beyond the Worst Case [5.65888994172721]
We study the framework of a dynamic decision-making scenario with resource constraints.
In this framework, an agent selects an action in each round upon observing a random request.
We prove that our algorithm maintains a near-optimal $\widetilde{O}(\sqrt{T})$ regret even in the worst case.
arXiv Detail & Related papers (2022-11-25T08:21:50Z) - Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
arXiv Detail & Related papers (2022-11-14T21:54:31Z) - A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-worlds-type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown model, and in the case in which they are selected at each round by an adversary.
arXiv Detail & Related papers (2022-09-15T16:59:19Z) - Off-Policy Evaluation with Policy-Dependent Optimization Response [90.28758112893054]
We develop a new framework for off-policy evaluation with a policy-dependent linear optimization response.
We construct unbiased estimators for the policy-dependent estimand by a perturbation method.
We provide a general algorithm for optimizing causal interventions.
arXiv Detail & Related papers (2022-02-25T20:25:37Z) - Safe Online Bid Optimization with Return-On-Investment and Budget Constraints subject to Uncertainty [87.81197574939355]
We study the nature of both the optimization and learning problems.
We provide an algorithm, namely GCB, guaranteeing sublinear regret at the cost of a potentially linear number of constraint violations.
More interestingly, we provide an algorithm, namely $GCB_{safe}(\psi,\phi)$, guaranteeing both sublinear pseudo-regret and safety w.h.p. at the cost of accepting tolerances $\psi$ and $\phi$.
arXiv Detail & Related papers (2022-01-18T17:24:20Z) - Fair Incentives for Repeated Engagement [0.46040036610482665]
We study a problem of finding optimal monetary incentive schemes for retention when faced with agents whose participation decisions depend on the incentive they receive.
We show that even in the absence of explicit discrimination, policies may unintentionally discriminate between agents of different types by varying the type composition of the system.
arXiv Detail & Related papers (2021-10-28T04:13:53Z) - Concave Utility Reinforcement Learning with Zero-Constraint Violations [43.29210413964558]
We consider the problem of concave utility reinforcement learning (CURL) with convex constraints.
We propose a model-based learning algorithm that also achieves zero constraint violations.
arXiv Detail & Related papers (2021-09-12T06:13:33Z) - On the Optimality of Batch Policy Optimization Algorithms [106.89498352537682]
Batch policy optimization considers leveraging existing data for policy construction before interacting with an environment.
We show that any confidence-adjusted index algorithm is minimax optimal, whether it be optimistic, pessimistic or neutral.
We introduce a new weighted-minimax criterion that considers the inherent difficulty of optimal value prediction.
arXiv Detail & Related papers (2021-04-06T05:23:20Z) - Regularized Online Allocation Problems: Fairness and Beyond [7.433931244705934]
We introduce the regularized online allocation problem, a variant that includes a non-linear regularizer acting on the total resource consumption.
In this problem, requests repeatedly arrive over time and, for each request, a decision maker needs to take an action that generates a reward and consumes resources.
The objective is to simultaneously maximize additively separable rewards and the value of a non-separable regularizer subject to the resource constraints; a hypothetical formalization is sketched after this entry.
arXiv Detail & Related papers (2020-07-01T14:24:58Z)
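To make the last entry's objective concrete, the formalization below is one plausible reading of its summary, with notation chosen here for illustration rather than quoted from the paper: $x_t$ is the action for the request at round $t$, $f_t$ its reward, $b_t(x_t)$ its resource consumption, $B$ the resource budget, and $r$ the non-separable regularizer acting on total consumption.

```latex
% Hypothetical formalization of the regularized online allocation
% objective summarized above (notation chosen for illustration).
\max_{x_1,\dots,x_T} \;
  \sum_{t=1}^{T} f_t(x_t)
  \; + \;
  T \, r\!\Big(\frac{1}{T}\sum_{t=1}^{T} b_t(x_t)\Big)
\quad \text{subject to} \quad
  \sum_{t=1}^{T} b_t(x_t) \le B .
```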