Online Contextual Decision-Making with a Smart Predict-then-Optimize Method
- URL: http://arxiv.org/abs/2206.07316v1
- Date: Wed, 15 Jun 2022 06:16:13 GMT
- Title: Online Contextual Decision-Making with a Smart Predict-then-Optimize Method
- Authors: Heyuan Liu and Paul Grigas
- Abstract summary: We study an online contextual decision-making problem with resource constraints.
We propose an algorithm that mixes a prediction step based on the "Smart Predict-then-Optimize (SPO)" method with a dual update step based on mirror descent.
We prove regret bounds and demonstrate that the overall convergence rate of our method depends on the $\mathcal{O}(T^{-1/2})$ convergence of online mirror descent.
- Score: 4.061135251278187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study an online contextual decision-making problem with resource
constraints. At each time period, the decision-maker first predicts a reward
vector and resource consumption matrix based on a given context vector and then
solves a downstream optimization problem to make a decision. The final goal of
the decision-maker is to maximize the summation of the reward and the utility
from resource consumption, while satisfying the resource constraints. We
propose an algorithm that mixes a prediction step based on the "Smart
Predict-then-Optimize (SPO)" method with a dual update step based on mirror
descent. We prove regret bounds and demonstrate that the overall convergence
rate of our method depends on the $\mathcal{O}(T^{-1/2})$ convergence of online
mirror descent as well as risk bounds of the surrogate loss function used to
learn the prediction model. Our algorithm and regret bounds apply to a general
convex feasible region for the resource constraints, including both hard and
soft resource constraint cases, and they apply to a wide class of prediction
models in contrast to the traditional settings of linear contextual models or
finite policy spaces. We also conduct numerical experiments to empirically
demonstrate the strength of our proposed SPO-type methods, as compared to
traditional prediction-error-only methods, on multi-dimensional knapsack and
longest path instances.
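The abstract names the algorithm's two ingredients but not their form. For reference, the SPO+ surrogate loss from the underlying SPO framework of Elmachtoub and Grigas, stated here for a downstream minimization problem $\min_{w \in S} c^\top w$, is:

```latex
\ell_{\mathrm{SPO+}}(\hat{c}, c)
  = \max_{w \in S} \bigl\{ (c - 2\hat{c})^\top w \bigr\}
  + 2\hat{c}^\top w^*(c) - z^*(c),
\qquad
w^*(c) \in \arg\min_{w \in S} c^\top w, \quad z^*(c) = c^\top w^*(c),
```

with a subgradient in $\hat{c}$ given by $2\bigl(w^*(c) - w^*(2\hat{c} - c)\bigr)$.

The sketch below is a minimal illustration of the overall loop only at that level of generality: a prediction step, a dual-priced downstream decision, a mirror-descent-style dual update, and an SPO+ subgradient step on the prediction model. It assumes a linear reward model, a one-hot feasible region, a known consumption matrix (the paper also predicts resource consumption), and a Euclidean mirror map (projected subgradient); all names and constants are hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: p context features, d candidate actions, m resources, T periods.
p, d, m, T = 5, 4, 2, 500
rho = np.full(m, 0.3)                     # per-period resource budget rate

W_true = rng.normal(size=(d, p))          # hypothetical ground-truth reward model
A = rng.uniform(0.0, 1.0, size=(m, d))    # per-action resource consumption (known here)

Theta = np.zeros((d, p))                  # linear prediction model for rewards
lam = np.zeros(m)                         # dual prices on the resources
eta_model, eta_dual = 0.05, 1.0 / np.sqrt(T)

def solve_downstream(score):
    """Downstream problem: pick the single action maximizing the given score."""
    w = np.zeros(d)
    w[np.argmax(score)] = 1.0
    return w

for t in range(T):
    x = rng.normal(size=p)
    r_hat = Theta @ x                                   # prediction step
    r_true = W_true @ x + 0.1 * rng.normal(size=d)      # realized rewards

    # Decision: maximize predicted reward minus dual-priced resource consumption.
    w = solve_downstream(r_hat - A.T @ lam)
    consumption = A @ w

    # Dual update: online mirror descent with the Euclidean mirror map
    # (projected subgradient step on the nonnegative orthant).
    lam = np.maximum(lam - eta_dual * (rho - consumption), 0.0)

    # Prediction-model update with an SPO+ subgradient; in the maximization
    # convention this is 2 * (w*(2*r_hat - r_true) - w*(r_true)).
    g_r = 2.0 * (solve_downstream(2.0 * r_hat - r_true) - solve_downstream(r_true))
    Theta -= eta_model * np.outer(g_r, x)
```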
Related papers
- Smart Predict-then-Optimize Method with Dependent Data: Risk Bounds and Calibration of Autoregression [7.369846475695131]
We present an autoregressive SPO method directly targeting the optimization problem at the decision stage.
We conduct experiments to demonstrate the effectiveness of the SPO+ surrogate compared to the absolute loss and the least squares loss.
arXiv Detail & Related papers (2024-11-19T17:02:04Z)
- Contextual Linear Optimization with Bandit Feedback [35.692428244561626]
Contextual linear optimization (CLO) uses predictive contextual features to reduce uncertainty in random cost coefficients.
We study a class of offline learning algorithms for CLO with bandit feedback.
We show a fast-rate regret bound for induced empirical risk minimization (IERM) that allows for misspecified model classes and flexible choices of the optimization estimate.
arXiv Detail & Related papers (2024-05-26T13:27:27Z)
- Learning Constrained Optimization with Deep Augmented Lagrangian Methods [54.22290715244502]
A machine learning (ML) model is trained to emulate a constrained optimization solver.
This paper proposes an alternative approach, in which the ML model is trained to predict dual solution estimates directly.
It enables an end-to-end training scheme in which the dual objective is used as the loss function and the predicted solution estimates are driven toward primal feasibility, emulating a Dual Ascent method.
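As a concrete, purely illustrative instance of that idea (not the paper's actual architecture or training pipeline), one can train a model to map problem data to nonnegative dual estimates for a simple inequality-constrained quadratic program, using the negated dual objective as the loss. By the envelope theorem, the gradient of that loss with respect to the duals is exactly the constraint violation of the induced primal point, which is the same signal a Dual Ascent step would use.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy family of QPs:  min_x 0.5*||x||^2 + c^T x  s.t.  A x <= b,
# with A fixed and (c, b) varying per instance; nothing here is from the paper.
n, m = 6, 3                        # primal variables, constraints
n_feat = n + m                     # instance features: concat(c, b)
A = rng.normal(size=(m, n))
W = np.zeros((m, n_feat))          # linear model mapping features to dual logits
lr = 1e-2

def softplus(z):
    return np.maximum(z, 0.0) + np.log1p(np.exp(-np.abs(z)))

for step in range(2000):
    c = rng.normal(size=n)
    b = rng.uniform(0.5, 1.5, size=m)
    phi = np.concatenate([c, b])

    z = W @ phi
    lam = softplus(z)              # nonnegative dual estimate lambda(c, b)

    # Exact inner minimizer of the Lagrangian for this QP.
    x = -(c + A.T @ lam)

    # Gradient of the dual objective g(lambda) is the constraint violation A x - b
    # (envelope theorem), i.e. the signal that drives toward primal feasibility.
    grad_lam = A @ x - b

    # Gradient ascent on g == descent on the loss -g, chained through softplus.
    sig = 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))
    W += lr * np.outer(grad_lam * sig, phi)
```

The training signal is thus the primal infeasibility induced by the predicted duals, matching the summary above.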
arXiv Detail & Related papers (2024-03-06T04:43:22Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
- A Note on Task-Aware Loss via Reweighing Prediction Loss by Decision-Regret [11.57423546614283]
We propose a decision-aware version of predict-then-optimize.
We reweigh the prediction error by the decision regret incurred by an (unweighted) pilot estimator of costs.
We show that this approach can lead to improvements over a "predict-then-optimize" framework.
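A minimal sketch of that reweighting recipe (the downstream problem, pilot model, and weighting floor below are illustrative assumptions, not the authors' exact formulation): fit an unweighted pilot cost estimator, compute the decision regret each sample incurs under the pilot's predictions, and refit with those regrets as sample weights.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: contexts x in R^p, true cost vectors c in R^d, and a downstream
# problem of picking the single cheapest of the d items (argmin over d).
N, p, d = 1000, 5, 10
X = rng.normal(size=(N, p))
Theta_true = rng.normal(size=(d, p))
C = X @ Theta_true.T + 0.5 * rng.normal(size=(N, d))

# Step 1: unweighted pilot estimator (plain least squares), shape (d, p).
Theta_pilot = np.linalg.lstsq(X, C, rcond=None)[0].T

# Step 2: per-sample decision regret of the pilot's induced decisions.
C_hat = X @ Theta_pilot.T
regret = C[np.arange(N), C_hat.argmin(axis=1)] - C.min(axis=1)   # always >= 0

# Step 3: refit with the regrets as sample weights (a small floor keeps every sample).
w = np.sqrt(regret + 1e-3)
Theta_reweighted = np.linalg.lstsq(X * w[:, None], C * w[:, None], rcond=None)[0].T
```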
arXiv Detail & Related papers (2022-11-09T18:59:35Z)
- Off-Policy Evaluation with Policy-Dependent Optimization Response [90.28758112893054]
We develop a new framework for off-policy evaluation with a policy-dependent linear optimization response.
We construct unbiased estimators for the policy-dependent estimand by a perturbation method.
We provide a general algorithm for optimizing causal interventions.
arXiv Detail & Related papers (2022-02-25T20:25:37Z)
- Online Allocation with Two-sided Resource Constraints [44.5635910908944]
We consider an online allocation problem subject to lower and upper resource constraints, where the requests arrive sequentially.
We propose a new algorithm that obtains a $1 - O(\frac{\epsilon}{\alpha - \epsilon})$-competitive ratio relative to the offline problem that knows the entire request sequence ahead of time.
arXiv Detail & Related papers (2021-12-28T02:21:06Z)
- A Surrogate Objective Framework for Prediction+Optimization with Soft Constraints [29.962390392493507]
Decision-focused prediction approaches, such as SPO+ and direct optimization, have been proposed to bridge the gap between prediction accuracy and downstream decision quality.
This paper proposes a novel analytically differentiable surrogate objective framework for real-world linear and semi-definite negative quadratic programming problems.
arXiv Detail & Related papers (2021-11-22T17:09:57Z)
- Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z)
- Coordinated Online Learning for Multi-Agent Systems with Coupled Constraints and Perturbed Utility Observations [91.02019381927236]
We introduce a novel method to steer the agents toward a stable population state, fulfilling the given resource constraints.
The proposed method is a decentralized resource pricing method based on the resource loads resulting from the augmentation of the game's Lagrangian.
arXiv Detail & Related papers (2020-10-21T10:11:17Z)
- Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss [145.54544979467872]
We consider online learning for episodic constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)