Constrained Online Two-stage Stochastic Optimization: Algorithm with
(and without) Predictions
- URL: http://arxiv.org/abs/2401.01077v1
- Date: Tue, 2 Jan 2024 07:46:33 GMT
- Title: Constrained Online Two-stage Stochastic Optimization: Algorithm with
(and without) Predictions
- Authors: Piao Hu, Jiashuo Jiang, Guodong Lyu, Hao Su
- Abstract summary: We consider an online two-stage optimization with long-term constraints over a finite horizon of $T$ periods.
We develop online algorithms for the online two-stage problem from adversarial learning algorithms.
- Score: 19.537289123577022
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider an online two-stage stochastic optimization with long-term
constraints over a finite horizon of $T$ periods. At each period, we take the
first-stage action, observe a model parameter realization and then take the
second-stage action from a feasible set that depends both on the first-stage
decision and the model parameter. We aim to minimize the cumulative objective
value while guaranteeing that the long-term average second-stage decision
belongs to a set. We develop online algorithms for the online two-stage problem
from adversarial learning algorithms. Also, the regret bound of our algorithm
can be reduced to the regret bound of embedded adversarial learning algorithms.
Based on this framework, we obtain new results under various settings. When the
model parameters are drawn from unknown non-stationary distributions and we are
given machine-learned predictions of the distributions, we develop a new
algorithm from our framework with a regret $O(W_T+\sqrt{T})$, where $W_T$
measures the total inaccuracy of the machine-learned predictions. We then
develop another algorithm that works when no machine-learned predictions are
given and show the performances.
Related papers
- Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement
Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $tildeO(d sqrtsum_k = 1K sigma_k2 + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - Constrained Online Two-stage Stochastic Optimization: Near Optimal Algorithms via Adversarial Learning [1.994307489466967]
We consider an online two-stage optimization with long-term constraints over a finite horizon of $T$ periods.
We develop online algorithms for the online two-stage problem from adversarial learning algorithms.
arXiv Detail & Related papers (2023-02-02T10:33:09Z) - Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both
Worlds in Stochastic and Deterministic Environments [48.96971760679639]
We study variance-dependent regret bounds for Markov decision processes (MDPs)
We propose two new environment norms to characterize the fine-grained variance properties of the environment.
For model-based methods, we design a variant of the MVP algorithm.
In particular, this bound is simultaneously minimax optimal for both and deterministic MDPs.
arXiv Detail & Related papers (2023-01-31T06:54:06Z) - Non-Clairvoyant Scheduling with Predictions Revisited [77.86290991564829]
In non-clairvoyant scheduling, the task is to find an online strategy for scheduling jobs with a priori unknown processing requirements.
We revisit this well-studied problem in a recently popular learning-augmented setting that integrates (untrusted) predictions in algorithm design.
We show that these predictions have desired properties, admit a natural error measure as well as algorithms with strong performance guarantees.
arXiv Detail & Related papers (2022-02-21T13:18:11Z) - Online estimation and control with optimal pathlength regret [52.28457815067461]
A natural goal when designing online learning algorithms is to bound the regret of the algorithm in terms of the temporal variation of the input sequence.
Data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including OCO and bandits.
arXiv Detail & Related papers (2021-10-24T22:43:15Z) - Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms
for Stochastic Shortest Path [29.289190242826688]
We introduce a generic template for developing regret algorithms in the Shortest Path (SSP) model.
We develop two new algorithms: the first is model-free and minimax optimal under strictly positive costs.
Both algorithms admit highly sparse updates, making them computationally more efficient than all existing algorithms.
arXiv Detail & Related papers (2021-06-15T19:15:17Z) - Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
adversary-resilient distributed optimization, in which.
machines can independently compute gradients, and cooperate.
Our algorithm is based on a new concentration technique, and its sample complexity.
It is very practical: it improves upon the performance of all prior methods when no.
setting machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z) - Bayes DistNet -- A Robust Neural Network for Algorithm Runtime
Distribution Predictions [1.8275108630751844]
Randomized algorithms are used in many state-of-the-art solvers for constraint satisfaction problems (CSP) and Boolean satisfiability (SAT) problems.
Previous state-of-the-art methods directly try to predict a fixed parametric distribution that the input instance follows.
This new model achieves robust predictive performance in the low observation setting, as well as handling censored observations.
arXiv Detail & Related papers (2020-12-14T01:15:39Z) - Online Stochastic Optimization with Wasserstein Based Non-stationarity [12.91020811577007]
We consider a general online optimization problem with multiple budget constraints over a horizon of finite time periods.
The objective of the decision maker is to maximize the cumulative reward subject to the budget constraints.
This formulation captures a wide range of applications including online linear programming and network revenue management.
arXiv Detail & Related papers (2020-12-13T04:47:37Z) - Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of evaluations gradient independent of training set size and number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.