Optimistic Online Non-stochastic Control via FTRL
- URL: http://arxiv.org/abs/2404.03309v1
- Date: Thu, 4 Apr 2024 09:08:04 GMT
- Title: Optimistic Online Non-stochastic Control via FTRL
- Authors: Naram Mhaisen, George Iosifidis
- Abstract summary: This paper brings the concept of "optimism" to the new and promising framework of online Non-stochastic Control (NSC).
Namely, we study how NSC can benefit from a prediction oracle of unknown quality responsible for forecasting future costs.
New bounds are commensurate with the oracle's accuracy, ranging from $\mathcal{O}(1)$ for perfect predictions to the order-optimal $\mathcal{O}(\sqrt{T})$ even when all predictions fail.
- Score: 10.25772015681554
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper brings the concept of "optimism" to the new and promising framework of online Non-stochastic Control (NSC). Namely, we study how NSC can benefit from a prediction oracle of unknown quality responsible for forecasting future costs. The posed problem is first reduced to an optimistic learning with delayed feedback problem, which is handled through the Optimistic Follow the Regularized Leader (OFTRL) algorithmic family. This reduction enables the design of OptFTRL-C, the first Disturbance Action Controller (DAC) with optimistic policy regret bounds. These new bounds are commensurate with the oracle's accuracy, ranging from $\mathcal{O}(1)$ for perfect predictions to the order-optimal $\mathcal{O}(\sqrt{T})$ even when all predictions fail. By addressing the challenge of incorporating untrusted predictions into control systems, our work contributes to the advancement of the NSC framework and paves the way towards effective and robust learning-based controllers.
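The optimistic FTRL idea mentioned in the abstract can be illustrated on a much simpler problem than the paper's. The sketch below is not OptFTRL-C: it ignores the control system, the DAC parameterization, and delayed feedback, and only runs a generic optimistic FTRL update on linear losses over a Euclidean ball. The function names, the quadratic regularizer, the step-size rule based on accumulated squared prediction error, and the synthetic unit-norm gradients are all assumptions made for illustration.

```python
import numpy as np


def project_ball(x, radius):
    """Euclidean projection onto the ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)


def oftrl_regret(grads, preds, radius=1.0, eps=1.0):
    """Run optimistic FTRL on linear losses <g_t, x> over a ball.

    grads[t] is the true cost gradient, preds[t] the oracle's forecast of it.
    Returns the regret against the best fixed point in hindsight.
    (Hypothetical helper, not the paper's controller.)
    """
    dim = grads.shape[1]
    grad_sum = np.zeros(dim)  # sum of gradients observed so far
    err_sum = 0.0             # accumulated squared prediction error
    total_loss = 0.0
    for g, m in zip(grads, preds):
        # Regularization shrinks (eta grows) only while predictions stay accurate.
        eta = radius / np.sqrt(eps + err_sum)
        # argmin over the ball of <grad_sum + m, x> + ||x||^2 / (2 * eta)
        x = project_ball(-eta * (grad_sum + m), radius)
        total_loss += float(g @ x)
        grad_sum += g
        err_sum += float(np.sum((g - m) ** 2))
    best_fixed_loss = -radius * np.linalg.norm(grad_sum)
    return total_loss - best_fixed_loss


rng = np.random.default_rng(0)
T, dim = 2000, 5
grads = rng.normal(size=(T, dim))
grads /= np.linalg.norm(grads, axis=1, keepdims=True)  # unit-norm gradients

regret_perfect = oftrl_regret(grads, grads)               # oracle is always right
regret_none = oftrl_regret(grads, np.zeros_like(grads))   # oracle is uninformative
print("regret with perfect predictions:", regret_perfect)
print("regret with zero predictions:   ", regret_none)
```

In this toy setting, perfect predictions keep the regret bounded by a constant that does not depend on `T`, while uninformative predictions fall back to a bound that grows on the order of `sqrt(T)`, mirroring the two ends of the range described in the abstract.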
Related papers
- Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization [78.82586283794886]
We present a new offline alignment algorithm, $\chi^2$-Preference Optimization ($\chi$PO).
$\chi$PO implements the principle of pessimism in the face of uncertainty via regularization.
It is provably robust to overoptimization and achieves sample-complexity guarantees based on single-policy concentrability.
arXiv Detail & Related papers (2024-07-18T11:08:40Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- Sub-linear Regret in Adaptive Model Predictive Control [56.705978425244496]
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online oracle that combines the certainty-equivalence principle and polytopic tubes.
We analyze the regret of the algorithm, when compared to an algorithm initially aware of the system dynamics.
arXiv Detail & Related papers (2023-10-07T15:07:10Z)
- Online Learning and Optimization for Queues with Unknown Demand Curve and Service Distribution [26.720986177499338]
We investigate an optimization problem in a queueing system where the service provider selects the optimal service fee $p$ and service capacity $\mu$.
We develop an online learning framework that automatically incorporates the parameter estimation errors in the solution prescription process.
arXiv Detail & Related papers (2023-03-06T08:47:40Z)
- Follow the Clairvoyant: an Imitation Learning Approach to Optimal Control [4.978565634673048]
We consider control of dynamical systems through the lens of competitive analysis.
Motivated by the observation that the optimal cost only provides coarse information about the ideal closed-loop behavior, we propose directly minimizing the tracking error.
arXiv Detail & Related papers (2022-11-14T14:15:12Z)
- Rate-Optimal Online Convex Optimization in Adaptive Linear Control [0.0]
We consider the problem of controlling an unknown linear system under adversarially changing convex costs.
We present the first computationally efficient algorithm that attains an optimal regret rate against the best linear controller in hindsight.
arXiv Detail & Related papers (2022-06-03T07:32:11Z)
- Lazy Lagrangians with Predictions for Online Learning [24.18464455081512]
We consider the general problem of online convex optimization with time-varying additive constraints.
A novel primal-dual algorithm is designed by combining a Follow-The-Regularized-Leader iteration with prediction-adaptive dynamic steps.
Our work extends the FTRL framework for this constrained OCO setting and outperforms the respective state-of-the-art greedy-based solutions.
arXiv Detail & Related papers (2022-01-08T21:49:10Z)
- Regret-optimal Estimation and Control [52.28457815067461]
We show that the regret-optimal estimator and regret-optimal controller can be derived in state-space form.
We propose regret-optimal analogs of Model-Predictive Control (MPC) and the Extended Kalman Filter (EKF) for systems with nonlinear dynamics.
arXiv Detail & Related papers (2021-06-22T23:14:21Z)
- Optimal Robustness-Consistency Trade-offs for Learning-Augmented Online Algorithms [85.97516436641533]
We study the problem of improving the performance of online algorithms by incorporating machine-learned predictions.
The goal is to design algorithms that are both consistent and robust.
We provide the first set of non-trivial lower bounds for competitive analysis using machine-learned predictions.
arXiv Detail & Related papers (2020-10-22T04:51:01Z)
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.