Non-Stochastic Control with Bandit Feedback
- URL: http://arxiv.org/abs/2008.05523v1
- Date: Wed, 12 Aug 2020 18:40:00 GMT
- Title: Non-Stochastic Control with Bandit Feedback
- Authors: Paula Gradu and John Hallman and Elad Hazan
- Abstract summary: We give an efficient sublinear regret algorithm for a known or unknown system.
The main algorithmic difficulty is the dependence of the loss on past controls.
We propose an efficient algorithm for the general setting of bandit convex optimization for loss functions with memory.
- Score: 30.33117611898598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of controlling a linear dynamical system with
adversarial perturbations where the only feedback available to the controller
is the scalar loss, and the loss function itself is unknown. For this problem,
with either a known or unknown system, we give an efficient sublinear regret
algorithm. The main algorithmic difficulty is the dependence of the loss on
past controls. To overcome this issue, we propose an efficient algorithm for
the general setting of bandit convex optimization for loss functions with
memory, which may be of independent interest.
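To make the "loss functions with memory under bandit feedback" idea concrete, here is a minimal Python sketch of a one-point (spherical) gradient estimator for a loss that depends on the last m played decisions. It is illustrative only, not the paper's exact algorithm; the function and parameter names (`loss_fn`, `delta`, `eta`) and the box projection are assumptions made for the sketch.

```python
import numpy as np

def bco_memory_sketch(loss_fn, d, m, T, delta=0.1, eta=0.01, seed=0):
    """One-point gradient estimation for a memory-m loss under bandit feedback.

    Each round: perturb the current decision on the unit sphere, play it,
    observe only the scalar loss of the last m played points, and take a
    gradient step using the stored perturbation directions as the estimator.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(d)                                   # current decision
    played = [x.copy() for _ in range(m)]             # last m played points
    dirs = [np.zeros(d) for _ in range(m)]            # their perturbation directions
    for t in range(T):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                        # uniform direction on the sphere
        y = x + delta * u                             # play a perturbed decision
        played = played[1:] + [y]
        dirs = dirs[1:] + [u]
        ell = loss_fn(played)                         # bandit feedback: a single scalar
        g = (d / delta) * ell * np.sum(dirs, axis=0)  # crude memory-loss gradient estimate
        x = np.clip(x - eta * g, -1.0, 1.0)           # projected gradient step onto a box
    return x

# toy usage: memory loss = average squared distance of the last m points to a target
target = np.ones(4)
x_hat = bco_memory_sketch(
    lambda pts: float(np.mean([np.sum((p - target) ** 2) for p in pts])),
    d=4, m=3, T=2000)
```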
Related papers
- The Central Role of the Loss Function in Reinforcement Learning [46.72524235085568]
We demonstrate how different regression loss functions affect the sample efficiency and adaptivity of value-based decision making algorithms.
Across multiple settings, we prove that algorithms using the binary cross-entropy loss achieve first-order bounds scaling with the optimal policy's cost.
We hope that this paper serves as a guide for analyzing decision making algorithms with varying loss functions, and can inspire the reader to seek out better loss functions to improve any decision making algorithm.
arXiv Detail & Related papers (2024-09-19T14:10:38Z) - LEARN: An Invex Loss for Outlier Oblivious Robust Online Optimization [56.67706781191521]
We present a robust online optimization framework in which an adversary can introduce outliers by corrupting the loss functions in an arbitrary number k of rounds, with k unknown to the learner.
arXiv Detail & Related papers (2024-08-12T17:08:31Z) - Sub-linear Regret in Adaptive Model Predictive Control [56.705978425244496]
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online adaptive control algorithm that combines the certainty-equivalence principle with polytopic tubes.
We analyze the regret of the algorithm relative to an algorithm that knows the system dynamics from the start.
arXiv Detail & Related papers (2023-10-07T15:07:10Z) - Data-Driven Adversarial Online Control for Unknown Linear Systems [17.595231077524467]
We present a novel data-driven online adaptive control algorithm to address this online control problem.
Our algorithm guarantees an $\tilde{O}(T^{2/3})$ regret bound with high probability, which matches the best-known regret bound for this problem.
arXiv Detail & Related papers (2023-08-16T04:05:22Z) - On Convex Data-Driven Inverse Optimal Control for Nonlinear, Non-stationary and Stochastic Systems [0.7240153598817866]
This paper is concerned with a finite-horizon inverse control problem, which has the goal of reconstructing, from observations, the possibly non-stationary cost driving the actions of an agent.
We present a result enabling cost reconstruction by solving an optimization problem that is convex even when the agent's cost is non-stationary.
All experiments confirm the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-24T10:25:53Z) - Non-stationary Online Learning with Memory and Non-stochastic Control [71.14503310914799]
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
arXiv Detail & Related papers (2021-02-07T09:45:15Z) - Persistent Reductions in Regularized Loss Minimization for Variable Selection [3.3504365823045035]
We show that for a broad class of loss functions it is possible to efficiently identify a subset of variables that are guaranteed to have zero coefficients.
We employ an existing algorithm for extreme ray identification, making the guarantee applicable to ultra-high-dimensional problems.
arXiv Detail & Related papers (2020-11-30T04:59:44Z) - Bandit Linear Control [0.0]
We consider the problem of controlling a known linear dynamical system under noise, adversarially chosen costs, and bandit feedback.
We present a new and efficient algorithm that, for strongly convex and smooth costs, obtains regret that grows with the square root of the time horizon $T$.
arXiv Detail & Related papers (2020-07-01T21:12:19Z) - Logarithmic Regret for Adversarial Online Control [56.12283443161479]
We give the first algorithm with logarithmic regret for arbitrary adversarial disturbance sequences.
Our algorithm and analysis use a characterization of the offline control law to reduce the online control problem to (delayed) online learning.
arXiv Detail & Related papers (2020-02-29T06:29:19Z) - Improper Learning for Non-Stochastic Control [78.65807250350755]
We consider the problem of controlling a possibly unknown linear dynamical system with adversarial perturbations, adversarially chosen convex loss functions, and partially observed states.
Applying online gradient descent to this parametrization yields a new controller which attains sublinear regret vs. a large class of closed-loop policies (a rough sketch of this style of parametrization appears after this list).
Our bounds are the first in the non-stochastic control setting that compete with all stabilizing linear dynamical controllers.
arXiv Detail & Related papers (2020-01-25T02:12:48Z)
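As a companion to the last entry above, here is a minimal Python sketch of applying online gradient descent to a disturbance-feedback controller parametrization, u_t = -K x_t + sum_i M[i] w_{t-i}. The cited paper uses a related parametrization for partially observed systems and differentiates a counterfactual (memory) cost; the fixed quadratic cost, the gradient taken only through the current control, the random stand-in disturbance, and all names below are simplifying assumptions for this sketch.

```python
import numpy as np

def disturbance_action_ogd(A, B, K, T, h=5, eta=1e-3, seed=0):
    """Disturbance-action controller u_t = -K x_t + sum_i M[i] @ w_{t-i},
    with the gains M updated by online gradient descent on the observed cost.

    Simplifications: fixed quadratic cost, gradient taken only through the
    current control, and random disturbances standing in for adversarial ones.
    """
    rng = np.random.default_rng(seed)
    n, du = B.shape
    M = np.zeros((h, du, n))                 # learned disturbance-feedback gains
    x = np.zeros(n)
    w_hist = np.zeros((h, n))                # most recent disturbance stored last
    total_cost = 0.0
    for t in range(T):
        u = -K @ x + sum(M[i] @ w_hist[h - 1 - i] for i in range(h))
        cost = x @ x + u @ u                 # quadratic stand-in for the convex loss
        total_cost += cost
        for i in range(h):                   # d(cost)/dM[i] through u only: 2 u w_{t-i}^T
            M[i] -= eta * np.outer(2.0 * u, w_hist[h - 1 - i])
        w = 0.1 * rng.standard_normal(n)     # stand-in for the disturbance
        x = A @ x + B @ u + w                # in practice w is recovered as x_{t+1} - A x_t - B u_t
        w_hist = np.roll(w_hist, -1, axis=0)
        w_hist[-1] = w
    return M, total_cost / T
```

The fixed gain K is assumed to stabilize the system; the learned M terms then form the convex policy class over which regret is measured in this line of work.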