Learning to Control under Time-Varying Environment
- URL: http://arxiv.org/abs/2206.02507v1
- Date: Mon, 6 Jun 2022 11:40:46 GMT
- Title: Learning to Control under Time-Varying Environment
- Authors: Yuzhen Han, Ruben Solozabal, Jing Dong, Xingyu Zhou, Martin Takac, Bin
Gu
- Abstract summary: This paper investigates the problem of regret minimization in linear time-varying (LTV) dynamical systems.
We propose the first computationally tractable online algorithm with regret guarantees.
- Score: 18.48729114775298
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates the problem of regret minimization in linear
time-varying (LTV) dynamical systems. Due to the simultaneous presence of
uncertainty and non-stationarity, designing online control algorithms for
unknown LTV systems remains a challenging task. At the cost of NP-hard offline
planning, prior works have introduced online convex optimization algorithms,
although they suffer from a nonparametric rate of regret.
In this paper, we propose the first computationally tractable online
algorithm with regret guarantees that avoids offline planning over state
linear feedback policies. Our algorithm is based on the optimism in the face of
uncertainty (OFU) principle in which we optimistically select the best model in
a high confidence region. Our algorithm is therefore more explorative than
previous approaches. To overcome non-stationarity, we propose either a
restarting strategy (R-OFU) or a sliding-window strategy (SW-OFU). With proper
configuration, our algorithm attains sublinear regret $O(T^{2/3})$. Both
strategies rely on data from the current phase to track variations in the
system dynamics. We corroborate our theoretical findings with numerical
experiments, which highlight the effectiveness of our methods. To the best of
our knowledge, our study establishes the first model-based online algorithm
with regret guarantees under LTV dynamical systems.
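As a concrete illustration of the sliding-window idea, here is a minimal sketch (ours, not the paper's code) of a windowed least-squares estimator for time-varying dynamics; the name `sw_estimate`, the ridge parameter, and the data layout are assumptions, and the optimistic model selection over the confidence region that OFU adds on top is omitted:
```python
import numpy as np

def sw_estimate(states, inputs, window, lam=1.0):
    """Sliding-window ridge regression for LTV dynamics (illustrative).

    Fits Theta_t = [A_t | B_t] to the most recent `window` transitions
    x_{s+1} = A_s x_s + B_s u_s + w_s, discarding older data so the
    estimate can track drift; a restarting variant would instead drop
    all data at fixed restart times.
    """
    T = len(inputs)                      # states has T+1 rows, inputs has T
    start = max(0, T - window)           # keep only the current window
    Z = np.hstack([states[start:T], inputs[start:T]])  # regressors [x_s, u_s]
    Y = states[start + 1 : T + 1]                      # targets x_{s+1}
    d = Z.shape[1]
    # Regularized least squares: Theta = (Z'Z + lam*I)^{-1} Z'Y
    Theta = np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ Y)
    return Theta.T                       # rows give the estimated [A_t | B_t]
```
An OFU-style controller would then act optimistically: among all models in a confidence ellipsoid around this estimate, pick the one whose optimal linear feedback promises the lowest cost.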
Related papers
- Sub-linear Regret in Adaptive Model Predictive Control [56.705978425244496]
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online algorithm that combines the certainty-equivalence principle and polytopic tubes.
We analyze the regret of the algorithm, when compared to an algorithm initially aware of the system dynamics.
arXiv Detail & Related papers (2023-10-07T15:07:10Z)
- Smoothed Online Learning for Prediction in Piecewise Affine Systems [43.64498536409903]
This paper builds on the recently developed smoothed online learning framework.
It provides the first algorithms for prediction and simulation in piecewise affine systems.
arXiv Detail & Related papers (2023-01-26T15:54:14Z)
- Efficient Online Learning with Memory via Frank-Wolfe Optimization: Algorithms with Bounded Dynamic Regret and Applications to Control [15.588080817106563]
We introduce the first projection-free meta-base learning algorithm with memory that minimizes dynamic regret.
We are motivated by artificial intelligence applications where autonomous agents need to adapt to time-varying environments.
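The distinguishing feature of Frank-Wolfe methods, which "projection-free" refers to, is replacing projections with a linear minimization oracle over the feasible set. A generic conditional-gradient sketch (ours, not the paper's meta-base algorithm; the L1-ball oracle is an illustrative example):
```python
import numpy as np

def frank_wolfe_step(x, grad, lmo, t):
    """One Frank-Wolfe (conditional gradient) update.

    Rather than projecting onto the feasible set K, query a linear
    minimization oracle returning argmin over v in K of <grad, v>.
    """
    v = lmo(grad)                 # cheap linear oracle, no projection
    gamma = 2.0 / (t + 2.0)       # standard step-size schedule
    return (1.0 - gamma) * x + gamma * v

def l1_ball_lmo(grad):
    """Linear oracle for the unit L1 ball: a signed coordinate vector."""
    i = int(np.argmax(np.abs(grad)))
    v = np.zeros_like(grad)
    v[i] = -np.sign(grad[i])
    return v
```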
arXiv Detail & Related papers (2023-01-02T01:12:29Z)
- Large-Scale Sequential Learning for Recommender and Engineering Systems [91.3755431537592]
In this thesis, we focus on the design of automatic algorithms that provide personalized ranking by adapting to the current conditions.
For the former, we propose a novel algorithm called SAROS that takes into account both kinds of feedback for learning over the sequence of interactions.
The proposed idea of taking neighbouring lines into account shows statistically significant improvements over the initial approach for fault detection in power grids.
arXiv Detail & Related papers (2022-05-13T21:09:41Z)
- Online estimation and control with optimal pathlength regret [52.28457815067461]
A natural goal when designing online learning algorithms is to bound the regret of the algorithm in terms of the temporal variation of the input sequence.
Data-dependent "pathlength" regret bounds have recently been obtained for a wide variety of online learning problems, including OCO and bandits.
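For concreteness, one common shape of such a bound, in our own notation (the paper's exact statement may differ): with pathlength $P_T = \sum_{t=2}^{T} \lVert w_t - w_{t-1} \rVert$ of the input sequence $w_{1:T}$, the algorithm's regret satisfies $\mathrm{Regret}_T \le O(P_T)$, so a slowly varying input sequence yields small regret even over long horizons.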
arXiv Detail & Related papers (2021-10-24T22:43:15Z)
- An Online Learning Approach to Optimizing Time-Varying Costs of AoI [26.661352924641285]
We consider systems that require timely monitoring of sources over a communication network.
For the single source monitoring problem, we design algorithms that achieve sublinear regret compared to the best fixed policy in hindsight.
For the multiple source scheduling problem, we design a new online learning algorithm called Follow-the-Perturbed-Whittle-Leader.
arXiv Detail & Related papers (2021-05-27T18:10:56Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based ZO algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that our ZO-RL algorithm can effectively reduce the variance of the ZO gradient estimates by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
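For context, a minimal sketch of the classic random-perturbation baseline that ZO-RL replaces (function names and the toy quadratic are ours): a two-point estimator averages directional finite differences over sampled perturbations, and ZO-RL's contribution is to learn the distribution those perturbations are drawn from.
```python
import numpy as np

def zo_gradient(f, x, directions, mu=1e-3):
    """Two-point zeroth-order gradient estimate (random-sampling baseline).

    Averages directional finite differences of f along the given
    perturbations; ZO-RL would supply learned rather than random ones.
    """
    g = np.zeros_like(x)
    for u in directions:
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / len(directions)

# Toy check on f(x) = ||x||^2, whose gradient is 2x:
f = lambda x: float(x @ x)
x = np.ones(5)
dirs = [np.random.randn(5) for _ in range(256)]
print(zo_gradient(f, x, dirs))   # approx. [2, 2, 2, 2, 2] on average
```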
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
- Reinforcement Learning with Fast Stabilization in Linear Dynamical Systems [91.43582419264763]
We study model-based reinforcement learning (RL) in unknown stabilizable linear dynamical systems.
We propose an algorithm that certifies fast stabilization of the underlying system by effectively exploring the environment.
We show that the proposed algorithm attains $\tilde{\mathcal{O}}(\sqrt{T})$ regret after $T$ time steps of agent-environment interaction.
arXiv Detail & Related papers (2020-07-23T23:06:40Z)