Optimal Exploration for Model-Based RL in Nonlinear Systems
- URL: http://arxiv.org/abs/2306.09210v1
- Date: Thu, 15 Jun 2023 15:47:50 GMT
- Title: Optimal Exploration for Model-Based RL in Nonlinear Systems
- Authors: Andrew Wagenmaker, Guanya Shi, Kevin Jamieson
- Abstract summary: Learning to control unknown nonlinear dynamical systems is a fundamental problem in reinforcement learning and control theory.
We develop an algorithm able to efficiently explore the system to reduce uncertainty in a task-dependent metric.
Our algorithm relies on a general reduction from policy optimization to optimal experiment design in arbitrary systems, and may be of independent interest.
- Score: 14.540210895533937
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning to control unknown nonlinear dynamical systems is a fundamental
problem in reinforcement learning and control theory. A commonly applied
approach is to first explore the environment (exploration), learn an accurate
model of it (system identification), and then compute an optimal controller
with the minimum cost on this estimated system (policy optimization). While
existing work has shown that it is possible to learn a uniformly good model of
the system~\citep{mania2020active}, in practice, if we aim to learn a good
controller with a low cost on the actual system, certain system parameters may
be significantly more critical than others, and we therefore ought to focus our
exploration on learning such parameters.
In this work, we consider the setting of nonlinear dynamical systems and seek
to formally quantify, in such settings, (a) which parameters are most relevant
to learning a good controller, and (b) how we can best explore so as to
minimize uncertainty in such parameters. Inspired by recent work in linear
systems~\citep{wagenmaker2021task}, we show that minimizing the controller loss
in nonlinear systems translates to estimating the system parameters in a
particular, task-dependent metric. Motivated by this, we develop an algorithm
able to efficiently explore the system to reduce uncertainty in this metric,
and prove a lower bound showing that our approach learns a controller at a
near-instance-optimal rate. Our algorithm relies on a general reduction from
policy optimization to optimal experiment design in arbitrary systems, and may
be of independent interest. We conclude with experiments demonstrating the
effectiveness of our method in realistic nonlinear robotic systems.
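To make the abstract's central idea concrete, here is a minimal Python sketch of task-directed experiment design under simplifying assumptions: a model linear in its parameters, a Fisher-style information matrix, and a task-dependent metric given by a Hessian-like weighting H. The names, feature map, and constants are our illustration, not the paper's algorithm.

```python
import numpy as np

# Task-directed experiment design (illustrative sketch, not the paper's code):
# score candidate exploration input sequences by how much they shrink
# parameter uncertainty in a task-dependent metric H, rather than uniformly.

rng = np.random.default_rng(0)
d = 4  # number of unknown system parameters

# Task-dependent metric: think of H as the Hessian of the control cost with
# respect to the model parameters. We fabricate one with sharply decaying
# eigenvalues, so only a few parameter directions matter for the task.
A = rng.normal(size=(d, d))
H = A @ np.diag([10.0, 1.0, 0.1, 0.001]) @ A.T
H = (H + H.T) / 2

def info_matrix(inputs):
    """Fisher-style information for a model linear in its parameters,
    y_t = phi_t^T theta + noise, accumulated as sum_t phi_t phi_t^T
    (the feature map phi, tanh here, is a stand-in)."""
    phis = np.tanh(inputs)
    return phis.T @ phis + 1e-6 * np.eye(d)

def task_risk(inputs):
    """Task-weighted uncertainty trace(H @ Sigma) with Sigma = info^{-1}:
    small when uncertainty is low in the directions the task cares about."""
    return np.trace(H @ np.linalg.inv(info_matrix(inputs)))

# Naive design: pick the best of several random candidate input sequences.
candidates = [rng.normal(size=(50, d)) for _ in range(32)]
best = min(candidates, key=task_risk)
print(f"task-weighted risk of best design: {task_risk(best):.4f}")
```

Replacing H with the identity recovers uniform system identification; the gap between the two criteria is what makes task-directed exploration cheaper when most parameters barely affect the control cost.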
Related papers
- MPC of Uncertain Nonlinear Systems with Meta-Learning for Fast Adaptation of Neural Predictive Models [6.031205224945912]
A neural State-Space Model (NSSM) is used to approximate the nonlinear system, where a deep encoder network learns the nonlinearity from data.
This transforms the nonlinear system into a linear system in a latent space, enabling the application of model predictive control (MPC) to determine effective control actions; a minimal latent-linear sketch follows this entry.
arXiv Detail & Related papers (2024-04-18T11:29:43Z)
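As a follow-up to the NSSM entry above, here is a hypothetical sketch of latent-linear control: a random feature map stands in for the trained deep encoder, the latent dynamics are assumed linear and already identified, and a finite-horizon LQR backward pass supplies the MPC action. Shapes, names, and dynamics are our assumptions, not the paper's model.

```python
import numpy as np

# Latent-linear control sketch: an encoder lifts the state into a latent
# space where dynamics are (approximately) linear, z' = A z + B u, so
# standard linear MPC/LQR machinery applies.

rng = np.random.default_rng(1)
nx, nz, nu = 3, 8, 1
W = rng.normal(size=(nz, nx))  # stand-in for trained encoder weights

def encode(x):
    return np.tanh(W @ x)  # surrogate for the deep encoder network

A = 0.9 * np.eye(nz)           # latent dynamics, assumed identified
B = rng.normal(size=(nz, nu))

# Finite-horizon LQR in latent space via a backward Riccati recursion.
Q, R, N = np.eye(nz), np.eye(nu), 20
P, gains = Q.copy(), []
for _ in range(N):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)
    gains.append(K)
gains.reverse()

x0 = np.array([1.0, -0.5, 0.2])
u0 = -gains[0] @ encode(x0)    # first MPC action from the latent state
print("first control action:", u0)
```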
- Active Learning for Control-Oriented Identification of Nonlinear Systems [26.231260751633307]
We present the first finite sample analysis of an active learning algorithm suitable for a general class of nonlinear dynamics.
In certain settings, the excess control cost of our algorithm achieves the optimal rate, up to logarithmic factors.
We validate our approach in simulation, showcasing the advantage of active, control-oriented exploration for controlling nonlinear systems.
arXiv Detail & Related papers (2024-04-13T15:40:39Z)
- Parameter-Adaptive Approximate MPC: Tuning Neural-Network Controllers without Retraining [50.00291020618743]
This work introduces a novel, parameter-adaptive AMPC architecture capable of online tuning without recomputing large datasets or retraining.
We showcase the effectiveness of parameter-adaptive AMPC by controlling the swing-ups of two different real cartpole systems with a severely resource-constrained microcontroller (MCU).
Taken together, these contributions represent a marked step toward the practical application of AMPC in real-world systems.
arXiv Detail & Related papers (2024-04-08T20:02:19Z)
- Supervised DKRC with Images for Offline System Identification [77.34726150561087]
Modern dynamical systems are becoming increasingly non-linear and complex.
There is a need for a framework to model these systems in a compact and comprehensive representation for prediction and control.
Our approach learns the Koopman basis functions via supervised learning.
arXiv Detail & Related papers (2021-09-06T04:39:06Z)
- Neural-iLQR: A Learning-Aided Shooting Method for Trajectory Optimization [17.25824905485415]
We present Neural-iLQR, a learning-aided shooting method over the unconstrained control space.
It is shown to outperform the conventional iLQR significantly in the presence of inaccuracies in system models.
arXiv Detail & Related papers (2020-11-21T07:17:28Z)
- Model-Free Control of Dynamical Systems with Deep Reservoir Computing [0.0]
We propose and demonstrate a nonlinear control method that can be applied to unknown, complex systems.
Our technique requires no prior knowledge of the system and is thus model-free.
Reservoir computers are well-suited to the control problem because they require small training data sets and remarkably low training times; a minimal echo-state-network sketch follows this entry.
arXiv Detail & Related papers (2020-10-05T18:59:51Z)
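To illustrate the reservoir-computing entry above, here is a minimal echo-state-network sketch showing why training is cheap: the recurrent reservoir is fixed and random, and only a linear readout is fit, by ridge regression. The toy one-step prediction target and all constants are our assumptions; the paper's control-loop wiring is omitted.

```python
import numpy as np

# Echo state network sketch: fixed random reservoir + ridge-regression
# readout. Only the readout is trained, hence small data and fast training.

rng = np.random.default_rng(2)
n_res, n_in = 200, 1
W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # echo-state property

def run_reservoir(u_seq):
    """Drive the reservoir with an input sequence; collect its states."""
    x, states = np.zeros(n_res), []
    for u in u_seq:
        x = np.tanh(W @ x + W_in @ np.atleast_1d(u))
        states.append(x.copy())
    return np.array(states)

# Fit the readout to a toy nonlinear one-step prediction target.
u = np.sin(0.1 * np.arange(500))
y = np.roll(u, -1) ** 3
X = run_reservoir(u)
W_out = np.linalg.solve(X.T @ X + 1e-4 * np.eye(n_res), X.T @ y)
print("training MSE:", np.mean((X @ W_out - y) ** 2))
```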
- Anticipating the Long-Term Effect of Online Learning in Control [75.6527644813815]
AntLer is a design algorithm for learning-based control laws that anticipates learning.
We show that AntLer approximates an optimal solution arbitrarily accurately with probability one.
arXiv Detail & Related papers (2020-07-24T07:00:14Z)
- Reinforcement Learning with Fast Stabilization in Linear Dynamical Systems [91.43582419264763]
We study model-based reinforcement learning (RL) in unknown stabilizable linear dynamical systems.
We propose an algorithm that certifies fast stabilization of the underlying system by effectively exploring the environment.
We show that the proposed algorithm attains $\tilde{\mathcal{O}}(\sqrt{T})$ regret after $T$ time steps of agent-environment interaction (the regret notion is spelled out after this entry).
arXiv Detail & Related papers (2020-07-23T23:06:40Z)
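For reference, the regret in the entry above compares accumulated cost against the best achievable average cost; a standard statement of this notion (our paraphrase of the usual definition, not quoted from the paper) is:

```latex
\mathrm{Regret}(T) \;=\; \sum_{t=1}^{T} c_t \;-\; T\,J_\star,
\qquad
\mathrm{Regret}(T) \;\le\; \tilde{\mathcal{O}}\big(\sqrt{T}\big)
\ \text{with high probability},
```

where $c_t$ is the cost incurred at step $t$ and $J_\star$ is the optimal infinite-horizon average cost.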
- Reinforcement Learning Control of Robotic Knee with Human in the Loop by Flexible Policy Iteration [17.365135977882215]
This study fills important gaps by introducing innovative features into the flexible policy iteration algorithm.
We establish system-level performance guarantees, including convergence of the approximate value function, (sub)optimality of the solution, and stability of the system.
arXiv Detail & Related papers (2020-06-16T09:09:48Z)
- Adaptive Control and Regret Minimization in Linear Quadratic Gaussian (LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters to within their confidence intervals, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models; an MPPI-style weighting sketch follows this entry.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
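Closing out the list: the information-theoretic MPC family referenced in the last entry is commonly instantiated as MPPI-style sampling, whose exponential reweighting mirrors the softmax of entropy-regularized RL. The sketch below shows that weighting on a toy scalar system; the dynamics, costs, and constants are our assumptions.

```python
import numpy as np

# MPPI-style update (illustrative): sample perturbed control sequences,
# weight them by exp(-cost / lambda), and average. The exponential weights
# are the same softmax structure that appears in entropy-regularized RL.

rng = np.random.default_rng(3)
H, K, lam = 15, 64, 1.0  # horizon, number of samples, temperature

def rollout_cost(u_seq, x0=1.0):
    """Toy scalar model x' = 0.9 x + u with quadratic cost (stand-in)."""
    x, c = x0, 0.0
    for u in u_seq:
        x = 0.9 * x + u
        c += x**2 + 0.1 * u**2
    return c

u_nom = np.zeros(H)
noise = rng.normal(scale=0.3, size=(K, H))
costs = np.array([rollout_cost(u_nom + eps) for eps in noise])
w = np.exp(-(costs - costs.min()) / lam)
w /= w.sum()
u_nom = u_nom + w @ noise  # information-theoretic (softmax-weighted) update
print("first action after update:", u_nom[0])
```

A biased model only shifts the sampled costs; the softmax weighting still concentrates on the cheaper rollouts, which is the intuition behind pairing this controller with Q-learning in the entry above.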