Information Theoretic Model Predictive Q-Learning
- URL: http://arxiv.org/abs/2001.02153v2
- Date: Tue, 5 May 2020 21:49:55 GMT
- Title: Information Theoretic Model Predictive Q-Learning
- Authors: Mohak Bhardwaj, Ankur Handa, Dieter Fox, Byron Boots
- Abstract summary: We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
- Score: 64.74041985237105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model-free Reinforcement Learning (RL) works well when experience can be
collected cheaply and model-based RL is effective when system dynamics can be
modeled accurately. However, both assumptions can be violated in real world
problems such as robotics, where querying the system can be expensive and
real-world dynamics can be difficult to model. In contrast to RL, Model
Predictive Control (MPC) algorithms use a simulator to optimize a simple policy
class online, constructing a closed-loop controller that can effectively
contend with real-world dynamics. MPC performance is usually limited by factors
such as model bias and the limited horizon of optimization. In this work, we
present a novel theoretical connection between information theoretic MPC and
entropy regularized RL and develop a Q-learning algorithm that can leverage
biased models. We validate the proposed algorithm on sim-to-sim control tasks
to demonstrate the improvements over optimal control and reinforcement learning
from scratch. Our approach paves the way for deploying reinforcement learning
algorithms on real systems in a systematic manner.
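The abstract links the free energy objective of information theoretic MPC with entropy regularized RL, and one practical reading of that connection is that a finite-horizon MPC rollout can be bootstrapped with a learned Q-function instead of a hand-designed terminal cost, which also addresses the "limited horizon of optimization" mentioned above. The sketch below is a minimal illustration of that idea, not the paper's exact algorithm: the `dynamics`, `cost`, and `q_value` functions are toy stand-ins, and the softmin weighting follows the standard MPPI-style update.

```python
import numpy as np

# Toy placeholders (illustrative only): a 1D double-integrator with a quadratic
# cost and a hand-crafted "learned" q_value standing in for a soft Q-function
# that would normally be learned from data.
def dynamics(state, action):
    pos, vel = state                      # state = [position, velocity]
    dt = 0.1
    return np.array([pos + dt * vel, vel + dt * action])

def cost(state, action):
    pos, vel = state
    return pos ** 2 + 0.1 * vel ** 2 + 0.01 * action ** 2

def q_value(state, action):
    pos, vel = state                      # stand-in for a learned soft Q-function
    return -(5.0 * pos ** 2 + vel ** 2)

def mppi_with_q_terminal(state, nominal, horizon=10, samples=64,
                         lam=1.0, sigma=0.5, rng=np.random.default_rng(0)):
    """One MPPI-style update where the rollout is truncated at `horizon` and
    bootstrapped with the learned Q-value instead of a terminal cost."""
    noise = rng.normal(0.0, sigma, size=(samples, horizon))
    actions = nominal[None, :] + noise    # perturbed control sequences
    total_cost = np.zeros(samples)
    for k in range(samples):
        s = state.copy()
        for t in range(horizon):
            total_cost[k] += cost(s, actions[k, t])
            s = dynamics(s, actions[k, t])
        # Terminal value: -Q(s, a_last) replaces the truncated tail of the cost.
        total_cost[k] -= q_value(s, actions[k, -1])
    # Information theoretic (softmin) weighting of the sampled sequences.
    beta = total_cost.min()
    weights = np.exp(-(total_cost - beta) / lam)
    weights /= weights.sum()
    return (weights[:, None] * actions).sum(axis=0)   # updated nominal sequence

state = np.array([1.0, 0.0])
nominal = mppi_with_q_terminal(state, np.zeros(10))
print("first planned action:", nominal[0])
```

In a full algorithm, the same H-step lookahead values would also serve as targets for updating the Q-function, so planning and learning reinforce each other even when the model is biased.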
Related papers
- Traffic expertise meets residual RL: Knowledge-informed model-based residual reinforcement learning for CAV trajectory control [1.5361702135159845]
This paper introduces a knowledge-informed model-based residual reinforcement learning framework.
It integrates traffic expert knowledge into a virtual environment model, employing the Intelligent Driver Model (IDM) for basic dynamics and neural networks for residual dynamics (see the sketch after this list).
We propose a novel strategy that combines traditional control methods with residual RL, facilitating efficient learning and policy optimization without the need to learn from scratch.
arXiv Detail & Related papers (2024-08-30T16:16:57Z)
- Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning [58.575939354953526]
PACOH-RL is a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics.
Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics.
Our experiment results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions.
arXiv Detail & Related papers (2023-11-13T18:51:57Z)
- Efficient Learning of Voltage Control Strategies via Model-based Deep Reinforcement Learning [9.936452412191326]
This article proposes a model-based deep reinforcement learning (DRL) method to design emergency control strategies for short-term voltage stability problems in power systems.
Recent advances show promising results for model-free DRL-based methods in power systems, but model-free methods suffer from poor sample efficiency and long training times.
We propose a novel model-based-DRL framework where a deep neural network (DNN)-based dynamic surrogate model is utilized with the policy learning framework.
arXiv Detail & Related papers (2022-12-06T02:50:53Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme that provides a non-decreasing performance guarantee for model-based RL (MBRL).
The derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- A Unified Framework for Alternating Offline Model Training and Policy Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamics model from historically collected data and use the learned model together with fixed datasets for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z)
- Model-Based Reinforcement Learning with SINDy [0.0]
We propose a novel method for discovering the governing non-linear dynamics of physical systems in reinforcement learning (RL).
We establish that this method is capable of discovering the underlying dynamics using significantly fewer trajectories than state-of-the-art model learning algorithms.
arXiv Detail & Related papers (2022-08-30T19:03:48Z)
- Model Generation with Provable Coverability for Offline Reinforcement Learning [14.333861814143718]
Offline optimization with a dynamics-aware policy provides a new perspective for policy learning and out-of-distribution generalization.
However, due to the limitations of the offline setting, the learned model cannot mimic real dynamics well enough to support reliable out-of-distribution exploration.
We propose an algorithm that generates models optimized for their coverage of the real dynamics.
arXiv Detail & Related papers (2022-06-01T08:34:09Z)
- Real-time Neural-MPC: Deep Learning Model Predictive Control for Quadrotors and Agile Robotic Platforms [59.03426963238452]
We present Real-time Neural MPC, a framework to efficiently integrate large, complex neural network architectures as dynamics models within a model-predictive control pipeline.
We show the feasibility of our framework on real-world problems by reducing the positional tracking error by up to 82% when compared to state-of-the-art MPC approaches without neural network dynamics.
arXiv Detail & Related papers (2022-03-15T09:38:15Z)
- MRAC-RL: A Framework for On-Line Policy Adaptation Under Parametric Model Uncertainty [0.34265828682659694]
Reinforcement learning algorithms have been successfully used to develop control policies for dynamical systems.
We propose a set of novel MRAC algorithms applicable to a broad range of linear and nonlinear systems.
We demonstrate that the MRAC-RL approach improves upon state-of-the-art RL algorithms in developing control policies.
arXiv Detail & Related papers (2020-11-20T18:55:53Z)
- Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z)
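As a generic illustration of the residual-dynamics idea in the knowledge-informed residual RL entry above (a physics-based IDM for the base longitudinal dynamics plus a neural network for the residual), here is a minimal sketch. The IDM parameter values, the tiny random MLP, and all function names are assumptions for illustration, not that paper's implementation.

```python
import numpy as np

def idm_acceleration(gap, speed, lead_speed,
                     v0=30.0, T=1.5, s0=2.0, a_max=1.5, b=2.0, delta=4.0):
    """Intelligent Driver Model (IDM): physics-based base acceleration.
    Parameter values are common illustrative defaults, not tuned constants."""
    dv = speed - lead_speed
    s_star = s0 + speed * T + speed * dv / (2.0 * np.sqrt(a_max * b))
    return a_max * (1.0 - (speed / v0) ** delta - (s_star / gap) ** 2)

class ResidualDynamics:
    """Base IDM dynamics plus a learned residual correction.
    The small random MLP below is a placeholder for a trained network."""
    def __init__(self, state_dim=3, hidden=16, rng=np.random.default_rng(0)):
        self.w1 = rng.normal(0.0, 0.1, (state_dim + 1, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, 1))

    def residual(self, state, action):
        x = np.concatenate([state, [action]])
        return float(np.tanh(x @ self.w1) @ self.w2)

    def step(self, state, action, dt=0.1):
        gap, speed, lead_speed = state     # state = [gap, ego speed, leader speed]
        accel = idm_acceleration(gap, speed, lead_speed) + self.residual(state, action)
        new_speed = max(speed + accel * dt, 0.0)
        new_gap = gap + (lead_speed - speed) * dt
        return np.array([new_gap, new_speed, lead_speed])

model = ResidualDynamics()
print(model.step(np.array([20.0, 10.0, 12.0]), action=0.0))
```

The design intent is that the physics prior carries most of the prediction, so the residual network (and any residual RL policy on top of it) only has to learn the gap between the simple model and the real dynamics rather than learning from scratch.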
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.