Koopman Q-learning: Offline Reinforcement Learning via Symmetries of
Dynamics
- URL: http://arxiv.org/abs/2111.01365v1
- Date: Tue, 2 Nov 2021 04:32:18 GMT
- Title: Koopman Q-learning: Offline Reinforcement Learning via Symmetries of
Dynamics
- Authors: Matthias Weissenbacher, Samarth Sinha, Animesh Garg, Yoshinobu
Kawahara
- Abstract summary: Offline reinforcement learning leverages large datasets to train policies without interactions with the environment.
Current algorithms overfit to the training dataset and perform poorly when deployed to out-of-distribution generalizations of the environment.
We learn a Koopman latent representation which allows us to infer symmetries of the system's underlying dynamics.
We empirically evaluate our method on several benchmark offline reinforcement learning tasks and datasets including D4RL, Metaworld and Robosuite.
- Score: 29.219095364935885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning leverages large datasets to train policies
without interactions with the environment. The learned policies may then be
deployed in real-world settings where interactions are costly or dangerous.
Current algorithms overfit to the training dataset and, as a consequence,
perform poorly when deployed to out-of-distribution generalizations of the
environment. We aim to address these limitations by learning a Koopman latent
representation which allows us to infer symmetries of the system's underlying
dynamics. These symmetries are then utilized to extend the otherwise static
offline dataset during training; this constitutes a novel data augmentation
framework which reflects the system's dynamics and can thus be interpreted as
an exploration of the environment's phase space. To obtain the symmetries, we
employ Koopman theory, in which nonlinear dynamics are represented in terms of
a linear operator acting on the space of measurement functions of the system,
so that symmetries of the dynamics may be inferred directly. We provide novel
theoretical results on the existence and nature of symmetries relevant for
control systems such as reinforcement learning settings. Moreover, we
empirically evaluate our method on several benchmark offline reinforcement
learning tasks and datasets, including D4RL, Metaworld, and Robosuite, and
find that our framework consistently improves the state of the art for
Q-learning methods.
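To make the data augmentation mechanism concrete, here is a minimal NumPy sketch of the idea as described above: fit a linear (Koopman) operator K on latent embeddings of states, treat linear maps A that commute with K as symmetries of the dynamics, and push recorded latent transitions through such an A to obtain new, dynamics-consistent data. The function names, shapes, and the least-squares fit are illustrative assumptions, not the paper's actual implementation.

    # Sketch: symmetry-based data augmentation via a learned Koopman operator.
    # All names and the EDMD-style fit are assumptions for illustration.
    import numpy as np

    def fit_koopman_operator(Z, Z_next):
        """Least-squares estimate of K with z_{t+1} ~= K z_t.
        Rows of Z are latent embeddings z_t = phi(s_t); rows of Z_next
        are the embeddings of the successor states."""
        # Solve min_X ||Z X - Z_next||_F; then Z_next ~= Z X with X = K^T.
        X, *_ = np.linalg.lstsq(Z, Z_next, rcond=None)
        return X.T

    def is_symmetry(A, K, tol=1e-6):
        """A linear map A is a symmetry of the latent dynamics if it
        commutes with the Koopman operator: A K = K A."""
        return np.linalg.norm(A @ K - K @ A) < tol

    def augment_transition(z_t, z_next, A):
        """If A K = K A and z_{t+1} = K z_t, then A z_{t+1} = K (A z_t),
        so the transformed pair is again a valid latent transition."""
        return A @ z_t, A @ z_next

In the paper's setting the embedding itself is learned jointly; here it is taken as given, and candidate symmetries would be found by solving the linear commutation constraint A K - K A = 0.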
Related papers
- Scalable Offline Reinforcement Learning for Mean Field Games [6.8267158622784745]
Off-MMD is a novel mean-field RL algorithm that approximates equilibrium policies in mean-field games using purely offline data.
Our algorithm scales to complex environments and demonstrates strong performance on benchmark tasks like crowd exploration or navigation.
arXiv Detail & Related papers (2024-10-23T14:16:34Z)
- Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
- Leveraging Neural Koopman Operators to Learn Continuous Representations of Dynamical Systems from Scarce Data [0.0]
We propose a new deep Koopman framework that represents dynamics in an intrinsically continuous way.
This framework leads to better performance on limited training data.
arXiv Detail & Related papers (2023-03-13T10:16:19Z)
- Dream to Explore: Adaptive Simulations for Autonomous Systems [3.0664963196464448]
We tackle the problem of learning to control dynamical systems by applying Bayesian nonparametric methods.
By employing Gaussian processes to discover latent world dynamics, we mitigate common data efficiency issues observed in reinforcement learning.
Our algorithm jointly learns a world model and policy by optimizing a variational lower bound of the log-likelihood (a generic form of such a bound is shown after this entry).
arXiv Detail & Related papers (2021-10-27T04:27:28Z)
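As background for the entry above: a generic variational lower bound (ELBO) for a latent world model with observations o_t, actions a_t, and latent states z_t has the form below. This standard form is supplied for orientation only and is not necessarily the paper's exact objective.

    \log p_\theta(o_{1:T} \mid a_{1:T}) \;\ge\;
      \mathbb{E}_{q_\phi(z_{1:T} \mid o_{1:T}, a_{1:T})}\Big[
        \sum_{t=1}^{T} \log p_\theta(o_t \mid z_t)
        - \sum_{t=1}^{T} \mathrm{KL}\big(
            q_\phi(z_t \mid o_{\le t}, a_{<t}) \,\Vert\,
            p_\theta(z_t \mid z_{t-1}, a_{t-1}) \big)
      \Big]

Maximizing the first term trains the decoder to reconstruct observations; the KL term keeps the inferred latents consistent with the learned latent dynamics prior.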
- Supervised DKRC with Images for Offline System Identification [77.34726150561087]
Modern dynamical systems are becoming increasingly non-linear and complex.
There is a need for a framework to model these systems in a compact and comprehensive representation for prediction and control.
Our approach learns the required (Koopman) basis functions via supervised learning; a sketch of the idea follows this entry.
arXiv Detail & Related papers (2021-09-06T04:39:06Z)
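As a companion sketch for the entry above: learning the basis functions (observables) with a neural encoder so that the dynamics become linear in the latent space. The architecture, sizes, and loss below are illustrative assumptions in the general deep-Koopman style, not the paper's exact design.

    # Sketch: supervised learning of Koopman basis functions with an encoder.
    import torch
    import torch.nn as nn

    class DeepKoopman(nn.Module):
        def __init__(self, state_dim, latent_dim):
            super().__init__()
            # phi maps a state to a vector of learned basis functions.
            self.phi = nn.Sequential(
                nn.Linear(state_dim, 64), nn.ReLU(),
                nn.Linear(64, latent_dim),
            )
            # K is the linear Koopman matrix acting on the latent space.
            self.K = nn.Linear(latent_dim, latent_dim, bias=False)

        def loss(self, x_t, x_next):
            # Supervised objective on observed transitions: the latent
            # dynamics must be linear, phi(x_{t+1}) ~= K phi(x_t).
            return ((self.phi(x_next) - self.K(self.phi(x_t))) ** 2).mean()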
- Learning to Continuously Optimize Wireless Resource in a Dynamic Environment: A Bilevel Optimization Perspective [52.497514255040514]
This work develops a new approach that enables data-driven methods to continuously learn and optimize resource allocation strategies in a dynamic environment.
We propose to build the notion of continual learning into wireless system design, so that the learning model can incrementally adapt to the new episodes.
Our design is based on a novel bilevel optimization formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2021-05-03T07:23:39Z)
- GELATO: Geometrically Enriched Latent Model for Offline Reinforcement Learning [54.291331971813364]
Offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods.
In this work, we demonstrate the benefit of combining the two in a latent variational model.
Our proposed metrics measure both the quality of out-of-distribution samples and the discrepancy of examples in the data.
arXiv Detail & Related papers (2021-02-22T19:42:40Z)
- Using Data Assimilation to Train a Hybrid Forecast System that Combines Machine-Learning and Knowledge-Based Components [52.77024349608834]
We consider the problem of data-assisted forecasting of chaotic dynamical systems when the available data consists of noisy partial measurements.
We show that by using partial measurements of the state of the dynamical system, we can train a machine learning model to improve predictions made by an imperfect knowledge-based model.
arXiv Detail & Related papers (2021-02-15T19:56:48Z)
- Learning to Continuously Optimize Wireless Resource In Episodically Dynamic Environment [55.91291559442884]
This work develops a methodology that enables data-driven methods to continuously learn and optimize in a dynamic environment.
We propose to build the notion of continual learning into the modeling process of learning wireless systems.
Our design is based on a novel min-max formulation which ensures certain "fairness" across different data samples.
arXiv Detail & Related papers (2020-11-16T08:24:34Z)
- PLAS: Latent Action Space for Offline Reinforcement Learning [18.63424441772675]
The goal of offline reinforcement learning is to learn a policy from a fixed dataset, without further interactions with the environment.
Existing off-policy algorithms have limited performance on static datasets due to extrapolation errors from out-of-distribution actions.
We demonstrate that our method provides competitive performance consistently across various continuous control tasks and different types of datasets; a sketch of the latent-action idea follows this entry.
arXiv Detail & Related papers (2020-11-14T03:38:38Z)
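To illustrate the latent-action idea referenced in the entry above (our reading of the approach, not a verified reproduction of the paper): a generative model is first fit to the dataset's actions, and the policy then selects actions in that model's latent space, so decoded actions stay close to the data distribution and extrapolation error is reduced. A minimal PyTorch sketch, with all names and sizes assumed:

    # Sketch: a policy acting in the latent space of an action decoder that
    # was (hypothetically) pretrained on the offline dataset's actions.
    import torch
    import torch.nn as nn

    class LatentPolicy(nn.Module):
        def __init__(self, state_dim, latent_dim, action_dim):
            super().__init__()
            # The policy outputs a bounded latent code, not a raw action.
            self.policy = nn.Sequential(
                nn.Linear(state_dim, 64), nn.ReLU(),
                nn.Linear(64, latent_dim), nn.Tanh(),
            )
            # Decoder mapping (state, latent) back to the action space;
            # assumed pretrained on dataset actions and kept frozen.
            self.decoder = nn.Sequential(
                nn.Linear(state_dim + latent_dim, 64), nn.ReLU(),
                nn.Linear(64, action_dim),
            )

        def forward(self, state):
            z = self.policy(state)
            return self.decoder(torch.cat([state, z], dim=-1))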
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.