Optimistic Active Exploration of Dynamical Systems
- URL: http://arxiv.org/abs/2306.12371v2
- Date: Mon, 30 Oct 2023 15:18:01 GMT
- Title: Optimistic Active Exploration of Dynamical Systems
- Authors: Bhavya Sukhija, Lenart Treven, Cansu Sancaktar, Sebastian Blaes,
Stelian Coros, Andreas Krause
- Abstract summary: We develop an algorithm for active exploration called OPAX.
We show how OPAX can be reduced to an optimal control problem that can be solved at each episode.
Our experiments show that OPAX is not only theoretically sound but also performs well for zero-shot planning on novel downstream tasks.
- Score: 52.91573056896633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning algorithms commonly seek to optimize policies for
solving one particular task. How should we explore an unknown dynamical system
such that the estimated model globally approximates the dynamics and allows us
to solve multiple downstream tasks in a zero-shot manner? In this paper, we
address this challenge, by developing an algorithm -- OPAX -- for active
exploration. OPAX uses well-calibrated probabilistic models to quantify the
epistemic uncertainty about the unknown dynamics. It optimistically -- w.r.t.
plausible dynamics -- maximizes the information gain between the unknown
dynamics and state observations. We show how the resulting optimization problem
can be reduced to an optimal control problem that can be solved at each episode
using standard approaches. We analyze our algorithm for general models, and, in
the case of Gaussian process dynamics, we give a first-of-its-kind sample
complexity bound and show that the epistemic uncertainty converges to zero. In
our experiments, we compare OPAX with other heuristic active exploration
approaches on several environments. Our experiments show that OPAX is not only
theoretically sound but also performs well for zero-shot planning on novel
downstream tasks.
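The following is a minimal, illustrative sketch of the exploration loop described in the abstract, not the authors' implementation: a small model ensemble stands in for a calibrated probabilistic dynamics model, the per-step intrinsic reward log(1 + sigma^2 / sigma_noise^2) serves as an information-gain proxy, and planning is done by random shooting rather than the optimistic optimal-control solver used in the paper. All class names, hyperparameters, and the roll-forward-on-the-mean simplification are assumptions.

```python
import numpy as np

# Illustrative OPAX-style active exploration loop (not the paper's code).
# Assumption: an ensemble of learned dynamics models provides an epistemic
# uncertainty estimate; planning uses random shooting over action sequences.

class EnsembleDynamics:
    """Stand-in for a calibrated probabilistic dynamics model p(s' | s, a)."""

    def __init__(self, models):
        self.models = models  # callables: (state, action) -> next_state

    def predict(self, state, action):
        preds = np.stack([m(state, action) for m in self.models])
        return preds.mean(axis=0), preds.std(axis=0)  # mean, epistemic std


def information_gain_proxy(std, noise_std=0.1):
    # Per-step intrinsic reward: sum over state dims of log(1 + sigma^2 / sigma_noise^2).
    return np.sum(np.log1p((std / noise_std) ** 2))


def plan_exploration_episode(model, init_state, horizon=20, action_dim=2,
                             n_candidates=256, rng=None):
    """Pick the action sequence whose imagined rollout maximizes total uncertainty."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_seq, best_value = None, -np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        state, value = init_state, 0.0
        for a in actions:
            mean, std = model.predict(state, a)
            value += information_gain_proxy(std)
            state = mean  # simplification: roll forward on the mean prediction
        if value > best_value:
            best_seq, best_value = actions, value
    return best_seq
```

In the paper, optimism with respect to the set of plausible dynamics enters through the planner itself; rolling out on the mean prediction above is a deliberate simplification for readability.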
Related papers
- Foundational Inference Models for Dynamical Systems [5.549794481031468]
We offer a fresh perspective on the classical problem of imputing missing time series data, whose underlying dynamics are assumed to be determined by ODEs.
We propose a novel supervised learning framework for zero-shot time series imputation, through parametric functions satisfying some (hidden) ODEs.
We empirically demonstrate that one and the same (pretrained) recognition model can perform zero-shot imputation across 63 distinct time series with missing values.
arXiv Detail & Related papers (2024-02-12T11:48:54Z) - FLEX: an Adaptive Exploration Algorithm for Nonlinear Systems [6.612035830987298]
We introduce FLEX, an exploration algorithm for nonlinear dynamics based on optimal experimental design.
Our policy maximizes the information of the next step and results in an adaptive exploration algorithm.
The performance achieved by FLEX is competitive and its computational cost is low.
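As a hedged illustration of "maximizing the information of the next step", the sketch below performs greedy one-step D-optimal input selection for a dynamics model assumed to be linear in known features phi(x, u); the feature map, candidate set, and Gram-matrix criterion are illustrative assumptions, not FLEX's actual design.

```python
import numpy as np

def greedy_design_step(gram, candidate_features):
    """One-step D-optimal input selection for a linear-in-features dynamics model.

    gram: (d, d) Gram matrix sum_t phi_t phi_t^T accumulated so far
          (initialize with a small ridge, e.g. 1e-3 * np.eye(d)).
    candidate_features: (n, d) features phi(x_t, u) for each candidate input u.
    Returns the index of the candidate maximizing the log-det increase.
    """
    _, base_logdet = np.linalg.slogdet(gram)
    gains = []
    for phi in candidate_features:
        _, logdet = np.linalg.slogdet(gram + np.outer(phi, phi))
        gains.append(logdet - base_logdet)
    return int(np.argmax(gains))
```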
arXiv Detail & Related papers (2023-04-26T10:20:55Z) - Representation Learning with Multi-Step Inverse Kinematics: An Efficient
and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z) - Maximum entropy exploration in contextual bandits with neural networks
and energy based models [63.872634680339644]
We present two classes of models, one with neural networks as reward estimators, and the other with energy-based models.
We show that both techniques outperform well-known standard algorithms, with energy-based models achieving the best overall performance.
This provides practitioners with new techniques that perform well in static and dynamic settings, and are particularly well suited to non-linear scenarios with continuous action spaces.
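A minimal sketch of the maximum-entropy policy such estimators induce: given per-action reward estimates r_hat(x, a) from any model (a neural network or an energy-based model evaluated at context x), the entropy-regularized optimal policy is a Boltzmann distribution over those estimates. The discrete action set and temperature are simplifying assumptions; the paper also treats continuous action spaces.

```python
import numpy as np

def max_entropy_policy(reward_estimates, temperature=1.0):
    """Boltzmann (maximum-entropy) action distribution over estimated rewards."""
    logits = np.asarray(reward_estimates, dtype=float) / temperature
    logits -= logits.max()            # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def sample_action(reward_estimates, temperature=1.0, rng=None):
    """Sample an action index from the maximum-entropy policy."""
    if rng is None:
        rng = np.random.default_rng()
    probs = max_entropy_policy(reward_estimates, temperature)
    return int(rng.choice(len(probs), p=probs))
```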
arXiv Detail & Related papers (2022-10-12T15:09:45Z) - An end-to-end deep learning approach for extracting stochastic dynamical
systems with $\alpha$-stable L\'evy noise [5.815325960286111]
In this work, we identify dynamical systems driven by $\alpha$-stable L\'evy noise from only random pairwise data.
Our innovations include: (1) designing a deep learning approach to learn both drift and diffusion terms for L\'evy-induced noise with $\alpha$ across all values, (2) learning complex multiplicative noise without restrictions on small noise intensity, and (3) proposing an end-to-end complete framework for system identification.
arXiv Detail & Related papers (2022-01-31T10:51:25Z) - Dream to Explore: Adaptive Simulations for Autonomous Systems [3.0664963196464448]
We tackle the problem of learning to control dynamical systems by applying Bayesian nonparametric methods.
By employing Gaussian processes to discover latent world dynamics, we mitigate common data efficiency issues observed in reinforcement learning.
Our algorithm jointly learns a world model and policy by optimizing a variational lower bound of a log-likelihood.
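For reference, a variational lower bound of this kind typically takes the following form for a latent state-space world model; the paper's exact factorization may differ, so this is only the standard form written with generic inference and generative distributions $q$ and $p$:

$$\log p(o_{1:T} \mid a_{1:T}) \;\ge\; \mathbb{E}_{q(s_{1:T} \mid o_{1:T}, a_{1:T})}\Big[ \sum_{t=1}^{T} \log p(o_t \mid s_t) - \mathrm{KL}\big( q(s_t \mid o_{\le t}, a_{<t}) \,\|\, p(s_t \mid s_{t-1}, a_{t-1}) \big) \Big]$$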
arXiv Detail & Related papers (2021-10-27T04:27:28Z) - Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that Robust Fitted Value Iteration is more robust than both deep reinforcement learning algorithms and the non-robust version of the algorithm.
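For intuition only, here is a toy sketch of a robust Bellman backup: the value update takes the worst case over a finite set of plausible transition models. The paper's algorithm operates on continuous compact state domains with fitted value functions, so this discretized version and its inputs are assumptions for illustration.

```python
import numpy as np

def robust_value_iteration(n_states, n_actions, transition_models, reward_fn,
                           gamma=0.95, n_iters=200):
    """Value iteration on a discretized state space with a worst-case backup.

    transition_models: list of plausible dynamics; each element T is indexable
    by action, with T[a] an (n_states, n_states) matrix of P(s' | s, a).
    reward_fn(s, a) returns the reward for state index s and action index a.
    """
    v = np.zeros(n_states)
    for _ in range(n_iters):
        q = np.zeros((n_states, n_actions))
        for a in range(n_actions):
            r = np.array([reward_fn(s, a) for s in range(n_states)])
            # Robust backup: minimum expected value over the plausible dynamics set.
            worst = np.min(np.stack([T[a] @ v for T in transition_models]), axis=0)
            q[:, a] = r + gamma * worst
        v = q.max(axis=1)
    return v
```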
arXiv Detail & Related papers (2021-05-25T19:48:35Z) - Efficient Model-Based Reinforcement Learning through Optimistic Policy
Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z) - Active Model Estimation in Markov Decision Processes [108.46146218973189]
We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP).
We show that our Markov-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime.
arXiv Detail & Related papers (2020-03-06T16:17:24Z)