Optimistic Active Exploration of Dynamical Systems
- URL: http://arxiv.org/abs/2306.12371v2
- Date: Mon, 30 Oct 2023 15:18:01 GMT
- Title: Optimistic Active Exploration of Dynamical Systems
- Authors: Bhavya Sukhija, Lenart Treven, Cansu Sancaktar, Sebastian Blaes,
Stelian Coros, Andreas Krause
- Abstract summary: We develop an algorithm for active exploration called OPAX.
We show how OPAX can be reduced to an optimal control problem that can be solved at each episode.
Our experiments show that OPAX is not only theoretically sound but also performs well for zero-shot planning on novel downstream tasks.
- Score: 52.91573056896633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning algorithms commonly seek to optimize policies for
solving one particular task. How should we explore an unknown dynamical system
such that the estimated model globally approximates the dynamics and allows us
to solve multiple downstream tasks in a zero-shot manner? In this paper, we
address this challenge, by developing an algorithm -- OPAX -- for active
exploration. OPAX uses well-calibrated probabilistic models to quantify the
epistemic uncertainty about the unknown dynamics. It optimistically -- w.r.t.
plausible dynamics -- maximizes the information gain between the unknown
dynamics and state observations. We show how the resulting optimization problem
can be reduced to an optimal control problem that can be solved at each episode
using standard approaches. We analyze our algorithm for general models, and, in
the case of Gaussian process dynamics, we give a first-of-its-kind sample
complexity bound and show that the epistemic uncertainty converges to zero. In
our experiments, we compare OPAX with other heuristic active exploration
approaches on several environments. Our experiments show that OPAX is not only
theoretically sound but also performs well for zero-shot planning on novel
downstream tasks.
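The following is a minimal, illustrative sketch of the exploration loop described in the abstract, not the authors' implementation: a small model ensemble stands in for a calibrated probabilistic dynamics model, the per-step intrinsic reward log(1 + sigma^2 / sigma_noise^2) serves as an information-gain proxy, and planning is done by random shooting rather than the optimistic optimal-control solver used in the paper. All class names, hyperparameters, and the roll-forward-on-the-mean simplification are assumptions.

```python
import numpy as np

# Illustrative OPAX-style active exploration loop (not the paper's code).
# Assumption: an ensemble of learned dynamics models provides an epistemic
# uncertainty estimate; planning uses random shooting over action sequences.

class EnsembleDynamics:
    """Stand-in for a calibrated probabilistic dynamics model p(s' | s, a)."""

    def __init__(self, models):
        self.models = models  # callables: (state, action) -> next_state

    def predict(self, state, action):
        preds = np.stack([m(state, action) for m in self.models])
        return preds.mean(axis=0), preds.std(axis=0)  # mean, epistemic std


def information_gain_proxy(std, noise_std=0.1):
    # Per-step intrinsic reward: sum over state dims of log(1 + sigma^2 / sigma_noise^2).
    return np.sum(np.log1p((std / noise_std) ** 2))


def plan_exploration_episode(model, init_state, horizon=20, action_dim=2,
                             n_candidates=256, rng=None):
    """Pick the action sequence whose imagined rollout maximizes total uncertainty."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_seq, best_value = None, -np.inf
    for _ in range(n_candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, action_dim))
        state, value = init_state, 0.0
        for a in actions:
            mean, std = model.predict(state, a)
            value += information_gain_proxy(std)
            state = mean  # simplification: roll forward on the mean prediction
        if value > best_value:
            best_seq, best_value = actions, value
    return best_seq
```

In the paper, optimism with respect to the set of plausible dynamics enters through the planner itself; rolling out on the mean prediction above is a deliberate simplification for readability.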
Related papers
- Foundational Inference Models for Dynamical Systems [5.549794481031468]
We offer a fresh perspective on the classical problem of imputing missing time series data, whose underlying dynamics are assumed to be determined by ODEs.
We propose a novel supervised learning framework for zero-shot time series imputation, through parametric functions satisfying some (hidden) ODEs.
We empirically demonstrate that one and the same (pretrained) recognition model can perform zero-shot imputation across 63 distinct time series with missing values.
arXiv Detail & Related papers (2024-02-12T11:48:54Z) - FLEX: an Adaptive Exploration Algorithm for Nonlinear Systems [6.612035830987298]
We introduce FLEX, an exploration algorithm for nonlinear dynamics based on optimal experimental design.
Our policy maximizes the information of the next step and results in an adaptive exploration algorithm.
The performance achieved by FLEX is competitive and its computational cost is low.
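As a hedged illustration of "maximizing the information of the next step", the sketch below performs greedy one-step D-optimal input selection for a dynamics model assumed to be linear in known features phi(x, u); the feature map, candidate set, and Gram-matrix criterion are illustrative assumptions, not FLEX's actual design.

```python
import numpy as np

def greedy_design_step(gram, candidate_features):
    """One-step D-optimal input selection for a linear-in-features dynamics model.

    gram: (d, d) Gram matrix sum_t phi_t phi_t^T accumulated so far
          (initialize with a small ridge, e.g. 1e-3 * np.eye(d)).
    candidate_features: (n, d) features phi(x_t, u) for each candidate input u.
    Returns the index of the candidate maximizing the log-det increase.
    """
    _, base_logdet = np.linalg.slogdet(gram)
    gains = []
    for phi in candidate_features:
        _, logdet = np.linalg.slogdet(gram + np.outer(phi, phi))
        gains.append(logdet - base_logdet)
    return int(np.argmax(gains))
```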
arXiv Detail & Related papers (2023-04-26T10:20:55Z) - Representation Learning with Multi-Step Inverse Kinematics: An Efficient
and Optimal Approach to Rich-Observation RL [106.82295532402335]
Existing reinforcement learning algorithms suffer from computational intractability, strong statistical assumptions, and suboptimal sample complexity.
We provide the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level.
Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics.
arXiv Detail & Related papers (2023-04-12T14:51:47Z) - Maximum entropy exploration in contextual bandits with neural networks
and energy based models [63.872634680339644]
We present two classes of models, one with neural networks as reward estimators, and the other with energy-based models.
We show that both techniques outperform well-known standard algorithms, with energy-based models achieving the best overall performance.
This provides practitioners with new techniques that perform well in static and dynamic settings, and are particularly well suited to non-linear scenarios with continuous action spaces.
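A minimal sketch of the maximum-entropy policy such estimators induce: given per-action reward estimates r_hat(x, a) from any model (a neural network or an energy-based model evaluated at context x), the entropy-regularized optimal policy is a Boltzmann distribution over those estimates. The discrete action set and temperature are simplifying assumptions; the paper also treats continuous action spaces.

```python
import numpy as np

def max_entropy_policy(reward_estimates, temperature=1.0):
    """Boltzmann (maximum-entropy) action distribution over estimated rewards."""
    logits = np.asarray(reward_estimates, dtype=float) / temperature
    logits -= logits.max()            # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()


def sample_action(reward_estimates, temperature=1.0, rng=None):
    """Sample an action index from the maximum-entropy policy."""
    if rng is None:
        rng = np.random.default_rng()
    probs = max_entropy_policy(reward_estimates, temperature)
    return int(rng.choice(len(probs), p=probs))
```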
arXiv Detail & Related papers (2022-10-12T15:09:45Z) - An end-to-end deep learning approach for extracting stochastic dynamical
systems with $\alpha$-stable L\'evy noise [5.815325960286111]
In this work, we identify dynamical systems driven by $\alpha$-stable L\'evy noise from only random pairwise data.
Our innovations include: (1) designing a deep learning approach to learn both drift and diffusion terms for L\'evy-induced noise with $\alpha$ across all values, (2) learning complex multiplicative noise without restrictions on small noise intensity, and (3) proposing an end-to-end complete framework for system identification.
arXiv Detail & Related papers (2022-01-31T10:51:25Z) - Dream to Explore: Adaptive Simulations for Autonomous Systems [3.0664963196464448]
We tackle the problem of learning to control dynamical systems by applying Bayesian nonparametric methods.
By employing Gaussian processes to discover latent world dynamics, we mitigate common data efficiency issues observed in reinforcement learning.
Our algorithm jointly learns a world model and policy by optimizing a variational lower bound of a log-likelihood.
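For reference, a variational lower bound of this kind typically takes the following form for a latent state-space world model; the paper's exact factorization may differ, so this is only the standard form written with generic inference and generative distributions $q$ and $p$:

$$\log p(o_{1:T} \mid a_{1:T}) \;\ge\; \mathbb{E}_{q(s_{1:T} \mid o_{1:T}, a_{1:T})}\Big[ \sum_{t=1}^{T} \log p(o_t \mid s_t) - \mathrm{KL}\big( q(s_t \mid o_{\le t}, a_{<t}) \,\|\, p(s_t \mid s_{t-1}, a_{t-1}) \big) \Big]$$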
arXiv Detail & Related papers (2021-10-27T04:27:28Z) - Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that Robust Fitted Value Iteration is more robust than both deep reinforcement learning algorithms and the non-robust version of the algorithm.
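For intuition only, here is a toy sketch of a robust Bellman backup: the value update takes the worst case over a finite set of plausible transition models. The paper's algorithm operates on continuous compact state domains with fitted value functions, so this discretized version and its inputs are assumptions for illustration.

```python
import numpy as np

def robust_value_iteration(n_states, n_actions, transition_models, reward_fn,
                           gamma=0.95, n_iters=200):
    """Value iteration on a discretized state space with a worst-case backup.

    transition_models: list of plausible dynamics; each element T is indexable
    by action, with T[a] an (n_states, n_states) matrix of P(s' | s, a).
    reward_fn(s, a) returns the reward for state index s and action index a.
    """
    v = np.zeros(n_states)
    for _ in range(n_iters):
        q = np.zeros((n_states, n_actions))
        for a in range(n_actions):
            r = np.array([reward_fn(s, a) for s in range(n_states)])
            # Robust backup: minimum expected value over the plausible dynamics set.
            worst = np.min(np.stack([T[a] @ v for T in transition_models]), axis=0)
            q[:, a] = r + gamma * worst
        v = q.max(axis=1)
    return v
```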
arXiv Detail & Related papers (2021-05-25T19:48:35Z) - Efficient Model-Based Reinforcement Learning through Optimistic Policy
Search and Planning [93.1435980666675]
We show how optimistic exploration can be easily combined with state-of-the-art reinforcement learning algorithms.
Our experiments demonstrate that optimistic exploration significantly speeds up learning when there are penalties on actions.
arXiv Detail & Related papers (2020-06-15T18:37:38Z) - Active Model Estimation in Markov Decision Processes [108.46146218973189]
We study the problem of efficient exploration in order to learn an accurate model of an environment, modeled as a Markov decision process (MDP).
We show that our Markov-based algorithm outperforms both our original algorithm and the maximum entropy algorithm in the small sample regime.
arXiv Detail & Related papers (2020-03-06T16:17:24Z)