Dynamic Bottleneck for Robust Self-Supervised Exploration
- URL: http://arxiv.org/abs/2110.10735v1
- Date: Wed, 20 Oct 2021 19:17:05 GMT
- Title: Dynamic Bottleneck for Robust Self-Supervised Exploration
- Authors: Chenjia Bai, Lingxiao Wang, Lei Han, Animesh Garg, Jianye Hao, Peng
Liu, Zhaoran Wang
- Abstract summary: We propose a Dynamic Bottleneck (DB) model, which attains a dynamics-relevant representation based on the information-bottleneck principle.
Based on the DB model, we further propose DB-bonus, which encourages the agent to explore state-action pairs with high information gain.
Our experiments show that exploration with the DB-bonus outperforms several state-of-the-art exploration methods in noisy environments.
- Score: 84.78836146128236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploration methods based on pseudo-count of transitions or curiosity of
dynamics have achieved promising results in solving reinforcement learning with
sparse rewards. However, such methods are usually sensitive to environmental
dynamics-irrelevant information, e.g., white noise. To handle such
dynamics-irrelevant information, we propose a Dynamic Bottleneck (DB) model,
which attains a dynamics-relevant representation based on the
information-bottleneck principle. Based on the DB model, we further propose
DB-bonus, which encourages the agent to explore state-action pairs with high
information gain. We establish theoretical connections between the proposed
DB-bonus, the upper confidence bound (UCB) in the linear case, and the
visitation count in the tabular case. We evaluate the proposed method on the
Atari suite with dynamics-irrelevant noise. Our experiments show that
exploration with the DB-bonus outperforms several state-of-the-art exploration
methods in noisy environments.
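As a rough illustration of the kind of exploration bonus described in the abstract, the sketch below computes an information-gain style intrinsic reward as the KL divergence between a posterior and a prior over a learned dynamics-relevant latent. This is a minimal approximation of the idea under assumed interfaces, not the authors' exact DB-bonus; the encoder outputs, variable names, and bonus scale are all illustrative assumptions.

```python
import numpy as np

def diag_gaussian_kl(mu_q, var_q, mu_p, var_p):
    """Closed-form KL( N(mu_q, diag(var_q)) || N(mu_p, diag(var_p)) )."""
    return 0.5 * np.sum(
        np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0
    )

def info_gain_bonus(posterior, prior, scale=0.1):
    """Illustrative DB-style bonus for one transition (s_t, a_t, s_{t+1}).

    `posterior` holds (mean, var) of q(z | s_t, a_t, s_{t+1}) and `prior`
    holds (mean, var) of p(z | s_t, a_t), both from a hypothetical learned
    encoder over a dynamics-relevant latent z.  A large KL means the observed
    next state carried much new dynamics-relevant information, so the agent
    is rewarded for exploring it; noise the encoder has learned to discard
    does not inflate the bonus.
    """
    (mu_q, var_q), (mu_p, var_p) = posterior, prior
    return scale * diag_gaussian_kl(mu_q, var_q, mu_p, var_p)

# Toy usage with made-up latent statistics for a single transition.
posterior = (np.array([0.8, -0.3]), np.array([0.05, 0.10]))
prior = (np.array([0.0, 0.0]), np.array([1.0, 1.0]))
print(info_gain_bonus(posterior, prior))
```

The theoretical connections mentioned in the abstract (UCB in the linear case, visitation counts in the tabular case) concern how such a bonus behaves in those special settings; the sketch only shows the general latent-space form.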
Related papers
- Active Learning of Dynamics Using Prior Domain Knowledge in the Sampling Process [18.406992961818368]
We present an active learning algorithm for learning dynamics that leverages side information by explicitly incorporating prior domain knowledge into the sampling process.
Our proposed algorithm guides the exploration toward regions that demonstrate high empirical discrepancy between the observed data and an imperfect prior model of the dynamics derived from side information (a minimal sketch of this sampling rule follows this entry).
We rigorously prove that our active learning algorithm yields a consistent estimate of the underlying dynamics by providing an explicit rate of convergence for the maximum predictive variance.
arXiv Detail & Related papers (2024-03-25T22:20:45Z)
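A minimal sketch of the discrepancy-guided sampling rule summarized above, under illustrative assumptions: `prior_model` stands for the imperfect dynamics model derived from side information, `fitted_model` stands for a model estimated from the data observed so far, and candidate queries are scored by the disagreement between the two. The names and the scoring rule are assumptions for illustration, not the paper's exact acquisition function.

```python
import numpy as np

def select_query(candidates, prior_model, fitted_model):
    """Pick the candidate input where the side-information prior model and the
    data-driven model disagree the most; sampling there is expected to be the
    most informative about the true dynamics."""
    discrepancies = [
        np.linalg.norm(fitted_model(x) - prior_model(x)) for x in candidates
    ]
    return candidates[int(np.argmax(discrepancies))]

# Toy usage: scalar dynamics with an imperfect prior and a stand-in fitted model.
prior_model = lambda x: 0.9 * x                          # assumed side-information model
fitted_model = lambda x: 0.9 * x + 0.3 * np.sin(3 * x)   # assumed fit to observed data
candidates = np.linspace(-1.0, 1.0, 21)
print(select_query(candidates, prior_model, fitted_model))
```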
- A Bayesian Approach to Robust Inverse Reinforcement Learning [54.24816623644148]
We consider a Bayesian approach to offline model-based inverse reinforcement learning (IRL).
The proposed framework differs from existing offline model-based IRL approaches by performing simultaneous estimation of the expert's reward function and subjective model of environment dynamics.
Our analysis reveals a novel insight that the estimated policy exhibits robust performance when the expert is believed to have a highly accurate model of the environment.
arXiv Detail & Related papers (2023-09-15T17:37:09Z)
- Dynamic Exploration-Exploitation Trade-Off in Active Learning Regression with Bayesian Hierarchical Modeling [4.132882666134921]
Methods that consider exploration and exploitation simultaneously employ fixed or ad-hoc measures to control the trade-off, which may not be optimal.
We develop a Bayesian hierarchical approach, referred to as BHEEM, to dynamically balance the exploration-exploitation trade-off.
arXiv Detail & Related papers (2023-04-16T01:40:48Z)
- STEERING: Stein Information Directed Exploration for Model-Based Reinforcement Learning [111.75423966239092]
We propose an exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal.
Based on the kernelized Stein discrepancy (KSD), we develop a novel algorithm, STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING.
arXiv Detail & Related papers (2023-01-28T00:49:28Z)
- Self-supervised Sequential Information Bottleneck for Robust Exploration in Deep Reinforcement Learning [28.75574762244266]
In this work, we introduce the sequential information bottleneck objective for learning compressed and temporally coherent representations.
For efficient exploration in noisy environments, we further construct intrinsic rewards that capture task-relevant state novelty.
arXiv Detail & Related papers (2022-09-12T15:41:10Z)
- Deep Impulse Responses: Estimating and Parameterizing Filters with Deep Networks [76.830358429947]
Impulse response estimation in high noise and in-the-wild settings is a challenging problem.
We propose a novel framework for parameterizing and estimating impulse responses based on recent advances in neural representation learning.
arXiv Detail & Related papers (2022-02-07T18:57:23Z)
- Reinforcement Learning based Path Exploration for Sequential Explainable Recommendation [57.67616822888859]
We propose a novel Temporal Meta-path Guided Explainable Recommendation model leveraging Reinforcement Learning (TMER-RL).
TMER-RL utilizes reinforcement item-item path modelling between consecutive items with attention mechanisms to sequentially model dynamic user-item evolutions on a dynamic knowledge graph for explainable recommendation.
Extensive evaluations of TMER on two real-world datasets show state-of-the-art performance compared against recent strong baselines.
arXiv Detail & Related papers (2021-11-24T04:34:26Z)
- Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method.
A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations.
We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z)
- Planning with Exploration: Addressing Dynamics Bottleneck in Model-based Reinforcement Learning [25.077671501605746]
Through theoretical analysis, we find that trajectory reward estimation error is the main cause of the dynamics bottleneck dilemma.
Motivated by this, we propose MOdel-based Progressive Entropy-based Exploration (MOPE2), a model-based control method combined with exploration (a generic sketch of an entropy-regularized exploration objective follows this entry).
arXiv Detail & Related papers (2020-10-24T15:29:02Z)
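The MOPE2 entry above pairs model-based control with an exploration term. The sketch below shows only a generic entropy-regularized trajectory score of the kind such methods optimize (estimated return plus a weighted entropy bonus over the model's predictive distribution); it is not MOPE2's actual progressive scheme, and the weight, interfaces, and names are assumptions.

```python
import numpy as np

def diag_gaussian_entropy(var):
    """Differential entropy of a diagonal-Gaussian predictive distribution."""
    return 0.5 * np.sum(np.log(2.0 * np.pi * np.e * var))

def exploration_score(predicted_rewards, predicted_vars, beta=0.1):
    """Generic model-based planning score: estimated trajectory return plus an
    entropy bonus, so imagined trajectories the model is uncertain about are
    preferred during exploration (illustrative only)."""
    entropy_bonus = sum(diag_gaussian_entropy(v) for v in predicted_vars)
    return float(np.sum(predicted_rewards)) + beta * entropy_bonus

# Toy usage: a 3-step imagined trajectory with per-step predictive variances.
rewards = np.array([0.5, 0.2, 0.1])
variances = [np.array([0.2, 0.2]), np.array([0.5, 0.4]), np.array([1.0, 0.9])]
print(exploration_score(rewards, variances))
```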
- Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning [12.76337275628074]
In this work, we propose a variational dynamic model based on conditional variational inference to model the multimodality and stochasticity of the environment dynamics.
We derive an upper bound of the negative log-likelihood of the environmental transition and use such an upper bound as the intrinsic reward for exploration (a minimal sketch follows this entry).
Our method outperforms several state-of-the-art environment model-based exploration approaches.
arXiv Detail & Related papers (2020-10-17T09:54:51Z)
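For the variational dynamic entry above, a minimal sketch of using an upper bound on the transition's negative log-likelihood as an intrinsic reward. The negative ELBO of a conditional-VAE-style dynamics model (a Gaussian reconstruction term plus a KL term to a standard-normal prior) is one standard such upper bound; the interface and names here are assumptions rather than the paper's exact model.

```python
import numpy as np

def diag_gaussian_nll(x, mean, var):
    """Negative log-density of x under a diagonal Gaussian (reconstruction term)."""
    return 0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def kl_to_standard_normal(mu_q, var_q):
    """KL( N(mu_q, diag(var_q)) || N(0, I) ) for the variational posterior."""
    return 0.5 * np.sum(var_q + mu_q ** 2 - 1.0 - np.log(var_q))

def intrinsic_reward(next_state, recon_mean, recon_var, post_mu, post_var):
    """Negative ELBO of the observed transition under a conditional variational
    dynamics model.  Since -ELBO >= -log p(s_{t+1} | s_t, a_t), transitions the
    model explains poorly (i.e., novel dynamics) receive a larger bonus."""
    return diag_gaussian_nll(next_state, recon_mean, recon_var) + kl_to_standard_normal(post_mu, post_var)

# Toy usage with made-up model outputs for one transition.
s_next = np.array([0.4, -0.1])
print(intrinsic_reward(s_next,
                       recon_mean=np.array([0.1, 0.0]),
                       recon_var=np.array([0.2, 0.2]),
                       post_mu=np.array([0.3, -0.2]),
                       post_var=np.array([0.5, 0.6])))
```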
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.