InfoBot: Transfer and Exploration via the Information Bottleneck
- URL: http://arxiv.org/abs/1901.10902v5
- Date: Tue, 5 Dec 2023 19:00:24 GMT
- Title: InfoBot: Transfer and Exploration via the Information Bottleneck
- Authors: Anirudh Goyal, Riashat Islam, Daniel Strouse, Zafarali Ahmed, Matthew
Botvinick, Hugo Larochelle, Yoshua Bengio, Sergey Levine
- Abstract summary: A central challenge in reinforcement learning is discovering effective policies for tasks where rewards are sparsely distributed.
We propose to learn about decision states from prior experience.
We find that this simple mechanism effectively identifies decision states, even in partially observed settings.
- Score: 105.28380750802019
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A central challenge in reinforcement learning is discovering effective
policies for tasks where rewards are sparsely distributed. We postulate that in
the absence of useful reward signals, an effective exploration strategy should
seek out {\it decision states}. These states lie at critical junctions in the
state space from where the agent can transition to new, potentially unexplored
regions. We propose to learn about decision states from prior experience. By
training a goal-conditioned policy with an information bottleneck, we can
identify decision states by examining where the model actually leverages the
goal state. We find that this simple mechanism effectively identifies decision
states, even in partially observed settings. In effect, the model learns the
sensory cues that correlate with potential subgoals. In new environments, this
model can then identify novel subgoals for further exploration, guiding the
agent through a sequence of potential decision states and through new regions
of the state space.
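To make the mechanism in the abstract concrete, here is a minimal, hedged sketch of the information-bottleneck idea: a goal-conditioned encoder p(z | s, g) is regularized toward a goal-agnostic prior (taken here as a fixed standard normal for simplicity), and states where the per-state KL is large are the ones where the policy actually leverages the goal, i.e. candidate decision states. This is an illustration under those assumptions, not the authors' code; all module, function, and variable names (GoalConditionedEncoder, kl_to_standard_normal, beta, ...) are hypothetical.

```python
# Sketch of the information-bottleneck signal described in the abstract:
# regularize a goal-conditioned encoder p(z | s, g) toward a goal-agnostic
# prior, and read off candidate decision states from the per-state KL.
# All names are illustrative; the prior is a fixed N(0, I) for simplicity.
import torch
import torch.nn as nn


class GoalConditionedEncoder(nn.Module):
    """Outputs mean and log-variance of a Gaussian p(z | s, g)."""

    def __init__(self, state_dim, goal_dim, latent_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),
        )

    def forward(self, state, goal):
        mu, log_var = self.net(torch.cat([state, goal], dim=-1)).chunk(2, dim=-1)
        return mu, log_var


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ), computed per state in the batch."""
    return 0.5 * torch.sum(log_var.exp() + mu ** 2 - 1.0 - log_var, dim=-1)


if __name__ == "__main__":
    encoder = GoalConditionedEncoder(state_dim=10, goal_dim=4, latent_dim=8)
    states, goals = torch.randn(32, 10), torch.randn(32, 4)
    mu, log_var = encoder(states, goals)
    kl = kl_to_standard_normal(mu, log_var)          # shape: (32,)
    beta = 0.01                                      # bottleneck strength
    bottleneck_penalty = beta * kl.mean()            # added to the RL loss
    decision_state_mask = kl > kl.mean() + kl.std()  # crude thresholding
```

In this sketch the same per-state KL plays both roles mentioned in the abstract: during training it is the bottleneck penalty added to the policy loss, and at exploration time it scores states as potential subgoals.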
Related papers
- Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization using Geometrical Information [68.10033984296247]
This paper explores the domain of active localization, emphasizing the importance of viewpoint selection to enhance localization accuracy.
Our contributions include a data-driven approach with a simple architecture designed for real-time operation, a self-supervised training method, and the ability to consistently integrate our map into a planning framework tailored for real-world robotics applications.
arXiv Detail & Related papers (2024-07-22T12:32:09Z) - ELDEN: Exploration via Local Dependencies [37.44189774149647]
We present ELDEN, Exploration via Local DepENdencies, a novel intrinsic reward that encourages the discovery of new interactions between entities.
We evaluate the performance of ELDEN on four different domains with complex dependencies, ranging from 2D grid worlds to 3D robotic tasks.
arXiv Detail & Related papers (2023-10-12T20:20:21Z) - Learning Continuous Control Policies for Information-Theoretic Active
Perception [24.297016904005257]
We tackle the problem of learning a control policy that maximizes the mutual information between the landmark states and the sensor observations.
We employ a Kalman filter to convert the partially observable problem over the landmark state into a Markov decision process (MDP), a differentiable field of view to shape the reward, and an attention-based neural network to represent the control policy.
arXiv Detail & Related papers (2022-09-26T05:28:32Z) - Local Explanations for Reinforcement Learning [14.87922813917482]
We propose a novel perspective to understanding RL policies based on identifying important states from automatically learned meta-states.
We show that our algorithm for finding meta-states converges and that the objective for selecting important states from each meta-state is submodular, leading to efficient, high-quality greedy selection.
arXiv Detail & Related papers (2022-02-08T02:02:09Z) - MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven
Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z) - Feature-Based Interpretable Reinforcement Learning based on
State-Transition Models [3.883460584034766]
Growing concerns regarding the operational use of AI models in the real world have caused a surge of interest in explaining AI models' decisions to humans.
We propose a method for offering local explanations on risk in reinforcement learning.
arXiv Detail & Related papers (2021-05-14T23:43:11Z) - A New Bandit Setting Balancing Information from State Evolution and
Corrupted Context [52.67844649650687]
We propose a new sequential decision-making setting combining key aspects of two established online learning problems with bandit feedback.
The optimal action to play at any given moment is contingent on an underlying changing state which is not directly observable by the agent.
We present an algorithm that uses a referee to dynamically combine the policies of a contextual bandit and a multi-armed bandit.
arXiv Detail & Related papers (2020-11-16T14:35:37Z) - Guided Uncertainty-Aware Policy Optimization: Combining Learning and
Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but they are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z) - Learning Discrete State Abstractions With Deep Variational Inference [7.273663549650618]
We propose a method for learning approximate bisimulations, a type of state abstraction.
We use a deep neural encoder to map states onto continuous embeddings.
We map these embeddings onto a discrete representation using an action-conditioned hidden Markov model.
arXiv Detail & Related papers (2020-03-09T17:58:27Z) - Mutual Information-based State-Control for Intrinsically Motivated
Reinforcement Learning [102.05692309417047]
In reinforcement learning, an agent learns to reach a set of goals by means of an external reward signal.
In the natural world, intelligent organisms learn from internal drives, bypassing the need for external signals.
We propose to formulate an intrinsic objective as the mutual information between the goal states and the controllable states (a generic sketch of this kind of mutual-information objective appears after this list).
arXiv Detail & Related papers (2020-02-05T19:21:20Z)
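As a companion to the last entry, here is a minimal sketch of one common recipe for turning a mutual-information objective such as I(goal states; controllable states) into an intrinsic reward: train a variational predictor q(s_goal | s_ctrl) and use its log-likelihood as the reward (a Barber-Agakov-style lower bound). This illustrates the general idea only; it is not that paper's implementation, and all names (VariationalPredictor, s_ctrl, s_goal, ...) are hypothetical.

```python
# Generic variational-lower-bound recipe for a mutual-information intrinsic
# reward: maximize log q(s_goal | s_ctrl) with a learned Gaussian predictor
# and reuse the same log-likelihood as the per-transition reward signal.
# Illustrative sketch only; not the implementation from the paper above.
import torch
import torch.nn as nn


class VariationalPredictor(nn.Module):
    """Gaussian q(s_goal | s_ctrl) with a fixed unit variance."""

    def __init__(self, ctrl_dim, goal_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ctrl_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, goal_dim),
        )

    def log_prob(self, s_goal, s_ctrl):
        mu = self.net(s_ctrl)
        return -0.5 * torch.sum((s_goal - mu) ** 2, dim=-1)  # up to a constant


if __name__ == "__main__":
    predictor = VariationalPredictor(ctrl_dim=6, goal_dim=3)
    optim = torch.optim.Adam(predictor.parameters(), lr=1e-3)

    s_ctrl, s_goal = torch.randn(64, 6), torch.randn(64, 3)

    # Train the predictor to maximize log q(s_goal | s_ctrl) on agent data...
    loss = -predictor.log_prob(s_goal, s_ctrl).mean()
    optim.zero_grad()
    loss.backward()
    optim.step()

    # ...and reuse the same log-likelihood as the intrinsic reward that is
    # added to, or substituted for, the external task reward.
    with torch.no_grad():
        intrinsic_reward = predictor.log_prob(s_goal, s_ctrl)  # shape: (64,)
```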