InfoBot: Transfer and Exploration via the Information Bottleneck
- URL: http://arxiv.org/abs/1901.10902v5
- Date: Tue, 5 Dec 2023 19:00:24 GMT
- Title: InfoBot: Transfer and Exploration via the Information Bottleneck
- Authors: Anirudh Goyal, Riashat Islam, Daniel Strouse, Zafarali Ahmed, Matthew
Botvinick, Hugo Larochelle, Yoshua Bengio, Sergey Levine
- Abstract summary: A central challenge in reinforcement learning is discovering effective policies for tasks where rewards are sparsely distributed.
We propose to learn about decision states from prior experience.
We find that this simple mechanism effectively identifies decision states, even in partially observed settings.
- Score: 105.28380750802019
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A central challenge in reinforcement learning is discovering effective
policies for tasks where rewards are sparsely distributed. We postulate that in
the absence of useful reward signals, an effective exploration strategy should
seek out {\it decision states}. These states lie at critical junctions in the
state space from where the agent can transition to new, potentially unexplored
regions. We propose to learn about decision states from prior experience. By
training a goal-conditioned policy with an information bottleneck, we can
identify decision states by examining where the model actually leverages the
goal state. We find that this simple mechanism effectively identifies decision
states, even in partially observed settings. In effect, the model learns the
sensory cues that correlate with potential subgoals. In new environments, this
model can then identify novel subgoals for further exploration, guiding the
agent through a sequence of potential decision states and through new regions
of the state space.
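To make the mechanism in the abstract concrete, here is a minimal, hedged sketch of the information-bottleneck idea: a goal-conditioned encoder p(z | s, g) is regularized toward a goal-agnostic prior (taken here as a fixed standard normal for simplicity), and states where the per-state KL is large are the ones where the policy actually leverages the goal, i.e. candidate decision states. This is an illustration under those assumptions, not the authors' code; all module, function, and variable names (GoalConditionedEncoder, kl_to_standard_normal, beta, ...) are hypothetical.

```python
# Sketch of the information-bottleneck signal described in the abstract:
# regularize a goal-conditioned encoder p(z | s, g) toward a goal-agnostic
# prior, and read off candidate decision states from the per-state KL.
# All names are illustrative; the prior is a fixed N(0, I) for simplicity.
import torch
import torch.nn as nn


class GoalConditionedEncoder(nn.Module):
    """Outputs mean and log-variance of a Gaussian p(z | s, g)."""

    def __init__(self, state_dim, goal_dim, latent_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * latent_dim),
        )

    def forward(self, state, goal):
        mu, log_var = self.net(torch.cat([state, goal], dim=-1)).chunk(2, dim=-1)
        return mu, log_var


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ), computed per state in the batch."""
    return 0.5 * torch.sum(log_var.exp() + mu ** 2 - 1.0 - log_var, dim=-1)


if __name__ == "__main__":
    encoder = GoalConditionedEncoder(state_dim=10, goal_dim=4, latent_dim=8)
    states, goals = torch.randn(32, 10), torch.randn(32, 4)
    mu, log_var = encoder(states, goals)
    kl = kl_to_standard_normal(mu, log_var)          # shape: (32,)
    beta = 0.01                                      # bottleneck strength
    bottleneck_penalty = beta * kl.mean()            # added to the RL loss
    decision_state_mask = kl > kl.mean() + kl.std()  # crude thresholding
```

In this sketch the same per-state KL plays both roles mentioned in the abstract: during training it is the bottleneck penalty added to the policy loss, and at exploration time it scores states as potential subgoals.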
Related papers
- Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization using Geometrical Information [68.10033984296247]
This paper explores the domain of active localization, emphasizing the importance of viewpoint selection to enhance localization accuracy.
Our contributions include a data-driven approach with a simple architecture designed for real-time operation, a self-supervised training method, and the ability to consistently integrate our map into a planning framework tailored for real-world robotics applications.
arXiv Detail & Related papers (2024-07-22T12:32:09Z) - ELDEN: Exploration via Local Dependencies [37.44189774149647]
We present ELDEN, Exploration via Local DepENdencies, a novel intrinsic reward that encourages the discovery of new interactions between entities.
We evaluate the performance of ELDEN on four different domains with complex dependencies, ranging from 2D grid worlds to 3D robotic tasks.
arXiv Detail & Related papers (2023-10-12T20:20:21Z) - Learning Continuous Control Policies for Information-Theoretic Active
Perception [24.297016904005257]
We tackle the problem of learning a control policy that maximizes the mutual information between the landmark states and the sensor observations.
We employ a Kalman filter to convert the partially observable problem over the landmark state into a Markov decision process (MDP), a differentiable field of view to shape the reward, and an attention-based neural network to represent the control policy.
arXiv Detail & Related papers (2022-09-26T05:28:32Z) - Local Explanations for Reinforcement Learning [14.87922813917482]
We propose a novel perspective to understanding RL policies based on identifying important states from automatically learned meta-states.
We show that our algorithm for finding meta-states converges and that the objective for selecting important states from each meta-state is submodular, leading to efficient, high-quality greedy selection.
arXiv Detail & Related papers (2022-02-08T02:02:09Z) - MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven
Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z) - Feature-Based Interpretable Reinforcement Learning based on
State-Transition Models [3.883460584034766]
Growing concerns regarding the operational use of AI models in the real world have caused a surge of interest in explaining AI models' decisions to humans.
We propose a method for offering local explanations on risk in reinforcement learning.
arXiv Detail & Related papers (2021-05-14T23:43:11Z) - A New Bandit Setting Balancing Information from State Evolution and
Corrupted Context [52.67844649650687]
We propose a new sequential decision-making setting combining key aspects of two established online learning problems with bandit feedback.
The optimal action to play at any given moment is contingent on an underlying changing state which is not directly observable by the agent.
We present an algorithm that uses a referee to dynamically combine the policies of a contextual bandit and a multi-armed bandit.
arXiv Detail & Related papers (2020-11-16T14:35:37Z) - Guided Uncertainty-Aware Policy Optimization: Combining Learning and
Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but they are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z) - Learning Discrete State Abstractions With Deep Variational Inference [7.273663549650618]
We propose a method for learning approximate bisimulations, a type of state abstraction.
We use a deep neural encoder to map states onto continuous embeddings.
We map these embeddings onto a discrete representation using an action-conditioned hidden Markov model.
arXiv Detail & Related papers (2020-03-09T17:58:27Z) - Mutual Information-based State-Control for Intrinsically Motivated
Reinforcement Learning [102.05692309417047]
In reinforcement learning, an agent learns to reach a set of goals by means of an external reward signal.
In the natural world, intelligent organisms learn from internal drives, bypassing the need for external signals.
We propose to formulate an intrinsic objective as the mutual information between the goal states and the controllable states (a generic sketch of this kind of mutual-information objective appears after this list).
arXiv Detail & Related papers (2020-02-05T19:21:20Z)
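As a companion to the last entry, here is a minimal sketch of one common recipe for turning a mutual-information objective such as I(goal states; controllable states) into an intrinsic reward: train a variational predictor q(s_goal | s_ctrl) and use its log-likelihood as the reward (a Barber-Agakov-style lower bound). This illustrates the general idea only; it is not that paper's implementation, and all names (VariationalPredictor, s_ctrl, s_goal, ...) are hypothetical.

```python
# Generic variational-lower-bound recipe for a mutual-information intrinsic
# reward: maximize log q(s_goal | s_ctrl) with a learned Gaussian predictor
# and reuse the same log-likelihood as the per-transition reward signal.
# Illustrative sketch only; not the implementation from the paper above.
import torch
import torch.nn as nn


class VariationalPredictor(nn.Module):
    """Gaussian q(s_goal | s_ctrl) with a fixed unit variance."""

    def __init__(self, ctrl_dim, goal_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ctrl_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, goal_dim),
        )

    def log_prob(self, s_goal, s_ctrl):
        mu = self.net(s_ctrl)
        return -0.5 * torch.sum((s_goal - mu) ** 2, dim=-1)  # up to a constant


if __name__ == "__main__":
    predictor = VariationalPredictor(ctrl_dim=6, goal_dim=3)
    optim = torch.optim.Adam(predictor.parameters(), lr=1e-3)

    s_ctrl, s_goal = torch.randn(64, 6), torch.randn(64, 3)

    # Train the predictor to maximize log q(s_goal | s_ctrl) on agent data...
    loss = -predictor.log_prob(s_goal, s_ctrl).mean()
    optim.zero_grad()
    loss.backward()
    optim.step()

    # ...and reuse the same log-likelihood as the intrinsic reward that is
    # added to, or substituted for, the external task reward.
    with torch.no_grad():
        intrinsic_reward = predictor.log_prob(s_goal, s_ctrl)  # shape: (64,)
```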