Mutual Information-based State-Control for Intrinsically Motivated
Reinforcement Learning
- URL: http://arxiv.org/abs/2002.01963v2
- Date: Fri, 12 Jun 2020 19:07:38 GMT
- Title: Mutual Information-based State-Control for Intrinsically Motivated
Reinforcement Learning
- Authors: Rui Zhao, Yang Gao, Pieter Abbeel, Volker Tresp, Wei Xu
- Abstract summary: In reinforcement learning, an agent learns to reach a set of goals by means of an external reward signal.
In the natural world, intelligent organisms learn from internal drives, bypassing the need for external signals.
We propose to formulate an intrinsic objective as the mutual information between the goal states and the controllable states.
- Score: 102.05692309417047
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In reinforcement learning, an agent learns to reach a set of goals by means
of an external reward signal. In the natural world, intelligent organisms learn
from internal drives, bypassing the need for external signals, which is
beneficial for a wide range of tasks. Motivated by this observation, we propose
to formulate an intrinsic objective as the mutual information between the goal
states and the controllable states. This objective encourages the agent to take
control of its environment. Subsequently, we derive a surrogate objective of
the proposed reward function, which can be optimized efficiently. Lastly, we
evaluate the developed framework in different robotic manipulation and
navigation tasks and demonstrate the efficacy of our approach. A video showing
experimental results is available at https://youtu.be/CT4CKMWBYz0
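As a rough illustration of the idea, the sketch below estimates a variational (Donsker-Varadhan / MINE-style) lower bound on the mutual information between goal-state and controllable-state features and treats it as an intrinsic reward. The StatisticsNetwork, the feature dimensions, and the function names are illustrative assumptions, not the authors' implementation or their surrogate objective.

```python
# Minimal sketch: MINE-style lower bound on I(goal states; controllable states),
# usable as an intrinsic reward. Names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class StatisticsNetwork(nn.Module):
    """Scores (goal-state, controllable-state) pairs; jointly observed pairs should score higher."""
    def __init__(self, goal_dim: int, ctrl_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(goal_dim + ctrl_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s_goal, s_ctrl):
        return self.net(torch.cat([s_goal, s_ctrl], dim=-1)).squeeze(-1)

def mi_lower_bound(stat_net, s_goal, s_ctrl):
    """Donsker-Varadhan bound: E_joint[T] - log E_marginal[exp(T)]."""
    joint = stat_net(s_goal, s_ctrl)                      # aligned pairs from the same transition
    shuffled = s_ctrl[torch.randperm(s_ctrl.size(0))]     # shuffle to break the pairing (marginal)
    marginal = stat_net(s_goal, shuffled)
    log_mean_exp = torch.logsumexp(marginal, dim=0) - torch.log(
        torch.tensor(float(marginal.numel())))
    return joint.mean() - log_mean_exp

if __name__ == "__main__":
    stat_net = StatisticsNetwork(goal_dim=3, ctrl_dim=3)
    s_goal, s_ctrl = torch.randn(64, 3), torch.randn(64, 3)
    # The bound (or a per-sample variant of it) can serve as the intrinsic reward
    # encouraging the agent to make controllable states informative about goal states.
    print(float(mi_lower_bound(stat_net, s_goal, s_ctrl)))
```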
Related papers
- Localizing Active Objects from Egocentric Vision with Symbolic World
Knowledge [62.981429762309226]
The ability to actively ground task instructions from an egocentric view is crucial for AI agents to accomplish tasks or assist humans virtually.
We propose to improve phrase grounding models' ability to localize active objects by learning the role of objects undergoing change and extracting them accurately from the instructions.
We evaluate our framework on Ego4D and Epic-Kitchens datasets.
arXiv Detail & Related papers (2023-10-23T16:14:05Z) - Walk the Random Walk: Learning to Discover and Reach Goals Without
Supervision [21.72567982148215]
We propose a novel method for training such a goal-conditioned agent without any external rewards or any domain knowledge.
We use random walks to train a reachability network that predicts the similarity between two states.
This reachability network is then used in building goal memory containing past observations that are diverse and well-balanced.
All the components are kept updated throughout training as the agent discovers and learns new goals.
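A minimal sketch of the reachability idea, under the assumption that pairs of states sampled from a random-walk trajectory are labeled positive when they lie at most k steps apart; the network, the threshold k, and the training loop are illustrative, not the paper's exact setup.

```python
# Minimal sketch: train a reachability network from random-walk state pairs.
# Labels, architecture, and threshold k are illustrative assumptions.
import torch
import torch.nn as nn

class ReachabilityNet(nn.Module):
    """Predicts the probability that state b is reachable from state a."""
    def __init__(self, obs_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, a, b):
        return torch.sigmoid(self.net(torch.cat([a, b], dim=-1)))

def make_pairs(trajectory: torch.Tensor, k: int = 5):
    """Sample state pairs from one trajectory; positive if at most k steps apart."""
    T = trajectory.size(0)
    idx = torch.randint(0, T, (256, 2))
    a, b = trajectory[idx[:, 0]], trajectory[idx[:, 1]]
    labels = ((idx[:, 0] - idx[:, 1]).abs() <= k).float().unsqueeze(-1)
    return a, b, labels

if __name__ == "__main__":
    net = ReachabilityNet(obs_dim=4)
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    walk = torch.cumsum(torch.randn(200, 4), dim=0)   # stand-in for a random-walk trajectory
    for _ in range(100):
        a, b, y = make_pairs(walk)
        loss = nn.functional.binary_cross_entropy(net(a, b), y)
        opt.zero_grad(); loss.backward(); opt.step()
    # The trained network can then score candidate goals for a diverse,
    # well-balanced goal memory built from past observations.
```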
arXiv Detail & Related papers (2022-06-23T14:29:36Z) - Information is Power: Intrinsic Control via Information Capture [110.3143711650806]
We argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation estimated using a latent state-space model.
This objective induces an agent to both gather information about its environment, corresponding to reducing uncertainty, and to gain control over its environment, corresponding to reducing the unpredictability of future world states.
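A minimal sketch of the entropy-minimizing reward, with a simple kernel density estimate of state visitation standing in for the paper's latent state-space model (an assumption made for brevity): the intrinsic reward is the log-density of the current state, so predictable, frequently visited states are preferred.

```python
# Minimal sketch: reward r(s) = log p(s) under a visitation density estimate,
# which pushes the agent toward low-entropy (predictable) state visitation.
# The kernel density estimate replaces the paper's latent state-space model.
import numpy as np

class VisitationDensity:
    def __init__(self, bandwidth: float = 0.5):
        self.bandwidth = bandwidth
        self.states = []

    def update(self, state: np.ndarray):
        self.states.append(np.asarray(state, dtype=np.float64))

    def log_prob(self, state: np.ndarray) -> float:
        if not self.states:
            return 0.0
        diffs = np.stack(self.states) - state
        sq = np.sum(diffs ** 2, axis=-1) / (2.0 * self.bandwidth ** 2)
        # log of a Gaussian kernel density estimate (up to an additive constant)
        return float(np.log(np.mean(np.exp(-sq)) + 1e-8))

def intrinsic_reward(density: VisitationDensity, state: np.ndarray) -> float:
    """Higher reward for states the agent's own visitation density finds likely."""
    return density.log_prob(state)
```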
arXiv Detail & Related papers (2021-12-07T18:50:42Z) - Goal-Aware Cross-Entropy for Multi-Target Reinforcement Learning [15.33496710690063]
We propose a goal-aware cross-entropy (GACE) loss that can be utilized in a self-supervised way.
We then devise goal-discriminative attention networks (GDAN) which utilize the goal-relevant information to focus on the given instruction.
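One illustrative reading of the summary above, assuming the self-supervised label is the index of the goal the agent actually reached; the GoalClassifier and the label source are hypothetical and do not reproduce the paper's exact GACE formulation or the GDAN attention mechanism.

```python
# Minimal sketch: goal-classification cross-entropy with self-supervised labels
# (the index of the reached goal). Names and label source are assumptions.
import torch
import torch.nn as nn

class GoalClassifier(nn.Module):
    def __init__(self, obs_dim: int, num_goals: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_goals),
        )

    def forward(self, obs):
        return self.net(obs)

def goal_aware_ce(classifier, obs, reached_goal_idx):
    """Cross-entropy against the self-supervised label of the goal that was reached."""
    return nn.functional.cross_entropy(classifier(obs), reached_goal_idx)
```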
arXiv Detail & Related papers (2021-10-25T14:24:39Z) - Contrastive Active Inference [12.361539023886161]
We propose a contrastive objective for active inference that reduces the computational burden in learning the agent's generative model and planning future actions.
Our method performs notably better than likelihood-based active inference in image-based tasks, while also being computationally cheaper and easier to train.
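A minimal sketch of the contrastive ingredient, assuming an InfoNCE-style score between a predicted latent and the encoding of the true next observation, with other batch elements as negatives; the shapes, the temperature, and the pairing scheme are illustrative assumptions rather than the paper's objective.

```python
# Minimal sketch: InfoNCE-style contrastive loss replacing a reconstruction
# likelihood; positives are matching (prediction, observation) rows in a batch.
import torch
import torch.nn as nn

def info_nce(pred_latent: torch.Tensor, obs_latent: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """pred_latent, obs_latent: (batch, dim); row i of each is a positive pair."""
    pred = nn.functional.normalize(pred_latent, dim=-1)
    obs = nn.functional.normalize(obs_latent, dim=-1)
    logits = pred @ obs.t() / temperature        # (batch, batch) similarity matrix
    labels = torch.arange(pred.size(0))          # the diagonal entries are positives
    return nn.functional.cross_entropy(logits, labels)
```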
arXiv Detail & Related papers (2021-10-19T16:20:49Z) - Adversarial Intrinsic Motivation for Reinforcement Learning [60.322878138199364]
We investigate whether the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution can be utilized effectively for reinforcement learning tasks.
Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates this Wasserstein-1 distance through its dual objective and uses it to compute a supplemental reward function.
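A minimal sketch of the dual (Kantorovich-Rubinstein) form of the Wasserstein-1 distance, assuming a critic trained with a gradient penalty to stay approximately 1-Lipschitz; the critic architecture, input dimension, penalty weight, and the way its output is reused as a supplemental reward are illustrative assumptions rather than the AIM implementation.

```python
# Minimal sketch: Wasserstein-1 dual objective with a gradient penalty; the
# critic's output then serves as a supplemental reward. Shapes are assumptions.
import torch
import torch.nn as nn

critic = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 1))

def dual_objective(target_states, policy_states, penalty_weight: float = 10.0):
    """Maximize E_target[f(s)] - E_policy[f(s)] while keeping f roughly 1-Lipschitz."""
    gap = critic(target_states).mean() - critic(policy_states).mean()
    interp = policy_states.detach().requires_grad_(True)
    grads = torch.autograd.grad(critic(interp).sum(), interp, create_graph=True)[0]
    penalty = ((grads.norm(dim=-1) - 1.0).clamp(min=0.0) ** 2).mean()
    return gap - penalty_weight * penalty        # ascend this w.r.t. the critic

def supplemental_reward(state):
    """Higher where the critic believes the target distribution lies."""
    with torch.no_grad():
        return critic(state).squeeze(-1)
```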
arXiv Detail & Related papers (2021-05-27T17:51:34Z) - Action and Perception as Divergence Minimization [43.75550755678525]
Action Perception Divergence is an approach for categorizing the space of possible objective functions for embodied agents.
We show a spectrum of objectives that ranges from narrow to general.
These agents use perception to align their beliefs with the world and use actions to align the world with their beliefs.
arXiv Detail & Related papers (2020-09-03T16:52:46Z) - Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
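A minimal sketch of a value-disagreement curriculum, assuming candidate goals are scored by the standard deviation of an ensemble of goal-conditioned value networks and sampled proportionally; the ensemble size, network shapes, and sampling rule are illustrative assumptions.

```python
# Minimal sketch: pick training goals where an ensemble of goal-conditioned
# value networks disagrees most. All shapes and the sampling rule are assumptions.
import torch
import torch.nn as nn

def make_value_net(state_dim: int, goal_dim: int) -> nn.Module:
    return nn.Sequential(nn.Linear(state_dim + goal_dim, 128), nn.ReLU(),
                         nn.Linear(128, 1))

ensemble = [make_value_net(4, 3) for _ in range(5)]

def sample_goal(start_state: torch.Tensor, candidate_goals: torch.Tensor) -> torch.Tensor:
    """Sample a candidate goal with probability proportional to value disagreement."""
    inp = torch.cat([start_state.expand(candidate_goals.size(0), -1),
                     candidate_goals], dim=-1)
    with torch.no_grad():
        values = torch.stack([net(inp).squeeze(-1) for net in ensemble])  # (ensemble, n_goals)
    disagreement = values.std(dim=0)
    probs = disagreement / disagreement.sum()
    return candidate_goals[torch.multinomial(probs, 1).item()]
```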
arXiv Detail & Related papers (2020-06-17T03:58:25Z) - Intrinsic Motivation for Encouraging Synergistic Behavior [55.10275467562764]
We study the role of intrinsic motivation as an exploration bias for reinforcement learning in sparse-reward synergistic tasks.
Our key idea is that a good guiding principle for intrinsic motivation in synergistic tasks is to take actions which affect the world in ways that would not be achieved if the agents were acting on their own.
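A minimal sketch of a synergy bonus under the assumption that a single-agent forward model, applied once per arm, approximates what the agents could achieve acting alone; the joint outcome is rewarded in proportion to how far it departs from that composed prediction. The model shapes and composition rule are illustrative, not the paper's exact formulation.

```python
# Minimal sketch: intrinsic reward for joint effects that single-agent action
# composition cannot reproduce. Dimensions and composition rule are assumptions.
import torch
import torch.nn as nn

single_model = nn.Sequential(nn.Linear(4 + 2, 64), nn.ReLU(), nn.Linear(64, 4))

def composed_prediction(state, action_a, action_b):
    """Apply the single-agent model once per agent, as if they acted alone in turn."""
    mid = single_model(torch.cat([state, action_a], dim=-1))
    return single_model(torch.cat([mid, action_b], dim=-1))

def synergy_reward(state, action_a, action_b, true_next_state):
    """Large when the joint outcome could not be reproduced by agents acting alone."""
    with torch.no_grad():
        pred = composed_prediction(state, action_a, action_b)
    return torch.norm(true_next_state - pred, dim=-1)
```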
arXiv Detail & Related papers (2020-02-12T19:34:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.