Action and Perception as Divergence Minimization
- URL: http://arxiv.org/abs/2009.01791v3
- Date: Sun, 13 Feb 2022 02:40:42 GMT
- Title: Action and Perception as Divergence Minimization
- Authors: Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston,
Nicolas Heess
- Abstract summary: Action Perception Divergence is an approach for categorizing the space of possible objective functions for embodied agents.
We show a spectrum that reaches from narrow to general objectives.
These agents use perception to align their beliefs with the world and use actions to align the world with their beliefs.
- Score: 43.75550755678525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To learn directed behaviors in complex environments, intelligent agents need
to optimize objective functions. Various objectives are known for designing
artificial agents, including task rewards and intrinsic motivation. However, it
is unclear how the known objectives relate to each other, which objectives
remain yet to be discovered, and which objectives better describe the behavior
of humans. We introduce the Action Perception Divergence (APD), an approach for
categorizing the space of possible objective functions for embodied agents. We
show a spectrum that reaches from narrow to general objectives. While the
narrow objectives correspond to domain-specific rewards as typical in
reinforcement learning, the general objectives maximize information with the
environment through latent variable models of input sequences. Intuitively,
these agents use perception to align their beliefs with the world and use
actions to align the world with their beliefs. They infer representations that
are informative of past inputs, explore future inputs that are informative of
their representations, and select actions or skills that maximally influence
future inputs. This explains a wide range of unsupervised objectives from a
single principle, including representation learning, information gain,
empowerment, and skill discovery. Our findings suggest leveraging powerful
world models for unsupervised exploration as a path toward highly adaptive
agents that seek out large niches in their environments, rendering task rewards
optional.
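The following is a minimal sketch of the kind of joint objective the abstract describes, written as a LaTeX equation. The notation is an assumption made here for illustration and may differ from the paper's: x denotes sensory inputs, z latent representations, a_phi the distribution the agent actually realizes through its beliefs and its influence on the world, and tau a fixed target distribution expressing preferences.

% Sketch of a joint divergence objective over action and perception.
% Assumed notation: x = sensory inputs, z = latent representations,
% a_phi = actual distribution realized by the agent, tau = target distribution.
\begin{align}
  \min_{\phi}\;
  \mathrm{KL}\!\left[\, a_\phi(x, z) \,\middle\|\, \tau(x, z) \,\right]
  = \mathbb{E}_{a_\phi(x, z)}\!\left[ \ln a_\phi(x, z) - \ln \tau(x, z) \right]
\end{align}
% Perception adjusts the belief part of a_phi to align it with the world;
% actions change the distribution of inputs to align the world with the beliefs.

Read this way, and with a target built from an expressive latent variable model of input sequences, minimizing the divergence favors a high mutual information between inputs and representations, which is where the representation learning, information gain, empowerment, and skill discovery objectives mentioned above come in.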
Related papers
- Curious Exploration via Structured World Models Yields Zero-Shot Object Manipulation [19.840186443344]
We propose to use structured world models to incorporate inductive biases in the control loop to achieve sample-efficient exploration.
Our method generates free-play behavior that starts to interact with objects early on and develops more complex behavior over time.
arXiv Detail & Related papers (2022-06-22T22:08:50Z)
- Deep Hierarchical Planning from Pixels [86.14687388689204]
Director is a method for learning hierarchical behaviors directly from pixels by planning inside the latent space of a learned world model.
Despite operating in latent space, the decisions are interpretable because the world model can decode goals into images for visualization.
Director also learns successful behaviors across a wide range of environments, including visual control, Atari games, and DMLab levels.
arXiv Detail & Related papers (2022-06-08T18:20:15Z)
- Information is Power: Intrinsic Control via Information Capture [110.3143711650806]
We argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation estimated using a latent state-space model.
This objective induces an agent to both gather information about its environment, corresponding to reducing uncertainty, and to gain control over its environment, corresponding to reducing the unpredictability of future world states.
arXiv Detail & Related papers (2021-12-07T18:50:42Z)
- Goal-Aware Cross-Entropy for Multi-Target Reinforcement Learning [15.33496710690063]
We propose a goal-aware cross-entropy (GACE) loss that can be utilized in a self-supervised way.
We then devise goal-discriminative attention networks (GDAN) that use the goal-relevant information to focus on the given instruction.
arXiv Detail & Related papers (2021-10-25T14:24:39Z)
- Understanding the origin of information-seeking exploration in probabilistic objectives for control [62.997667081978825]
An exploration-exploitation trade-off is central to the description of adaptive behaviour.
One approach to resolving this trade-off has been to propose that agents possess, or to equip them with, an intrinsic 'exploratory drive'.
We show that this combination of utility-maximizing and information-seeking behaviour arises from the minimization of an entirely different class of objectives.
arXiv Detail & Related papers (2021-03-11T18:42:39Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
- Mutual Information-based State-Control for Intrinsically Motivated Reinforcement Learning [102.05692309417047]
In reinforcement learning, an agent learns to reach a set of goals by means of an external reward signal.
In the natural world, intelligent organisms learn from internal drives, bypassing the need for external signals.
We propose to formulate an intrinsic objective as the mutual information between the goal states and the controllable states (a toy sketch of this quantity follows after this list).
arXiv Detail & Related papers (2020-02-05T19:21:20Z)
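To make the mutual-information objective from the last entry above concrete, here is a small self-contained Python sketch. It is a toy, not the paper's implementation; the discrete joint distribution and the variable names are assumptions made here for illustration. It computes the mutual information between a goal-state variable and a controllable-state variable from an explicit probability table, whereas in practice this quantity is usually estimated with variational bounds rather than exact tables.

import numpy as np

# Toy joint distribution p(s_goal, s_ctrl) over 3 goal states (rows) and
# 2 controllable states (columns). The numbers are made up for illustration.
p_joint = np.array([
    [0.20, 0.05],
    [0.05, 0.30],
    [0.25, 0.15],
])

# Marginal distributions p(s_goal) and p(s_ctrl).
p_goal = p_joint.sum(axis=1, keepdims=True)
p_ctrl = p_joint.sum(axis=0, keepdims=True)

# Mutual information I(S_goal; S_ctrl) = sum_{g,c} p(g,c) * log[p(g,c) / (p(g) * p(c))].
mutual_info = float(np.sum(p_joint * np.log(p_joint / (p_goal * p_ctrl))))

print(f"I(S_goal; S_ctrl) = {mutual_info:.4f} nats")

An agent that maximizes a quantity like this as an intrinsic reward is pushed toward situations where the part of the state it can control carries information about the goal states, without relying on an external reward signal.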
This list is automatically generated from the titles and abstracts of the papers on this site.