Maximizing Information Gain in Partially Observable Environments via
Prediction Reward
- URL: http://arxiv.org/abs/2005.04912v1
- Date: Mon, 11 May 2020 08:13:49 GMT
- Title: Maximizing Information Gain in Partially Observable Environments via
Prediction Reward
- Authors: Yash Satsangi, Sungsu Lim, Shimon Whiteson, Frans Oliehoek, Martha
White
- Abstract summary: This paper tackles the challenge of using belief-based rewards for a deep RL agent.
We derive the exact error between negative entropy and the expected prediction reward.
This insight provides theoretical motivation for several fields using prediction rewards.
- Score: 64.24528565312463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Information gathering in a partially observable environment can be formulated
as a reinforcement learning (RL) problem where the reward depends on the
agent's uncertainty. For example, the reward can be the negative entropy of the
agent's belief over an unknown (or hidden) variable. Typically, the rewards of
an RL agent are defined as a function of the state-action pairs and not as a
function of the belief of the agent; this hinders the direct application of
deep RL methods for such tasks. This paper tackles the challenge of using
belief-based rewards for a deep RL agent, by offering a simple insight that
maximizing any convex function of the belief of the agent can be approximated
by instead maximizing a prediction reward: a reward based on prediction
accuracy. In particular, we derive the exact error between negative entropy and
the expected prediction reward. This insight provides theoretical motivation
for several fields using prediction rewards---namely visual attention, question
answering systems, and intrinsic motivation---and highlights their connection
to the usually distinct fields of active perception, active sensing, and sensor
placement. Based on this insight we present deep anticipatory networks (DANs),
which enable an agent to take actions to reduce its uncertainty without
performing explicit belief inference. We present two applications of DANs:
building a sensor selection system for tracking people in a shopping mall and
learning discrete models of attention on fashion MNIST and MNIST digit
classification.
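For concreteness, here is a minimal sketch of the relationship the abstract alludes to, assuming the prediction reward is the log-likelihood that a predictor (written b-hat below) assigns to the hidden variable x drawn from the agent's belief b; the paper's exact statement may use a different prediction reward or constants.

```latex
% Expected log-loss prediction reward under belief b, with predictor \hat{b}:
\mathbb{E}_{x \sim b}\big[\log \hat{b}(x)\big]
  = \sum_{x} b(x)\log \hat{b}(x)
  = -H(b) - D_{\mathrm{KL}}\big(b \,\|\, \hat{b}\big)
```

Under this assumption the gap between the expected prediction reward and the negative entropy of the belief is exactly a KL divergence, which vanishes when the predictor matches the belief. In the same spirit, a deep anticipatory network can be sketched as two networks trained jointly: a Q-network that picks the next sensing action and a model network that predicts the hidden variable, with the model's prediction accuracy serving as the reward for both. The code below is a hypothetical illustration under those assumptions; the names (`QNet`, `MNet`, `dan_step`) and architecture details are not taken from the authors' implementation.

```python
# Hypothetical DAN-style training step (illustrative sketch, not the authors' code).
# Assumptions: a discrete hidden variable with n_classes values, n_actions sensing
# actions, and a fixed-size observation vector of length obs_dim.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNet(nn.Module):
    """Chooses which sensor/glimpse to use next, given the current observation."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, obs):
        return self.net(obs)  # Q-values over sensing actions

class MNet(nn.Module):
    """Predicts the hidden variable from the observation produced by the chosen action."""
    def __init__(self, obs_dim, n_classes):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, obs):
        return self.net(obs)  # logits over the hidden variable

def dan_step(q_net, m_net, obs, next_obs, action, hidden_label, q_opt, m_opt, gamma=0.99):
    """One joint update: the model learns to predict the hidden variable, and its
    prediction accuracy is used as the (prediction) reward for the Q-network."""
    # Supervised update of MNet on the hidden variable.
    logits = m_net(next_obs)
    m_loss = F.cross_entropy(logits, hidden_label)
    m_opt.zero_grad(); m_loss.backward(); m_opt.step()

    # Prediction reward: 1 if the model's guess is correct, else 0 (no explicit belief inference).
    with torch.no_grad():
        reward = (logits.argmax(dim=-1) == hidden_label).float()
        target = reward + gamma * q_net(next_obs).max(dim=-1).values

    # One-step Q-learning update toward the prediction-reward target.
    # `action` is a LongTensor of shape (batch,) holding the chosen sensing actions.
    q_pred = q_net(obs).gather(1, action.unsqueeze(1)).squeeze(1)
    q_loss = F.mse_loss(q_pred, target)
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()
    return q_loss.item(), m_loss.item()
```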
Related papers
- Explaining an Agent's Future Beliefs through Temporally Decomposing Future Reward Estimators [5.642469620531317]
We modify an agent's future reward estimator to predict its next N expected rewards, referred to as Temporal Reward Decomposition (TRD).
We can: estimate when an agent may expect to receive a reward, the value of the reward and the agent's confidence in receiving it; measure an input feature's temporal importance to the agent's action decisions; and predict the influence of different actions on future rewards.
We show that DQN agents trained on Atari environments can be efficiently retrained to incorporate TRD with minimal impact on performance.
arXiv Detail & Related papers (2024-08-15T15:56:15Z) - Intrinsic Rewards for Exploration without Harm from Observational Noise: A Simulation Study Based on the Free Energy Principle [3.6985126664461037]
In Reinforcement Learning (RL), artificial agents are trained to maximize numerical rewards by performing tasks.
This paper proposes hidden state curiosity, which rewards agents with the KL divergence between the predictive prior and posterior probabilities of latent variables (a generic sketch of this kind of information-gain reward follows after this list).
arXiv Detail & Related papers (2024-05-13T05:18:23Z) - Predictable Reinforcement Learning Dynamics through Entropy Rate
Minimization [17.845518684835913]
In Reinforcement Learning (RL), agents have no incentive to exhibit predictable behaviors.
We propose a novel method to induce predictable behavior in RL agents, referred to as Predictability-Aware RL (PA-RL)
We show how the entropy rate can be formulated as an average reward objective, and since its entropy reward function is policy-dependent, we introduce an action-dependent surrogate entropy.
arXiv Detail & Related papers (2023-11-30T16:53:32Z) - Leveraging Reward Consistency for Interpretable Feature Discovery in
Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action matching principle is more like an explanation of deep neural networks (DNNs) than the interpretation of RL agents.
We propose to consider rewards, the essential objective of RL agents, as the basis for interpreting RL agents as well.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
arXiv Detail & Related papers (2023-09-04T09:09:54Z) - Estimating and Incentivizing Imperfect-Knowledge Agents with Hidden
Rewards [4.742123770879715]
In practice, incentive providers often cannot observe the reward realizations of incentivized agents.
This paper explores a repeated adverse selection game between a self-interested learning agent and a learning principal.
We introduce an estimator whose only input is the history of principal's incentives and agent's choices.
arXiv Detail & Related papers (2023-08-13T08:12:01Z) - What Should I Know? Using Meta-gradient Descent for Predictive Feature
Discovery in a Single Stream of Experience [63.75363908696257]
Computational reinforcement learning seeks to construct an agent's perception of the world through predictions of future sensations.
An open challenge in this line of work is determining from the infinitely many predictions that the agent could possibly make which predictions might best support decision-making.
We introduce a meta-gradient descent process by which an agent learns 1) what predictions to make, 2) the estimates for its chosen predictions, and 3) how to use those estimates to generate policies that maximize future reward.
arXiv Detail & Related papers (2022-06-13T21:31:06Z) - The Effects of Reward Misspecification: Mapping and Mitigating
Misaligned Models [85.68751244243823]
Reward hacking -- where RL agents exploit gaps in misspecified reward functions -- has been widely observed, but not yet systematically studied.
We investigate reward hacking as a function of agent capabilities: model capacity, action space resolution, observation space noise, and training time.
We find instances of phase transitions: capability thresholds at which the agent's behavior qualitatively shifts, leading to a sharp decrease in the true reward.
arXiv Detail & Related papers (2022-01-10T18:58:52Z) - Experimental Evidence that Empowerment May Drive Exploration in
Sparse-Reward Environments [0.0]
An intrinsic reward function based on the principle of empowerment assigns rewards proportional to the amount of control the agent has over its own sensors.
We implement a variation on a recently proposed intrinsically motivated agent, which we refer to as the 'curious' agent, and an empowerment-inspired agent.
We compare the performance of both agents to that of an advantage actor-critic baseline in four sparse reward grid worlds.
arXiv Detail & Related papers (2021-07-14T22:52:38Z) - Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z) - Noisy Agents: Self-supervised Exploration by Predicting Auditory Events [127.82594819117753]
We propose a novel type of intrinsic motivation for Reinforcement Learning (RL) that encourages the agent to understand the causal effect of its actions.
We train a neural network to predict the auditory events and use the prediction errors as intrinsic rewards to guide RL exploration.
Experimental results on Atari games show that our new intrinsic motivation significantly outperforms several state-of-the-art baselines.
arXiv Detail & Related papers (2020-07-27T17:59:08Z)
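Several of the entries above reward an agent in proportion to how much a new observation shifts its beliefs over a latent variable (for example, the hidden state curiosity paper, as noted in its summary). Below is a minimal, generic sketch of such an information-gain bonus for categorical distributions; it illustrates the general idea and is not code from any of the papers listed here.

```python
# Generic information-gain style intrinsic reward: KL(posterior || prior) over a
# discrete latent variable. Purely illustrative; not taken from the papers above.
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two categorical distributions given as probability vectors."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0)
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def intrinsic_reward(prior, posterior):
    """Reward observations that move the agent's posterior away from its prior."""
    return kl_divergence(posterior, prior)

# Example: an observation that concentrates belief on one latent value earns a large bonus.
prior = [0.25, 0.25, 0.25, 0.25]
posterior = [0.85, 0.05, 0.05, 0.05]
print(intrinsic_reward(prior, posterior))  # approximately 0.80 nats
```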