Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning
- URL: http://arxiv.org/abs/2405.17243v2
- Date: Fri, 16 Aug 2024 17:55:32 GMT
- Title: Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning
- Authors: Adriana Hugessen, Roger Creus Castanyer, Faisal Mohamed, Glen Berseth
- Abstract summary: Both entropy-minimizing and entropy-maximizing objectives for unsupervised reinforcement learning (RL) have been shown to be effective in different environments.
We propose an agent that adapts its objective online, depending on the entropy conditions of its environment, by framing the choice as a multi-armed bandit problem.
We demonstrate that such agents can learn to control entropy and exhibit emergent behaviors in both high- and low-entropy regimes.
- Score: 6.937243101289336
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Both entropy-minimizing and entropy-maximizing (curiosity) objectives for unsupervised reinforcement learning (RL) have been shown to be effective in different environments, depending on the environment's level of natural entropy. However, neither method alone results in an agent that will consistently learn intelligent behavior across environments. In an effort to find a single entropy-based method that will encourage emergent behaviors in any environment, we propose an agent that can adapt its objective online, depending on the entropy conditions by framing the choice as a multi-armed bandit problem. We devise a novel intrinsic feedback signal for the bandit, which captures the agent's ability to control the entropy in its environment. We demonstrate that such agents can learn to control entropy and exhibit emergent behaviors in both high- and low-entropy regimes and can learn skillful behaviors in benchmark tasks. Videos of the trained agents and summarized findings can be found on our project page https://sites.google.com/view/surprise-adaptive-agents
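As a rough illustration of the adaptive objective described above, the sketch below pairs a simple UCB bandit with two intrinsic-reward signs, surprise-minimizing and surprise-maximizing. All names (ObjectiveBandit, bandit_feedback, the UCB rule, the entropy-gap feedback) are illustrative assumptions; the paper's actual feedback signal measures the agent's ability to control entropy in its environment and is not reproduced exactly here.

```python
import numpy as np

class ObjectiveBandit:
    """Two-armed UCB bandit choosing between a surprise-minimizing (arm 0)
    and a surprise-maximizing (arm 1) intrinsic objective at episode start.
    Illustrative sketch; names and the feedback signal are assumptions."""

    def __init__(self, c=1.0):
        self.counts = np.zeros(2)
        self.values = np.zeros(2)
        self.c = c

    def select(self):
        # Play each arm once, then use an upper-confidence-bound rule.
        for arm in range(2):
            if self.counts[arm] == 0:
                return arm
        total = self.counts.sum()
        ucb = self.values + self.c * np.sqrt(np.log(total) / self.counts)
        return int(np.argmax(ucb))

    def update(self, arm, feedback):
        # Incremental mean of the feedback received for this arm.
        self.counts[arm] += 1
        self.values[arm] += (feedback - self.values[arm]) / self.counts[arm]


def intrinsic_reward(arm, surprise):
    # surprise = -log p(s) under a learned density model over states.
    # Arm 0 minimizes surprise, arm 1 maximizes it.
    return -surprise if arm == 0 else surprise


def bandit_feedback(episode_entropy, random_policy_entropy):
    # Hypothetical feedback: how far the agent pushed episode entropy away
    # from what an uncontrolled (random) policy would obtain. The paper's
    # signal also captures controllability; its exact form differs.
    return abs(episode_entropy - random_policy_entropy)
```

At the start of each episode the agent would call select(), optimize its policy against intrinsic_reward(arm, surprise) for that episode, and then call update(arm, bandit_feedback(...)) with the observed episode statistics.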
Related papers
- Predictable Reinforcement Learning Dynamics through Entropy Rate Minimization [17.845518684835913]
In Reinforcement Learning (RL), agents have no incentive to exhibit predictable behaviors.
We propose a novel method for inducing predictable behavior in RL agents, referred to as Predictability-Aware RL (PA-RL).
We show how the entropy rate can be formulated as an average-reward objective, and, since the corresponding entropy reward function is policy-dependent, we introduce an action-dependent surrogate entropy.
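Read literally, this summary suggests an average-reward formulation along the following lines; the notation and the exact surrogate are hedged reconstructions rather than PA-RL's own definitions.

```latex
% Entropy rate of the Markov chain induced by policy \pi, written as an
% average-reward objective (hedged reconstruction, not the paper's notation):
\mathcal{H}(\pi) \;=\; \lim_{T\to\infty} \frac{1}{T}\,
  \mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{T-1} -\log P^{\pi}(s_{t+1}\mid s_t)\Big],
\qquad
P^{\pi}(s'\mid s) \;=\; \sum_{a}\pi(a\mid s)\,P(s'\mid s,a).
% The per-step reward -\log P^{\pi}(s_{t+1}\mid s_t) depends on \pi itself, so a
% policy-independent, action-dependent surrogate can be used instead:
\tilde{r}(s_t, a_t, s_{t+1}) \;=\; -\log P(s_{t+1}\mid s_t, a_t).
```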
arXiv Detail & Related papers (2023-11-30T16:53:32Z)
- Safer Autonomous Driving in a Stochastic, Partially-Observable Environment by Hierarchical Contingency Planning [10.971411555103574]
An intelligent agent should be prepared to anticipate a change in its belief of the environment state.
This is especially the case for autonomous vehicles (AVs) navigating real-world situations where safety is paramount.
We show that our approach results in robust, safe behaviour in a partially observable, safe environment, generalizing well to environments not seen during training.
arXiv Detail & Related papers (2022-04-13T16:47:00Z)
- Information is Power: Intrinsic Control via Information Capture [110.3143711650806]
We argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation estimated using a latent state-space model.
This objective induces an agent to both gather information about its environment, corresponding to reducing uncertainty, and to gain control over its environment, corresponding to reducing the unpredictability of future world states.
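One way to instantiate this objective is to reward the agent for visiting latent states that a density model fit to past visitations already assigns high probability; the encoder, the density model, and the function name below are hypothetical, and this is a simplification of the latent state-space approach rather than the paper's architecture.

```python
import torch

def latent_visitation_reward(encoder, density_model, obs):
    """Intrinsic reward = log-probability of the current latent state under a
    density model (e.g. a torch.distributions object over the latent vector)
    fit to previously visited latents. Maximizing it pushes the agent toward
    predictable, low-entropy state visitation. Illustrative sketch only."""
    with torch.no_grad():
        z = encoder(obs)                      # latent state from a learned state-space model
        log_prob = density_model.log_prob(z)  # density of that latent under past visitations
    return log_prob.item()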
arXiv Detail & Related papers (2021-12-07T18:50:42Z)
- On Assessing The Safety of Reinforcement Learning algorithms Using Formal Methods [6.2822673562306655]
Safety mechanisms such as adversarial training, adversarial detection, and robust learning are not always adapted to all the disturbances of the environment in which the agent is deployed.
It is therefore necessary to propose new solutions adapted to the learning challenges faced by the agent.
We use reward shaping and a modified Q-learning algorithm as defense mechanisms to improve the agent's policy when facing adversarial perturbations.
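As a generic, hedged sketch of the reward-shaping half of such a defense, a tabular Q-learning step with potential-based shaping (which provably preserves the optimal policy) looks like the following; the potential table and all names are placeholders, not the paper's construction.

```python
import numpy as np

def shaped_q_update(Q, s, a, r, s_next, potential, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step with potential-based reward shaping:
    F(s, s') = gamma * phi(s') - phi(s), added to the environment reward.
    Generic defense-style update, not the paper's exact modified algorithm."""
    shaped_r = r + gamma * potential[s_next] - potential[s]
    td_target = shaped_r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q
```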
arXiv Detail & Related papers (2021-11-08T23:08:34Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills, exhibiting clear phase transitions.
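A minimal reading of the adversarial game is that one policy (an explorer) is rewarded for raising the surprise of a learned observation density model while the other (a controller) is rewarded for suppressing it, with control of the environment alternating within an episode; the phase length and function names below are assumptions, not the paper's settings.

```python
def phase_schedule(t, phase_len=32):
    """Alternate control of the environment between the two policies within an
    episode: the explorer acts first, then the controller, and so on.
    The phase length is an illustrative choice."""
    return "explorer" if (t // phase_len) % 2 == 0 else "controller"

def adversarial_surprise_reward(surprise, phase):
    # surprise = -log p(obs) under a learned observation density model.
    # The explorer is rewarded for raising surprise, the controller for
    # suppressing it; the two rewards are zero-sum by construction.
    return surprise if phase == "explorer" else -surprise
```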
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- Action Redundancy in Reinforcement Learning [54.291331971813364]
We show that transition entropy can be described by two terms, namely model-dependent transition entropy and action redundancy.
Our results suggest that action redundancy is a fundamental problem in reinforcement learning.
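A standard information-theoretic identity gives one concrete two-term split of the policy-induced transition entropy; whether these terms correspond exactly to the paper's model-dependent transition entropy and action redundancy is an assumption on the reader's part.

```latex
% One standard split of the policy-induced transition entropy into an
% action-conditioned term and an action-information term (the mapping onto
% the paper's terminology is an assumption):
H\big(s_{t+1} \mid s_t\big)
  \;=\; \mathbb{E}_{a_t \sim \pi}\!\big[H(s_{t+1} \mid s_t, a_t)\big]
  \;+\; I\big(s_{t+1};\, a_t \mid s_t\big).
```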
arXiv Detail & Related papers (2021-02-22T19:47:26Z)
- Detecting Rewards Deterioration in Episodic Reinforcement Learning [63.49923393311052]
In many RL applications, once training ends, it is vital to detect any deterioration in the agent performance as soon as possible.
We consider an episodic framework, where the rewards within each episode are neither independent, nor identically distributed, nor Markovian.
We define the mean-shift in a way corresponding to deterioration of a temporal signal (such as the rewards), and derive a test for this problem with optimal statistical power.
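A simplified version of such a mean-shift test could weight per-timestep reward deviations by the inverse covariance estimated from reference episodes, so that correlated timesteps are not double-counted; the covariance estimate, the uniform-shift weighting, and the omitted threshold calibration below are illustrative assumptions rather than the paper's exact test.

```python
import numpy as np

def mean_shift_statistic(reference_rewards, new_rewards, eps=1e-6):
    """reference_rewards, new_rewards: arrays of shape (episodes, T) holding
    per-timestep rewards. Returns a weighted mean-shift statistic whose weights
    come from the inverse covariance of the reference rewards, so correlated
    timesteps are not double-counted. Illustrative sketch only."""
    mu = reference_rewards.mean(axis=0)              # reference mean per timestep
    cov = np.cov(reference_rewards, rowvar=False)    # within-episode covariance
    cov += eps * np.eye(cov.shape[0])                # regularize for inversion
    w = np.linalg.solve(cov, np.ones(cov.shape[0]))  # weights for a uniform shift
    deltas = new_rewards.mean(axis=0) - mu           # observed per-timestep shift
    return float(w @ deltas)                         # strongly negative values suggest deterioration
```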
arXiv Detail & Related papers (2020-10-22T12:45:55Z)
- Ecological Reinforcement Learning [76.9893572776141]
We study the kinds of environment properties that can make learning under such conditions easier.
Understanding how properties of the environment impact the performance of reinforcement learning agents can help us structure our tasks in ways that make learning tractable.
arXiv Detail & Related papers (2020-06-22T17:55:03Z)
- Maximizing Information Gain in Partially Observable Environments via Prediction Reward [64.24528565312463]
This paper tackles the challenge of using belief-based rewards for a deep RL agent.
We derive the exact error between negative entropy and the expected prediction reward.
This insight provides theoretical motivation for several fields using prediction rewards.
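The exact error is plausibly the standard cross-entropy decomposition below, where the prediction reward is the log-likelihood a predictor q assigns to the true outcome drawn from p; this is a hedged reading of the entry, not a statement of the paper's derivation.

```latex
% Expected prediction reward under the true distribution p, with predictor q:
\mathbb{E}_{x \sim p}\big[\log q(x)\big]
  \;=\; -H(p) \;-\; D_{\mathrm{KL}}\!\big(p \,\|\, q\big),
\quad\text{so the gap to } -H(p) \text{ is exactly } D_{\mathrm{KL}}(p \,\|\, q) \ge 0.
```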
arXiv Detail & Related papers (2020-05-11T08:13:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.