Bounded Exploration with World Model Uncertainty in Soft Actor-Critic Reinforcement Learning Algorithm
- URL: http://arxiv.org/abs/2412.06139v1
- Date: Mon, 09 Dec 2024 01:45:08 GMT
- Title: Bounded Exploration with World Model Uncertainty in Soft Actor-Critic Reinforcement Learning Algorithm
- Authors: Ting Qiao, Henry Williams, David Valencia, Bruce MacDonald
- Abstract summary: Bounded exploration is a novel exploration method that integrates both 'soft' and intrinsic-motivation exploration.
It notably improved the Soft Actor-Critic algorithm's performance and its model-based extension's convergence speed.
- Score: 0.6749750044497732
- License:
- Abstract: One of the bottlenecks preventing Deep Reinforcement Learning (DRL) algorithms from real-world application is how to explore the environment and collect informative transitions efficiently. The present paper describes bounded exploration, a novel exploration method that integrates both 'soft' and intrinsic-motivation exploration. Bounded exploration notably improved the Soft Actor-Critic algorithm's performance and its model-based extension's convergence speed. It achieved the highest score in 6 out of 8 experiments. Bounded exploration presents an alternative way to introduce intrinsic motivation into exploration when the original reward function has a strict meaning.
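The abstract does not spell out the mechanism, so the following is only a minimal sketch of one plausible reading, under stated assumptions: SAC's stochastic ('soft') policy supplies the exploration noise, and the disagreement of an ensemble world model scales that noise within a fixed bound. All names (`ensemble_disagreement`, `bounded_explore_action`) and the `bound` parameter are illustrative, not the paper's API.
```python
# Hypothetical sketch (not the paper's actual algorithm): bound SAC's
# 'soft' exploration by a world-model-uncertainty signal.
import numpy as np

def ensemble_disagreement(models, state, action):
    """Proxy for world-model uncertainty: spread of next-state
    predictions across an ensemble of learned dynamics models."""
    preds = np.stack([m(state, action) for m in models])
    return float(preds.std(axis=0).mean())

def bounded_explore_action(mean_action, log_std, uncertainty, bound=0.3):
    """Sample a squashed Gaussian action as in SAC, but scale the
    exploration noise by the (clipped) model uncertainty so the
    intrinsic drive to explore stays within a fixed bound."""
    noise = np.exp(log_std) * np.random.randn(*np.shape(mean_action))
    scale = np.clip(uncertainty, 0.0, bound)  # bounded exploration term
    return np.tanh(np.asarray(mean_action) + scale * noise)

# Toy usage with stand-in dynamics models:
models = [lambda s, a, w=w: s + w * a for w in (0.9, 1.0, 1.1)]
s, a = np.ones(3), 0.5 * np.ones(3)
u = ensemble_disagreement(models, s, a)
print(bounded_explore_action(mean_action=np.zeros(3), log_std=-1.0, uncertainty=u))
```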
Related papers
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization [91.80034860399677]
Reinforcement learning algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards.
We introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration.
We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits.
arXiv Detail & Related papers (2024-12-16T18:59:53Z)
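A minimal sketch of the intrinsic/extrinsic trade-off this summary describes, assuming ensemble disagreement as a stand-in for information gain (a common surrogate, not necessarily MaxInfoRL's estimator):
```python
# Illustrative only: information gain approximated by ensemble variance.
import numpy as np

def augmented_reward(r_ext, ensemble_predictions, weight=0.1):
    """Balance extrinsic reward against an intrinsic information-gain
    proxy: the variance of an ensemble's next-state predictions."""
    info_gain = np.stack(ensemble_predictions).var(axis=0).mean()
    return r_ext + weight * info_gain

print(augmented_reward(1.0, [np.zeros(2), np.ones(2), 0.5 * np.ones(2)]))
```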
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE).
RLE combines the strengths of bonus-based and noise-based exploration strategies, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE achieves higher overall scores than other approaches across all tasks.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
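A hedged sketch of the general random-latent idea, assuming one common formulation (a random latent direction drawn per episode that defines a randomized reward); the paper's exact construction may differ:
```python
import numpy as np

def sample_episode_latent(dim, rng=np.random.default_rng()):
    """Draw one random latent direction at the start of each episode."""
    z = rng.standard_normal(dim)
    return z / np.linalg.norm(z)

def latent_bonus(state_features, z):
    """Randomized reward: favor states whose features align with z, so
    successive episodes are pushed toward different parts of the space."""
    return float(state_features @ z)

z = sample_episode_latent(4)
print(latent_bonus(np.ones(4), z))
```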
- Strangeness-driven Exploration in Multi-Agent Reinforcement Learning [0.0]
We introduce a new exploration method based on strangeness that can be easily incorporated into any centralized training and decentralized execution (CTDE)-based MARL algorithm.
The exploration bonus is obtained from the strangeness, and the proposed exploration method is not much affected by the stochastic transitions commonly observed in MARL tasks.
arXiv Detail & Related papers (2022-12-27T11:08:49Z)
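The summary does not define strangeness, so the sketch below assumes a standard novelty proxy (autoencoder reconstruction error); this is an illustration, not the paper's definition:
```python
import numpy as np

def strangeness_bonus(observation, encode, decode):
    """Novelty proxy (assumed, not the paper's exact definition):
    reconstruction error of a learned autoencoder, which is high for
    observations unlike those seen during training."""
    reconstruction = decode(encode(observation))
    return float(np.mean((observation - reconstruction) ** 2))

# Toy usage with identity-ish stand-ins for the learned networks:
print(strangeness_bonus(np.ones(3), encode=lambda x: 0.9 * x, decode=lambda h: h))
```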
- Deep Intrinsically Motivated Exploration in Continuous Control [0.0]
In continuous systems, exploration is often performed through undirected strategies in which parameters of the networks or selected actions are perturbed by random noise.
We adapt existing theories on animal motivational systems into the reinforcement learning paradigm and introduce a novel directed exploration strategy.
Our framework extends to larger and more diverse state spaces, dramatically improves the baselines, and outperforms the undirected strategies significantly.
arXiv Detail & Related papers (2022-10-01T14:52:16Z)
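For contrast, a minimal sketch of the undirected baseline this summary describes (unstructured Gaussian perturbation of the selected action); the paper's directed strategy itself is not specified in the summary:
```python
import numpy as np

def undirected_action(policy_action, sigma=0.1):
    """Undirected exploration as described above: perturb the selected
    action with unstructured Gaussian noise."""
    return policy_action + sigma * np.random.randn(*np.shape(policy_action))

print(undirected_action(np.zeros(2)))
```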
- Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method.
REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes.
It is tested on PyBullet Robotics Environments and Atari games.
arXiv Detail & Related papers (2022-09-19T08:42:46Z)
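A sketch in the spirit of REVD's episode-to-episode comparison, assuming a k-nearest-neighbor particle estimator; the constants and the exact Rényi-divergence estimator are assumptions:
```python
import numpy as np

def revd_style_bonus(current_states, previous_states, k=3, alpha=0.5):
    """k-nearest-neighbor sketch of an episode-to-episode visitation
    discrepancy: states far from the previous episode's visitations get
    larger intrinsic rewards (constants are assumptions)."""
    bonuses = []
    for s in current_states:
        dists = np.sort(np.linalg.norm(previous_states - s, axis=1))
        bonuses.append(dists[min(k, len(dists) - 1)] ** (1.0 - alpha))
    return np.array(bonuses)

prev = np.random.randn(32, 2)
curr = np.random.randn(5, 2) + 2.0   # shifted, hence "discrepant"
print(revd_style_bonus(curr, prev))
```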
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
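A naive sketch of the conditional NML quantity this summary refers to, refitting a classifier once per hypothetical label; MURAL's contribution is computing this distribution efficiently, which this sketch does not attempt:
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def cnml_success_prob(X, y, query):
    """Conditional NML sketch: refit the classifier once per hypothetical
    label of the query point, then normalize. Queries the data cannot
    disambiguate come out near 0.5 -- a built-in uncertainty estimate."""
    probs = []
    for label in (0, 1):
        clf = LogisticRegression().fit(np.vstack([X, query]),
                                       np.append(y, label))
        probs.append(clf.predict_proba(query.reshape(1, -1))[0, label])
    return probs[1] / (probs[0] + probs[1])

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
print(cnml_success_prob(X, y, np.array([1.5])))  # ambiguous -> near 0.5
```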
- MADE: Exploration via Maximizing Deviation from Explored Regions [48.49228309729319]
In online reinforcement learning (RL), efficient exploration remains challenging in high-dimensional environments with sparse rewards.
We propose a new exploration approach via maximizing the deviation of the occupancy of the next policy from the explored regions.
Our approach significantly improves sample efficiency over state-of-the-art methods.
arXiv Detail & Related papers (2021-06-18T17:57:00Z)
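An illustrative count-based stand-in for the occupancy-deviation bonus; MADE's derived form is more involved than this 1/sqrt(count) sketch:
```python
import numpy as np
from collections import Counter

class DeviationBonus:
    """Count-based stand-in for an occupancy-deviation bonus: it shrinks
    roughly as 1/sqrt(visits), steering the next policy away from
    already-explored regions (MADE's derived form is more involved)."""
    def __init__(self):
        self.counts = Counter()

    def __call__(self, state_key):
        self.counts[state_key] += 1
        return 1.0 / np.sqrt(self.counts[state_key])

bonus = DeviationBonus()
print([bonus("s0") for _ in range(3)])  # 1.0, 0.707..., 0.577...
```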
- Reannealing of Decaying Exploration Based On Heuristic Measure in Deep Q-Network [82.20059754270302]
We propose an algorithm based on the idea of reannealing that aims to encourage exploration only when it is needed.
We perform an illustrative case study showing that it has the potential to both accelerate training and obtain a better policy.
arXiv Detail & Related papers (2020-09-29T20:40:00Z)
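A hedged sketch of reannealing, assuming a stalled-return heuristic as the trigger (the paper's heuristic measure is not specified in this summary):
```python
import numpy as np

class ReannealedEpsilon:
    """Decay epsilon as usual, but reset ('reanneal') it when a heuristic
    says exploration is needed again. The trigger here -- a stalled best
    episode return -- is an assumption, not the paper's exact measure."""
    def __init__(self, eps0=1.0, decay=0.995, floor=0.05, patience=50):
        self.eps, self.eps0, self.decay, self.floor = eps0, eps0, decay, floor
        self.best, self.stale, self.patience = -np.inf, 0, patience

    def step(self, episode_return):
        if episode_return > self.best:
            self.best, self.stale = episode_return, 0
        else:
            self.stale += 1
        if self.stale >= self.patience:            # progress stalled
            self.eps, self.stale = self.eps0, 0    # restore exploration
        else:
            self.eps = max(self.floor, self.eps * self.decay)
        return self.eps
```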
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.