ELEMENT: Episodic and Lifelong Exploration via Maximum Entropy
- URL: http://arxiv.org/abs/2412.03800v1
- Date: Thu, 05 Dec 2024 01:42:13 GMT
- Title: ELEMENT: Episodic and Lifelong Exploration via Maximum Entropy
- Authors: Hongming Li, Shujian Yu, Bin Liu, Jose C. Principe
- Abstract summary: \emph{Episodic and Lifelong Exploration via Maximum ENTropy} (ELEMENT) is a novel, multiscale, intrinsically motivated reinforcement learning framework.
We propose a novel intrinsic reward for episodic entropy maximization named \emph{average episodic state entropy}, which provides the optimal solution for a theoretical upper bound of the episodic state entropy objective.
Our ELEMENT significantly outperforms state-of-the-art intrinsic rewards in both episodic and lifelong setups.
- Score: 21.586240279091815
- License:
- Abstract: This paper proposes \emph{Episodic and Lifelong Exploration via Maximum ENTropy} (ELEMENT), a novel, multiscale, intrinsically motivated reinforcement learning (RL) framework that explores environments without any extrinsic reward and transfers the learned skills effectively to downstream tasks. We advance the state of the art in three ways. First, we propose a multiscale entropy optimization to address the fact that previous maximum state entropy objectives, when used for lifelong exploration with millions of state observations, suffer from vanishing rewards and become computationally very expensive across iterations. We therefore add an episodic maximum entropy term over each episode to further speed up the search. Second, we propose a novel intrinsic reward for episodic entropy maximization named \emph{average episodic state entropy}, which provides the optimal solution for a theoretical upper bound of the episodic state entropy objective. Third, to speed up lifelong entropy maximization, we propose a $k$-nearest-neighbors ($k$NN) graph to organize the entropy estimation and update processes, which reduces the computation substantially. Our ELEMENT significantly outperforms state-of-the-art intrinsic rewards in both episodic and lifelong setups. Moreover, it can be exploited in task-agnostic pre-training, collecting data for offline reinforcement learning, and other applications.
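Since the abstract only names the ingredients, the following is a minimal, hedged sketch of a kNN-based episodic state entropy bonus in the spirit of the "average episodic state entropy" reward described above. The function name, the Kozachenko-Leonenko-style estimator, and the per-state credit assignment are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch (not the authors' code): a kNN-based estimate of episodic state
# entropy whose per-state terms can serve as intrinsic rewards. The estimator
# (Kozachenko-Leonenko style) and the per-state credit assignment are assumptions;
# ELEMENT's exact "average episodic state entropy" reward may differ.
import numpy as np


def episodic_state_entropy_rewards(states: np.ndarray, k: int = 5) -> np.ndarray:
    """Per-state intrinsic rewards from a kNN entropy estimate of one episode.

    states: array of shape (T, d), the T state observations of a single episode.
    Returns an array of length T; its mean estimates the episodic state entropy
    up to additive and multiplicative constants.
    """
    T, _ = states.shape
    # Pairwise Euclidean distances within the episode (O(T^2 d); fine for a sketch).
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)                 # ignore self-distances
    # Distance from each state to its k-th nearest neighbour within the episode.
    knn_dists = np.partition(dists, k - 1, axis=1)[:, k - 1]
    # Per-state entropy term: larger kNN distance => more novel state => larger reward.
    return np.log(knn_dists + 1e-8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    episode = rng.normal(size=(128, 4))             # 128 synthetic 4-D observations
    rewards = episodic_state_entropy_rewards(episode)
    print(rewards[:3], rewards.mean())              # mean ~ episodic entropy estimate
```

In a full training loop these per-episode rewards would be combined with the lifelong, $k$NN-graph-based entropy term mentioned in the abstract; that lifelong component is omitted here.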
Related papers
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization [91.80034860399677]
Reinforcement learning algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards.
We introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration.
We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits.
arXiv Detail & Related papers (2024-12-16T18:59:53Z) - How to Explore with Belief: State Entropy Maximization in POMDPs [40.82741665804367]
We develop a memory- and computation-efficient policy gradient method to address a first-order relaxation of the objective defined on belief states.
This paper aims to generalize state entropy maximization to more realistic domains that meet the challenges of real-world applications.
arXiv Detail & Related papers (2024-06-04T13:16:34Z) - Fast Rates for Maximum Entropy Exploration [52.946307632704645]
We address the challenge of exploration in reinforcement learning (RL) when the agent operates in an unknown environment with sparse or no rewards.
We study the maximum entropy exploration problem of two different types.
For visitation entropy, we propose a game-theoretic algorithm that has $\widetilde{\mathcal{O}}(H^3 S^2 A / \varepsilon^2)$ sample complexity.
For the trajectory entropy, we propose a simple algorithm that has a sample complexity of order $\widetilde{\mathcal{O}}(\mathrm{poly}(S,\dots))$.
arXiv Detail & Related papers (2023-03-14T16:51:14Z) - SHIRO: Soft Hierarchical Reinforcement Learning [0.0]
We present an Off-Policy HRL algorithm that maximizes entropy for efficient exploration.
The algorithm learns a temporally abstracted low-level policy and is able to explore broadly through the addition of entropy to the high-level policy.
Our method, SHIRO, surpasses state-of-the-art performance on a range of simulated robotic control benchmark tasks.
arXiv Detail & Related papers (2022-12-24T17:21:58Z) - k-Means Maximum Entropy Exploration [55.81894038654918]
Exploration in continuous spaces with sparse rewards is an open problem in reinforcement learning.
We introduce an artificial curiosity algorithm based on lower bounding an approximation to the entropy of the state visitation distribution.
We show that our approach is both computationally efficient and competitive on benchmarks for exploration in high-dimensional, continuous spaces.
arXiv Detail & Related papers (2022-05-31T09:05:58Z) - APS: Active Pretraining with Successor Features [96.24533716878055]
We show that by reinterpreting and combining successor features \citep{HansenFast} with nonparametric entropy maximization, the intractable mutual information can be efficiently optimized.
The proposed method, Active Pretraining with Successor Features (APS), explores the environment via nonparametric entropy maximization, and the explored data can be efficiently leveraged to learn behavior.
arXiv Detail & Related papers (2021-08-31T16:30:35Z) - Action Redundancy in Reinforcement Learning [54.291331971813364]
We show that transition entropy can be described by two terms; namely, model-dependent transition entropy and action redundancy.
Our results suggest that action redundancy is a fundamental problem in reinforcement learning.
arXiv Detail & Related papers (2021-02-22T19:47:26Z) - State Entropy Maximization with Random Encoders for Efficient Exploration [162.39202927681484]
Recent exploration methods have proven to be a recipe for improving sample-efficiency in deep reinforcement learning (RL).
This paper presents Random Encoders for Efficient Exploration (RE3), an exploration method that utilizes state entropy as an intrinsic reward.
In particular, we find that the state entropy can be estimated in a stable and compute-efficient manner by utilizing a randomly initialized encoder.
arXiv Detail & Related papers (2021-02-18T15:45:17Z)
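As a companion to the RE3 entry above, here is a hedged sketch of a random-encoder entropy bonus: a fixed random linear projection stands in for the randomly initialized (typically convolutional) encoder, and the intrinsic reward is the log-distance to the k-th nearest stored embedding. Class and parameter names are illustrative, not taken from the paper's code.

```python
# Hedged sketch of an RE3-style intrinsic reward: states are embedded with a fixed,
# randomly initialized encoder and rewarded by the distance to their k-th nearest
# neighbour among stored embeddings. A random linear projection stands in for the
# random convolutional encoder used in the paper; all details here are illustrative.
import numpy as np


class RandomEncoderEntropyBonus:
    def __init__(self, obs_dim: int, embed_dim: int = 32, k: int = 3, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Fixed random projection: never trained, as in the random-encoder idea.
        self.W = rng.normal(scale=1.0 / np.sqrt(obs_dim), size=(obs_dim, embed_dim))
        self.k = k
        self.memory: list[np.ndarray] = []    # embeddings seen so far

    def intrinsic_reward(self, obs: np.ndarray) -> float:
        y = obs @ self.W
        if len(self.memory) <= self.k:        # not enough neighbours yet
            self.memory.append(y)
            return 0.0
        dists = np.linalg.norm(np.stack(self.memory) - y, axis=1)
        knn_dist = np.partition(dists, self.k - 1)[self.k - 1]
        self.memory.append(y)
        # log(1 + distance) keeps the bonus non-negative and slowly saturating.
        return float(np.log(1.0 + knn_dist))


if __name__ == "__main__":
    bonus = RandomEncoderEntropyBonus(obs_dim=8)
    rng = np.random.default_rng(1)
    rewards = [bonus.intrinsic_reward(rng.normal(size=8)) for _ in range(6)]
    print(rewards)
```

In practice the embedding memory would be a bounded replay buffer, and the bonus would be scaled or annealed before being added to the extrinsic reward.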