A Max-Min Entropy Framework for Reinforcement Learning
- URL: http://arxiv.org/abs/2106.10517v1
- Date: Sat, 19 Jun 2021 15:30:21 GMT
- Title: A Max-Min Entropy Framework for Reinforcement Learning
- Authors: Seungyul Han and Youngchul Sung
- Abstract summary: We propose a max-min entropy framework for reinforcement learning (RL) to overcome the limitation of the maximum entropy RL framework.
For general Markov decision processes (MDPs), an efficient algorithm is constructed under the proposed max-min entropy framework.
Numerical results show that the proposed algorithm yields drastic performance improvement over the current state-of-the-art RL algorithms.
- Score: 16.853711292804476
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a max-min entropy framework for reinforcement
learning (RL) to overcome the limitation of the maximum entropy RL framework in
model-free sample-based learning. Whereas the maximum entropy RL framework
guides learning for policies to reach states with high entropy in the future,
the proposed max-min entropy framework aims to learn to visit states with low
entropy and maximize the entropy of these low-entropy states to promote
exploration. For general Markov decision processes (MDPs), an efficient
algorithm is constructed under the proposed max-min entropy framework based on
disentanglement of exploration and exploitation. Numerical results show that
the proposed algorithm yields drastic performance improvement over the current
state-of-the-art RL algorithms.
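For context, the maximum entropy RL objective that the abstract contrasts against is usually written with an entropy bonus added to the return; the display below is this standard soft-RL form, not the paper's max-min objective (which, per the abstract, instead steers the agent toward low-entropy states and raises their entropy):
    % Standard maximum entropy RL objective with entropy temperature \alpha:
    J_{\mathrm{MaxEnt}}(\pi)
      = \mathbb{E}_{\pi}\Big[ \sum_{t=0}^{\infty} \gamma^{t}
        \big( r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \big) \Big]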
Related papers
- Maximum Entropy On-Policy Actor-Critic via Entropy Advantage Estimation [0.276240219662896]
A notable form of entropy regularisation is augmenting the objective with an entropy term, thereby simultaneously optimising the expected return and the entropy.
This framework, known as maximum entropy reinforcement learning (MaxEnt RL), has shown theoretical and empirical successes.
This paper proposes a simple method of separating the entropy objective from the MaxEnt RL objective, which facilitates the implementation of MaxEnt RL in on-policy settings.
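To make the entropy-augmented objective described above concrete, here is a minimal Python sketch (not the paper's implementation; the function name, the toy data, and the coefficient alpha are illustrative) of an entropy-regularized policy-gradient loss:
    import numpy as np

    def entropy_augmented_loss(logits, actions, advantages, alpha=0.01):
        # Negative of the entropy-regularized policy-gradient surrogate.
        # logits: (T, A) unnormalized action scores, actions: (T,) taken actions,
        # advantages: (T,) advantage estimates, alpha: entropy-bonus weight.
        z = logits - logits.max(axis=1, keepdims=True)           # stabilize softmax
        probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
        log_probs = np.log(probs + 1e-12)
        # Policy-gradient term: log pi(a_t | s_t) weighted by the advantage.
        pg_term = log_probs[np.arange(len(actions)), actions] * advantages
        # Entropy bonus: H(pi(. | s_t)) = -sum_a pi(a | s_t) log pi(a | s_t).
        entropy = -(probs * log_probs).sum(axis=1)
        # Maximize return + alpha * entropy  <=>  minimize the negative.
        return -(pg_term + alpha * entropy).mean()

    # Toy usage with random data (5 timesteps, 3 actions).
    rng = np.random.default_rng(0)
    print(entropy_augmented_loss(rng.normal(size=(5, 3)),
                                 rng.integers(0, 3, size=5),
                                 rng.normal(size=5)))
Setting alpha = 0 recovers the plain policy-gradient surrogate.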
arXiv Detail & Related papers (2024-07-25T15:48:24Z) - The Limits of Pure Exploration in POMDPs: When the Observation Entropy is Enough [40.82741665804367]
We study a simple approach of maximizing the entropy over observations in place of the true latent states.
We show how knowledge of the latter can be exploited to compute a regularization of the observation entropy that improves performance in a principled way.
arXiv Detail & Related papers (2024-06-18T17:00:13Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - Maximum Causal Entropy Inverse Reinforcement Learning for Mean-Field Games [3.2228025627337864]
We introduce the maximum causal entropy Inverse Reinforcement Learning (IRL) problem for discrete-time mean-field games (MFGs) under an infinite-horizon discounted-reward optimality criterion.
We proceed by formulating the MFG problem as a generalized Nash equilibrium problem (GNEP), which makes it possible to compute the mean-field equilibrium.
This method is employed to produce data for a numerical example.
arXiv Detail & Related papers (2024-01-12T13:22:03Z) - SHIRO: Soft Hierarchical Reinforcement Learning [0.0]
We present an Off-Policy HRL algorithm that maximizes entropy for efficient exploration.
The algorithm learns a temporally abstracted low-level policy and is able to explore broadly through the addition of entropy to the high-level policy.
Our method, SHIRO, surpasses state-of-the-art performance on a range of simulated robotic control benchmark tasks.
arXiv Detail & Related papers (2022-12-24T17:21:58Z) - A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning [132.45959478064736]
We propose a general framework that unifies model-based and model-free reinforcement learning.
We propose a novel estimation function with decomposable structural properties for optimization-based exploration.
Under our framework, a new sample-efficient algorithm, OPtimization-based ExploRation with Approximation (OPERA), is proposed.
arXiv Detail & Related papers (2022-09-30T17:59:16Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy-gradient algorithm for TMDPs, obtained as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Nearly Optimal Latent State Decoding in Block MDPs [74.51224067640717]
In episodic Block MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states.
We are first interested in estimating the latent state decoding function based on data generated under a fixed behavior policy.
We then study the problem of learning near-optimal policies in the reward-free framework.
arXiv Detail & Related papers (2022-08-17T18:49:53Z) - Action Redundancy in Reinforcement Learning [54.291331971813364]
We show that transition entropy can be described by two terms: model-dependent transition entropy and action redundancy.
Our results suggest that action redundancy is a fundamental problem in reinforcement learning.
arXiv Detail & Related papers (2021-02-22T19:47:26Z) - Generalized Maximum Entropy for Supervised Classification [26.53901315716557]
The maximum entropy principle advocates to evaluate events' probabilities using a distribution that maximizes entropy.
This paper establishes a framework for supervised classification based on the generalized maximum entropy principle.
arXiv Detail & Related papers (2020-07-10T15:41:17Z) - A maximum-entropy approach to off-policy evaluation in average-reward MDPs [54.967872716145656]
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs).
We provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases.
We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning.
arXiv Detail & Related papers (2020-06-17T18:13:37Z)
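As background for the exponential-family remark in the last entry (and the maximum entropy principle cited earlier in the list), the classical result is that maximizing entropy subject to feature-expectation constraints yields an exponential-family distribution whose sufficient statistics are those features; in illustrative notation:
    \max_{p} \; \mathcal{H}(p)
      \quad \text{s.t.} \quad \mathbb{E}_{p}[\phi(X)] = \mu, \;\; \textstyle\sum_{x} p(x) = 1
      \;\; \Longrightarrow \;\; p_{\theta}(x) \propto \exp\!\big(\theta^{\top}\phi(x)\big),
    % where \theta collects the Lagrange multipliers of the feature constraints.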