Weighted Entropy Modification for Soft Actor-Critic
- URL: http://arxiv.org/abs/2011.09083v1
- Date: Wed, 18 Nov 2020 04:36:03 GMT
- Title: Weighted Entropy Modification for Soft Actor-Critic
- Authors: Yizhou Zhao, Song-Chun Zhu
- Abstract summary: We generalize the existing principle of the maximum Shannon entropy in reinforcement learning (RL) to weighted entropy by characterizing the state-action pairs with some qualitative weights.
Using the introduced weight function, we propose an algorithm for self-balancing exploration that achieves state-of-the-art performance on MuJoCo tasks despite its simple implementation.
- Score: 95.37322316673617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We generalize the existing principle of the maximum Shannon entropy in
reinforcement learning (RL) to weighted entropy by characterizing the
state-action pairs with some qualitative weights, which can be connected with
prior knowledge, experience replay, and evolution process of the policy. We
propose an algorithm for self-balancing exploration that uses the introduced
weight function and achieves state-of-the-art performance on MuJoCo tasks
despite its simple implementation.
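As a rough illustration of the idea (not the authors' exact formulation): the weighted entropy of a distribution scales each term of the Shannon entropy by a weight, H_w(p) = -sum_i w_i p_i log p_i, so a weighted-entropy variant of the SAC actor loss simply scales the entropy bonus of each state-action pair by a weight w(s, a). The weight function, batch shapes, and the temperature `alpha` below are assumptions made for this sketch.

```python
import torch

def weighted_entropy_actor_loss(log_prob, q_value, weight, alpha=0.2):
    """SAC-style actor loss with a per-sample entropy weight w(s, a).

    Standard SAC minimizes E[alpha * log pi(a|s) - Q(s, a)]; here the
    entropy term is scaled by the weight attached to each state-action pair.
    """
    return (alpha * weight * log_prob - q_value).mean()

# Toy usage with random tensors standing in for a sampled batch.
log_prob = torch.randn(64)   # log pi(a|s) of actions sampled from the policy
q_value = torch.randn(64)    # critic estimate Q(s, a)
weight = torch.rand(64)      # qualitative weight w(s, a), here drawn in [0, 1)
loss = weighted_entropy_actor_loss(log_prob, q_value, weight)
```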
Related papers
- Maximum Entropy On-Policy Actor-Critic via Entropy Advantage Estimation [0.276240219662896]
A notable form of entropy regularisation is augmenting the objective with an entropy term, thereby simultaneously optimising the expected return and the entropy.
This framework, known as maximum entropy reinforcement learning (MaxEnt RL), has shown theoretical and empirical successes.
This paper proposes a simple method of separating the entropy objective from the MaxEnt RL objective, which facilitates the implementation of MaxEnt RL in on-policy settings.
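For context, the MaxEnt RL objective referred to here augments the expected return with a policy-entropy term (alpha is the temperature); the cited paper's contribution is to estimate and optimise the entropy part separately via an entropy advantage, whose exact form is given in that paper.

$$
J(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t} r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big)\Big],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[\log \pi(a \mid s)\big]
$$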
arXiv Detail & Related papers (2024-07-25T15:48:24Z)
- Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow [14.681645502417215]
We introduce a new MaxEnt RL framework modeled using Energy-Based Normalizing Flows (EBFlow).
This framework integrates the policy evaluation steps and the policy improvement steps, resulting in a single objective training process.
Our method achieves superior performance compared to widely-adopted representative baselines.
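The energy-based view this framework builds on is the standard MaxEnt RL result that the optimal policy is a Boltzmann distribution over the soft Q-function; EBFlow's specific flow parameterisation and single-objective training procedure are described in the paper itself.

$$
\pi^{*}(a \mid s) = \exp\!\Big(\tfrac{1}{\alpha}\big(Q^{*}_{\mathrm{soft}}(s, a) - V^{*}_{\mathrm{soft}}(s)\big)\Big),
\qquad
V^{*}_{\mathrm{soft}}(s) = \alpha \log \int \exp\!\big(Q^{*}_{\mathrm{soft}}(s, a)/\alpha\big)\, \mathrm{d}a
$$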
arXiv Detail & Related papers (2024-05-22T13:26:26Z)
- ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [52.5587113539404]
We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration.
Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
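One way to read "causality-aware entropy" is as a per-action-dimension entropy bonus weighted by each dimension's estimated causal impact on the reward; the sketch below illustrates that reading for a diagonal Gaussian policy. The weights and the exact weighting scheme used by ACE are defined in the paper and are assumed here.

```python
import torch
from torch.distributions import Normal

def causality_weighted_entropy(mean, std, causal_weights):
    """Entropy of a diagonal Gaussian policy, weighted per action dimension.

    `causal_weights` has shape [action_dim] and up-weights dimensions whose
    actions are estimated to have a larger causal effect on the reward.
    """
    per_dim_entropy = Normal(mean, std).entropy()        # [batch, action_dim]
    return (per_dim_entropy * causal_weights).sum(-1)    # [batch]

# Toy usage: batch of 8 states, 3-dimensional actions.
mean, std = torch.zeros(8, 3), torch.ones(8, 3)
weights = torch.tensor([0.6, 0.3, 0.1])
bonus = causality_weighted_entropy(mean, std, weights)
```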
arXiv Detail & Related papers (2024-02-22T13:22:06Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
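A minimal sketch of a token-level, entropy-augmented policy-gradient loss; the actual ETPO objective, including its soft Bellman backup over tokens, is specified in the paper, and the advantage estimates and `beta` here are assumptions.

```python
import torch
import torch.nn.functional as F

def token_level_entropy_loss(logits, actions, advantages, beta=0.01):
    """Policy-gradient loss over individual tokens with an entropy bonus.

    logits: [batch, seq_len, vocab]  raw language-model outputs
    actions: [batch, seq_len]        sampled token ids (LongTensor)
    advantages: [batch, seq_len]     per-token advantage estimates
    """
    log_probs = F.log_softmax(logits, dim=-1)
    taken = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)  # log pi(token)
    entropy = -(log_probs.exp() * log_probs).sum(-1)                 # per-token entropy
    return -(advantages * taken + beta * entropy).mean()
```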
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
A Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similar performance improvements on code generation tasks.
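A minimal sketch of the kind of greedy, PRM-guided step selection described above; `propose_steps` and `prm_score` are hypothetical callables standing in for the LLM's step generator and the trained step-level reward model.

```python
def greedy_prm_search(question, propose_steps, prm_score, max_steps=8):
    """Greedily extend a reasoning chain, keeping the highest-scoring candidate step.

    propose_steps(question, steps) -> list[str]   candidate next steps from the LLM
    prm_score(question, steps)     -> float       step-level reward for a partial chain
    """
    steps = []
    for _ in range(max_steps):
        candidates = propose_steps(question, steps)
        if not candidates:
            break
        # Score each partial chain extended by one candidate step and keep the best.
        best = max(candidates, key=lambda c: prm_score(question, steps + [c]))
        steps.append(best)
    return steps
```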
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
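For intuition, optimism with a learned feature or kernel embedding typically takes the familiar UCB form below, where phi is the (latent-variable-induced) state-action embedding and Sigma a regularised covariance of observed features; the paper's exact kernel-embedding construction differs in detail and is given there.

$$
a_t = \arg\max_{a}\Big[\widehat{Q}(s_t, a) + \beta\,\sqrt{\phi(s_t, a)^{\top} \Sigma_t^{-1}\, \phi(s_t, a)}\Big],
\qquad
\Sigma_t = \lambda I + \sum_{i<t} \phi(s_i, a_i)\,\phi(s_i, a_i)^{\top}
$$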
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- Utilizing Prior Solutions for Reward Shaping and Composition in Entropy-Regularized Reinforcement Learning [3.058685580689605]
We develop a general framework for reward shaping and task composition in entropy-regularized RL.
We show how the derived relation leads to a general result for reward shaping in entropy-regularized RL.
We then generalize this approach to derive an exact relation connecting optimal value functions for the composition of multiple tasks in entropy-regularized RL.
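These results build on the standard entropy-regularised ("soft") value relations; as background, the soft state value is a log-sum-exp of the soft Q-function, which is what makes exact shaping and composition relations tractable. The specific relations derived in the paper are stated there.

$$
V^{*}(s) = \alpha \log \sum_{a} \exp\!\big(Q^{*}(s, a)/\alpha\big),
\qquad
Q^{*}(s, a) = r(s, a) + \gamma\, \mathbb{E}_{s'}\big[V^{*}(s')\big]
$$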
arXiv Detail & Related papers (2022-12-02T13:57:53Z)
- Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing conservation of linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
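The reason an antisymmetric pairwise layer guarantees conservation is elementary: if the velocity change that particle j induces on particle i satisfies dv_ij = -dv_ji, the contributions cancel in the total momentum. A small numerical check of that property (the network architecture itself is described in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# Arbitrary pairwise "messages": raw[i, j] is a 3D velocity change j induces on i.
raw = rng.normal(size=(n, n, 3))
delta = raw - raw.transpose(1, 0, 2)   # antisymmetrise: delta[i, j] == -delta[j, i]

dv = delta.sum(axis=1)                 # per-particle velocity update
# Assuming equal particle masses, total momentum change is the sum of velocity updates.
print(np.allclose(dv.sum(axis=0), 0))  # True: momentum is conserved exactly
```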
arXiv Detail & Related papers (2022-10-12T09:12:59Z)
- Weighted Maximum Entropy Inverse Reinforcement Learning [22.269565708490468]
We study inverse reinforcement learning (IRL) and imitation learning (IM).
We propose a new way to improve the learning process by adding the maximum weight function to the entropy framework.
Our framework and algorithms allow learning both a reward (or policy) function and the structure of the entropy terms added to the Markov decision process.
arXiv Detail & Related papers (2022-08-20T06:02:07Z)
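For comparison with the main paper, the weighted entropy used in this line of work likewise scales each Shannon-entropy term by a weight w(s, a); in the IRL setting, one common way to write the resulting principle is maximising this quantity subject to matching the expert's feature expectations. Here rho_pi denotes the state occupancy and phi the reward features, both notation assumptions of this sketch rather than the cited paper's exact formulation.

$$
\mathcal{H}_{w}(\pi) = -\sum_{s,a} \rho_{\pi}(s)\, w(s, a)\, \pi(a \mid s)\, \log \pi(a \mid s),
\qquad
\max_{\pi}\ \mathcal{H}_{w}(\pi)\ \ \text{s.t.}\ \ \mathbb{E}_{\pi}\big[\phi(s,a)\big] = \mathbb{E}_{\text{expert}}\big[\phi(s,a)\big]
$$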
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.