Weighted Entropy Modification for Soft Actor-Critic
- URL: http://arxiv.org/abs/2011.09083v1
- Date: Wed, 18 Nov 2020 04:36:03 GMT
- Title: Weighted Entropy Modification for Soft Actor-Critic
- Authors: Yizhou Zhao, Song-Chun Zhu
- Abstract summary: We generalize the existing principle of the maximum Shannon entropy in reinforcement learning (RL) to weighted entropy by characterizing the state-action pairs with some qualitative weights.
Using the introduced weight function, we propose an algorithm for self-balancing exploration that achieves state-of-the-art performance on MuJoCo tasks despite its simple implementation.
- Score: 95.37322316673617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We generalize the existing principle of the maximum Shannon entropy in
reinforcement learning (RL) to weighted entropy by characterizing the
state-action pairs with some qualitative weights, which can be connected with
prior knowledge, experience replay, and evolution process of the policy. We
propose an algorithm for self-balancing exploration that uses the introduced
weight function and achieves state-of-the-art performance on MuJoCo tasks
despite its simple implementation.
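As a rough illustration of the idea (not the authors' exact formulation): the weighted entropy of a distribution scales each term of the Shannon entropy by a weight, H_w(p) = -sum_i w_i p_i log p_i, so a weighted-entropy variant of the SAC actor loss simply scales the entropy bonus of each state-action pair by a weight w(s, a). The weight function, batch shapes, and the temperature `alpha` below are assumptions made for this sketch.

```python
import torch

def weighted_entropy_actor_loss(log_prob, q_value, weight, alpha=0.2):
    """SAC-style actor loss with a per-sample entropy weight w(s, a).

    Standard SAC minimizes E[alpha * log pi(a|s) - Q(s, a)]; here the
    entropy term is scaled by the weight attached to each state-action pair.
    """
    return (alpha * weight * log_prob - q_value).mean()

# Toy usage with random tensors standing in for a sampled batch.
log_prob = torch.randn(64)   # log pi(a|s) of actions sampled from the policy
q_value = torch.randn(64)    # critic estimate Q(s, a)
weight = torch.rand(64)      # qualitative weight w(s, a), here drawn in [0, 1)
loss = weighted_entropy_actor_loss(log_prob, q_value, weight)
```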
Related papers
- Maximum Entropy On-Policy Actor-Critic via Entropy Advantage Estimation [0.276240219662896]
A notable form of entropy regularisation is augmenting the objective with an entropy term, thereby simultaneously optimising the expected return and the entropy.
This framework, known as maximum entropy reinforcement learning (MaxEnt RL), has shown theoretical and empirical successes.
This paper proposes a simple method of separating the entropy objective from the MaxEnt RL objective, which facilitates the implementation of MaxEnt RL in on-policy settings.
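For context, the MaxEnt RL objective referred to here augments the expected return with a policy-entropy term (alpha is the temperature); the cited paper's contribution is to estimate and optimise the entropy part separately via an entropy advantage, whose exact form is given in that paper.

$$
J(\pi) = \mathbb{E}_{\pi}\Big[\sum_{t} r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big)\Big],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[\log \pi(a \mid s)\big]
$$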
arXiv Detail & Related papers (2024-07-25T15:48:24Z)
- Maximum Entropy Reinforcement Learning via Energy-Based Normalizing Flow [14.681645502417215]
We introduce a new MaxEnt RL framework modeled using Energy-Based Normalizing Flows (EBFlow).
This framework integrates the policy evaluation steps and the policy improvement steps, resulting in a single objective training process.
Our method achieves superior performance compared to widely-adopted representative baselines.
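The energy-based view this framework builds on is the standard MaxEnt RL result that the optimal policy is a Boltzmann distribution over the soft Q-function; EBFlow's specific flow parameterisation and single-objective training procedure are described in the paper itself.

$$
\pi^{*}(a \mid s) = \exp\!\Big(\tfrac{1}{\alpha}\big(Q^{*}_{\mathrm{soft}}(s, a) - V^{*}_{\mathrm{soft}}(s)\big)\Big),
\qquad
V^{*}_{\mathrm{soft}}(s) = \alpha \log \int \exp\!\big(Q^{*}_{\mathrm{soft}}(s, a)/\alpha\big)\, \mathrm{d}a
$$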
arXiv Detail & Related papers (2024-05-22T13:26:26Z)
- ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [52.5587113539404]
We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration.
Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
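One way to read "causality-aware entropy" is as a per-action-dimension entropy bonus weighted by each dimension's estimated causal impact on the reward; the sketch below illustrates that reading for a diagonal Gaussian policy. The weights and the exact weighting scheme used by ACE are defined in the paper and are assumed here.

```python
import torch
from torch.distributions import Normal

def causality_weighted_entropy(mean, std, causal_weights):
    """Entropy of a diagonal Gaussian policy, weighted per action dimension.

    `causal_weights` has shape [action_dim] and up-weights dimensions whose
    actions are estimated to have a larger causal effect on the reward.
    """
    per_dim_entropy = Normal(mean, std).entropy()        # [batch, action_dim]
    return (per_dim_entropy * causal_weights).sum(-1)    # [batch]

# Toy usage: batch of 8 states, 3-dimensional actions.
mean, std = torch.zeros(8, 3), torch.ones(8, 3)
weights = torch.tensor([0.6, 0.3, 0.1])
bonus = causality_weighted_entropy(mean, std, weights)
```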
arXiv Detail & Related papers (2024-02-22T13:22:06Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
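A minimal sketch of a token-level, entropy-augmented policy-gradient loss; the actual ETPO objective, including its soft Bellman backup over tokens, is specified in the paper, and the advantage estimates and `beta` here are assumptions.

```python
import torch
import torch.nn.functional as F

def token_level_entropy_loss(logits, actions, advantages, beta=0.01):
    """Policy-gradient loss over individual tokens with an entropy bonus.

    logits: [batch, seq_len, vocab]  raw language-model outputs
    actions: [batch, seq_len]        sampled token ids (LongTensor)
    advantages: [batch, seq_len]     per-token advantage estimates
    """
    log_probs = F.log_softmax(logits, dim=-1)
    taken = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)  # log pi(token)
    entropy = -(log_probs.exp() * log_probs).sum(-1)                 # per-token entropy
    return -(advantages * taken + beta * entropy).mean()
```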
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
A Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks and observe similar performance improvements on code generation tasks.
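A minimal sketch of the kind of greedy, PRM-guided step selection described above; `propose_steps` and `prm_score` are hypothetical callables standing in for the LLM's step generator and the trained step-level reward model.

```python
def greedy_prm_search(question, propose_steps, prm_score, max_steps=8):
    """Greedily extend a reasoning chain, keeping the highest-scoring candidate step.

    propose_steps(question, steps) -> list[str]   candidate next steps from the LLM
    prm_score(question, steps)     -> float       step-level reward for a partial chain
    """
    steps = []
    for _ in range(max_steps):
        candidates = propose_steps(question, steps)
        if not candidates:
            break
        # Score each partial chain extended by one candidate step and keep the best.
        best = max(candidates, key=lambda c: prm_score(question, steps + [c]))
        steps.append(best)
    return steps
```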
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
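For intuition, optimism with a learned feature or kernel embedding typically takes the familiar UCB form below, where phi is the (latent-variable-induced) state-action embedding and Sigma a regularised covariance of observed features; the paper's exact kernel-embedding construction differs in detail and is given there.

$$
a_t = \arg\max_{a}\Big[\widehat{Q}(s_t, a) + \beta\,\sqrt{\phi(s_t, a)^{\top} \Sigma_t^{-1}\, \phi(s_t, a)}\Big],
\qquad
\Sigma_t = \lambda I + \sum_{i<t} \phi(s_i, a_i)\,\phi(s_i, a_i)^{\top}
$$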
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- Utilizing Prior Solutions for Reward Shaping and Composition in Entropy-Regularized Reinforcement Learning [3.058685580689605]
We develop a general framework for reward shaping and task composition in entropy-regularized RL.
We show how the derived relation leads to a general result for reward shaping in entropy-regularized RL.
We then generalize this approach to derive an exact relation connecting optimal value functions for the composition of multiple tasks in entropy-regularized RL.
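These results build on the standard entropy-regularised ("soft") value relations; as background, the soft state value is a log-sum-exp of the soft Q-function, which is what makes exact shaping and composition relations tractable. The specific relations derived in the paper are stated there.

$$
V^{*}(s) = \alpha \log \sum_{a} \exp\!\big(Q^{*}(s, a)/\alpha\big),
\qquad
Q^{*}(s, a) = r(s, a) + \gamma\, \mathbb{E}_{s'}\big[V^{*}(s')\big]
$$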
arXiv Detail & Related papers (2022-12-02T13:57:53Z)
- Guaranteed Conservation of Momentum for Learning Particle-based Fluid Dynamics [96.9177297872723]
We present a novel method for guaranteeing conservation of linear momentum in learned physics simulations.
We enforce conservation of momentum with a hard constraint, which we realize via antisymmetrical continuous convolutional layers.
In combination, the proposed method allows us to increase the physical accuracy of the learned simulator substantially.
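The reason an antisymmetric pairwise layer guarantees conservation is elementary: if the velocity change that particle j induces on particle i satisfies dv_ij = -dv_ji, the contributions cancel in the total momentum. A small numerical check of that property (the network architecture itself is described in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# Arbitrary pairwise "messages": raw[i, j] is a 3D velocity change j induces on i.
raw = rng.normal(size=(n, n, 3))
delta = raw - raw.transpose(1, 0, 2)   # antisymmetrise: delta[i, j] == -delta[j, i]

dv = delta.sum(axis=1)                 # per-particle velocity update
# Assuming equal particle masses, total momentum change is the sum of velocity updates.
print(np.allclose(dv.sum(axis=0), 0))  # True: momentum is conserved exactly
```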
arXiv Detail & Related papers (2022-10-12T09:12:59Z)
- Weighted Maximum Entropy Inverse Reinforcement Learning [22.269565708490468]
We study inverse reinforcement learning (IRL) and imitation learning (IM).
We propose a new way to improve the learning process by adding the maximum weight function to the entropy framework.
Our framework and algorithms allow learning both a reward (or policy) function and the structure of the entropy terms added to the Markov decision process.
arXiv Detail & Related papers (2022-08-20T06:02:07Z)
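For comparison with the main paper, the weighted entropy used in this line of work likewise scales each Shannon-entropy term by a weight w(s, a); in the IRL setting, one common way to write the resulting principle is maximising this quantity subject to matching the expert's feature expectations. Here rho_pi denotes the state occupancy and phi the reward features, both notation assumptions of this sketch rather than the cited paper's exact formulation.

$$
\mathcal{H}_{w}(\pi) = -\sum_{s,a} \rho_{\pi}(s)\, w(s, a)\, \pi(a \mid s)\, \log \pi(a \mid s),
\qquad
\max_{\pi}\ \mathcal{H}_{w}(\pi)\ \ \text{s.t.}\ \ \mathbb{E}_{\pi}\big[\phi(s,a)\big] = \mathbb{E}_{\text{expert}}\big[\phi(s,a)\big]
$$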
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.