ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization
- URL: http://arxiv.org/abs/2402.14528v5
- Date: Mon, 04 Nov 2024 05:18:20 GMT
- Title: ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization
- Authors: Tianying Ji, Yongyuan Liang, Yan Zeng, Yu Luo, Guowei Xu, Jiawei Guo, Ruijie Zheng, Furong Huang, Fuchun Sun, Huazhe Xu
- Abstract summary: We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration.
Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
- Score: 52.5587113539404
- Abstract: The varying significance of distinct primitive behaviors during the policy learning process has been overlooked by prior model-free RL algorithms. Leveraging this insight, we explore the causal relationship between different action dimensions and rewards to evaluate the significance of various primitive behaviors during training. We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration. Furthermore, to prevent excessive focus on specific primitive behaviors, we analyze the gradient dormancy phenomenon and introduce a dormancy-guided reset mechanism to further enhance the efficacy of our method. Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks spanning 7 domains compared to model-free RL baselines, which underscores the effectiveness, versatility, and sample efficiency of our approach. Benchmark results and videos are available at https://ace-rl.github.io/.
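To make the entropy term concrete, here is a minimal sketch of a causality-aware entropy bonus for a diagonal-Gaussian policy, where each action dimension's entropy is weighted by its estimated causal influence on the reward. The `causal_weights` input is a placeholder for whatever causal-effect estimate is used; all names are illustrative, not the authors' implementation.

```python
import math
import torch

def causality_aware_entropy(log_std: torch.Tensor,
                            causal_weights: torch.Tensor) -> torch.Tensor:
    """Entropy of a diagonal Gaussian policy, weighted per action dimension.

    log_std:        (batch, action_dim) log standard deviations
    causal_weights: (action_dim,) nonnegative weights, e.g. a normalized
                    estimate of each action dimension's causal influence
                    on the reward (a placeholder assumption here)
    """
    # Per-dimension Gaussian entropy: 0.5 * log(2*pi*e) + log_std
    per_dim = 0.5 * math.log(2 * math.pi * math.e) + log_std
    # Weight dimensions by causal significance instead of summing uniformly.
    return (per_dim * causal_weights).sum(dim=-1).mean()

# Toy usage: a 4-dim action space where dimension 2 currently matters most,
# so exploration noise is concentrated there.
log_std = torch.zeros(32, 4)  # sigma = 1 for every dimension
weights = torch.softmax(torch.tensor([0.1, 0.2, 2.0, 0.1]), dim=0)
entropy_bonus = causality_aware_entropy(log_std, weights)
```

In a SAC-style actor objective, this weighted term would replace the usual uniform entropy bonus.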
Related papers
- Long-Sequence Recommendation Models Need Decoupled Embeddings [49.410906935283585]
We identify and characterize a neglected deficiency in existing long-sequence recommendation models.
A single set of embeddings struggles with learning both attention and representation, leading to interference between these two processes.
We propose the Decoupled Attention and Representation Embeddings (DARE) model, where two distinct embedding tables are learned separately to fully decouple attention and representation.
arXiv Detail & Related papers (2024-10-03T15:45:15Z)
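The decoupling above can be pictured with a short sketch (module names are assumptions, not the DARE code): one embedding table produces the queries and keys that drive attention, while a separate table produces the values that form the representation.

```python
import torch
import torch.nn as nn

class DecoupledEmbeddings(nn.Module):
    """Sketch: separate embedding tables for attention and representation."""

    def __init__(self, num_items: int, dim: int):
        super().__init__()
        self.attn_emb = nn.Embedding(num_items, dim)  # learns attention patterns
        self.repr_emb = nn.Embedding(num_items, dim)  # learns item representations

    def forward(self, history: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # history: (batch, seq) item ids; target: (batch,) candidate item id
        q = self.attn_emb(target).unsqueeze(1)        # query from attention table
        k = self.attn_emb(history)                    # keys from attention table
        v = self.repr_emb(history)                    # values from the *other* table
        scores = torch.softmax((q * k).sum(-1) / k.shape[-1] ** 0.5, dim=-1)
        return (scores.unsqueeze(-1) * v).sum(dim=1)  # (batch, dim) interest vector
```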
- Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning? [1.9116784879310031]
In deep Reinforcement Learning (RL), value functions are approximated using deep neural networks and trained via mean squared error regression objectives.
Recent research has proposed an alternative approach, utilizing the cross-entropy classification objective.
Our work seeks to empirically investigate the impact of such a replacement in an offline RL setup.
arXiv Detail & Related papers (2024-06-10T14:25:11Z)
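One common instantiation of that replacement (assumed here; the paper may study others) is to bin the value range and train with cross-entropy against a two-hot target, reading the scalar value back out as the expected bin center:

```python
import torch
import torch.nn.functional as F

def two_hot(targets: torch.Tensor, bins: torch.Tensor) -> torch.Tensor:
    """Encode scalar value targets as a distribution over fixed bins."""
    targets = targets.clamp(bins[0], bins[-1])
    idx = torch.searchsorted(bins, targets, right=True).clamp(1, len(bins) - 1)
    lo, hi = bins[idx - 1], bins[idx]
    w_hi = (targets - lo) / (hi - lo)                 # linear interpolation weight
    dist = torch.zeros(targets.shape[0], len(bins))
    dist.scatter_(1, (idx - 1).unsqueeze(1), (1 - w_hi).unsqueeze(1))
    dist.scatter_(1, idx.unsqueeze(1), w_hi.unsqueeze(1))
    return dist

bins = torch.linspace(-10.0, 10.0, steps=51)          # value support
logits = torch.randn(8, 51, requires_grad=True)       # critic outputs per bin
td_targets = torch.randn(8) * 5                       # bootstrapped value targets
loss = F.cross_entropy(logits, two_hot(td_targets, bins))  # replaces MSE regression
value = (torch.softmax(logits, -1) * bins).sum(-1)    # scalar value = E[bin center]
```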
- Counterfactual Learning with Multioutput Deep Kernels [0.0]
In this paper, we address the challenge of performing counterfactual inference with observational data.
We present a general class of counterfactual multi-task deep kernels models that estimate causal effects and learn policies proficiently.
arXiv Detail & Related papers (2022-11-20T23:28:41Z)
- Domain Adaptation with Adversarial Training on Penultimate Activations [82.9977759320565]
Enhancing model prediction confidence on unlabeled target data is an important objective in Unsupervised Domain Adaptation (UDA).
We show that this strategy is more efficient and better correlated with the objective of boosting prediction confidence than adversarial training on input images or intermediate features.
arXiv Detail & Related papers (2022-08-26T19:50:46Z)
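A minimal sketch of the general recipe, assuming a classifier head `head` applied to penultimate features `feats` (both hypothetical names): find the feature-space perturbation that most changes the predictions, then penalize that change, in the spirit of virtual adversarial training applied at the penultimate layer rather than on the input image.

```python
import torch
import torch.nn.functional as F

def adv_penultimate_loss(head: torch.nn.Module,
                         feats: torch.Tensor,
                         eps: float = 1.0) -> torch.Tensor:
    """Adversarial-consistency loss on penultimate activations (sketch)."""
    with torch.no_grad():
        p = torch.softmax(head(feats), dim=-1)        # current predictions
    # Find the feature-space direction that perturbs predictions the most.
    delta = 1e-3 * torch.randn_like(feats)
    delta.requires_grad_(True)
    kl = F.kl_div(torch.log_softmax(head(feats + delta), dim=-1), p,
                  reduction="batchmean")
    (grad,) = torch.autograd.grad(kl, delta)
    adv = eps * F.normalize(grad, dim=-1)
    # Penalize prediction change under the adversarial feature shift.
    return F.kl_div(torch.log_softmax(head(feats + adv), dim=-1), p,
                    reduction="batchmean")

# Toy usage with a hypothetical 128-d penultimate layer and 10 classes.
head = torch.nn.Linear(128, 10)
loss = adv_penultimate_loss(head, torch.randn(16, 128))
```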
- Offline Policy Optimization with Eligible Actions [34.4530766779594]
Offline policy optimization could have a large impact on many real-world decision-making problems.
Importance sampling and its variants are a commonly used type of estimator in offline policy evaluation.
We propose an algorithm to avoid this overfitting through a new per-state-neighborhood normalization constraint.
arXiv Detail & Related papers (2022-07-01T19:18:15Z)
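For reference, the basic trajectory-wise importance sampling estimator the summary alludes to looks as follows; this is a sketch of the standard estimator, not the paper's per-state-neighborhood constraint. Its variance explodes with the horizon, which is what motivates such corrections.

```python
import numpy as np

def is_estimate(trajectories, pi, mu):
    """Trajectory-wise importance sampling for off-policy evaluation.

    trajectories: list of [(state, action, reward), ...] collected under mu
    pi(a, s) / mu(a, s): action probabilities under target / behavior policy
    """
    values = []
    for traj in trajectories:
        ratio = np.prod([pi(a, s) / mu(a, s) for s, a, _ in traj])
        ret = sum(r for *_, r in traj)
        values.append(ratio * ret)   # unbiased, but high-variance for long trajs
    return float(np.mean(values))
```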
- CIC: Contrastive Intrinsic Control for Unsupervised Skill Discovery [88.97076030698433]
We introduce Contrastive Intrinsic Control (CIC), an algorithm for unsupervised skill discovery.
CIC explicitly incentivizes diverse behaviors by maximizing state entropy.
We find that CIC substantially improves over prior unsupervised skill discovery methods.
arXiv Detail & Related papers (2022-02-01T00:36:29Z)
- Transfer RL across Observation Feature Spaces via Model-Based Regularization [9.660642248872973]
In many reinforcement learning (RL) applications, the observation space is specified by human developers and restricted by physical realizations.
We propose a novel algorithm which extracts the latent-space dynamics in the source task, and transfers the dynamics model to the target task.
Our algorithm works for drastic changes of observation space without any inter-task mapping or any prior knowledge of the target task.
arXiv Detail & Related papers (2022-01-01T22:41:19Z)
- APS: Active Pretraining with Successor Features [96.24533716878055]
We show that by reinterpreting and combining successor features with nonparametric entropy maximization, the intractable mutual information can be efficiently optimized.
The proposed method, Active Pretraining with Successor Features (APS), explores the environment via nonparametric entropy maximization, and the explored data can be efficiently leveraged to learn behavior.
arXiv Detail & Related papers (2021-08-31T16:30:35Z)
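Nonparametric entropy terms of this kind are typically approximated with a particle-based k-nearest-neighbor estimate; a minimal sketch of that estimator (an assumed form, not the APS code):

```python
import numpy as np

def knn_entropy_bonus(reps: np.ndarray, k: int = 12) -> np.ndarray:
    """Particle-based entropy bonus over a batch of state representations.

    reps: (n, d) array with n > k; a state far from its k-th nearest
    neighbor is novel, so visiting it raises the entropy estimate.
    """
    dists = np.linalg.norm(reps[:, None, :] - reps[None, :, :], axis=-1)
    kth = np.sort(dists, axis=1)[:, k]   # column 0 is the self-distance (zero)
    return np.log(kth + 1.0)             # +1 keeps the bonus finite and >= 0

# Toy usage: 256 random 16-d representations.
bonus = knn_entropy_bonus(np.random.randn(256, 16))
```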
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
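The variance-control idea generalizes a familiar trick: with a discrete action space, the critic's action values can be averaged over the entire policy distribution rather than a single sampled action. A minimal sketch of that expected-gradient form (illustrative only, not the paper's exact estimator):

```python
import torch

def expected_policy_gradient_loss(logits: torch.Tensor,
                                  q_values: torch.Tensor) -> torch.Tensor:
    """Combine the critic's values over ALL discrete actions, weighted by pi.

    Averaging over the full action distribution, instead of relying on a
    single sampled action, removes sampling noise from the gradient estimate.

    logits:   (batch, n_actions) policy logits
    q_values: (batch, n_actions) critic estimates Q(s, a), treated as fixed
    """
    pi = torch.softmax(logits, dim=-1)
    baseline = (pi * q_values).sum(-1, keepdim=True)  # V(s) as a control variate
    advantage = q_values - baseline
    # Negative objective: ascend E_{a~pi}[A(s, a)] computed exactly over actions.
    return -(pi * advantage.detach()).sum(-1).mean()
```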