Attention Loss Adjusted Prioritized Experience Replay
- URL: http://arxiv.org/abs/2309.06684v2
- Date: Mon, 9 Oct 2023 03:12:03 GMT
- Title: Attention Loss Adjusted Prioritized Experience Replay
- Authors: Zhuoying Chen, Huiping Li, Rizhong Wang
- Abstract summary: Prioritized Experience Replay (PER) is a deep reinforcement learning technique that selects the more informative experience samples to speed up neural network training.
The non-uniform sampling used in PER inevitably shifts the state-action space distribution and introduces estimation error into the Q-value function.
An Attention Loss Adjusted Prioritized (ALAP) Experience Replay algorithm is proposed, which integrates an improved Self-Attention network with a Double-Sampling mechanism.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prioritized Experience Replay (PER) is a deep reinforcement learning
technique that selects the more informative experience samples to speed up
neural network training. However, the non-uniform sampling used in PER
inevitably shifts the state-action space distribution and introduces estimation
error into the Q-value function. In this paper, an Attention Loss Adjusted
Prioritized (ALAP) Experience Replay algorithm is proposed, which integrates an
improved Self-Attention network with a Double-Sampling mechanism to fit the
hyperparameter that regulates the importance sampling weights, thereby
eliminating the estimation error caused by PER. To verify the effectiveness and
generality of the algorithm, ALAP is tested with value-function based,
policy-gradient based and multi-agent reinforcement learning algorithms in
OpenAI Gym, and comparison studies verify the advantage and efficiency of the
proposed training framework.
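For orientation, the importance sampling weights that the abstract refers to can be illustrated with a minimal sketch of standard proportional PER. This is not the ALAP algorithm: ALAP is described as fitting the weight-regulating hyperparameter with a Self-Attention network, whereas here that hyperparameter (called `beta` below) is simply a fixed argument. The function name `per_sample`, the default values, and the choice to normalize weights by the batch maximum are assumptions made for illustration.

```python
import numpy as np

def per_sample(td_errors, batch_size, alpha=0.6, beta=0.4, eps=1e-6, rng=None):
    """Hypothetical helper: proportional prioritized sampling with IS weights.

    alpha controls how strongly TD error skews sampling (0 = uniform).
    beta controls how strongly the IS weights correct that skew (1 = full).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Priority p_i = (|delta_i| + eps)^alpha, so no transition has zero probability.
    priorities = (np.abs(td_errors) + eps) ** alpha
    probs = priorities / priorities.sum()

    # Non-uniform sampling: this is the step that shifts the data distribution.
    idx = rng.choice(len(probs), size=batch_size, p=probs)

    # Importance sampling weights w_i = (N * P(i))^(-beta), scaled so the
    # largest weight in the batch is 1 (an assumption; some variants use
    # the maximum over the whole buffer instead).
    weights = (len(probs) * probs[idx]) ** (-beta)
    weights /= weights.max()
    return idx, weights, probs

# Usage: four stored transitions with different TD errors.
td = np.array([0.1, 0.5, 2.0, 0.05])
idx, w, p = per_sample(td, batch_size=8, rng=np.random.default_rng(0))
```

In a training loop, each sampled transition's loss would be multiplied by its weight before averaging. With `beta=0` every weight is 1 and the sampling bias is left uncorrected; with `beta=1` the bias of the sampling distribution is fully compensated.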
Related papers
- Dissecting Deep RL with High Update Ratios: Combatting Value Divergence [21.282292112642747]
We show that deep reinforcement learning algorithms can retain their ability to learn without resetting network parameters.
We employ a simple unit-ball normalization that enables learning under large update ratios.
arXiv Detail & Related papers (2024-03-09T19:56:40Z)
- A Model-Based Approach for Improving Reinforcement Learning Efficiency Leveraging Expert Observations [9.240917262195046]
We propose an algorithm that automatically adjusts the weights of each component in the augmented loss function.
Experiments on a variety of continuous control tasks demonstrate that the proposed algorithm outperforms various benchmarks.
arXiv Detail & Related papers (2024-02-29T03:53:02Z)
- ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization [52.5587113539404]
We introduce a causality-aware entropy term that effectively identifies and prioritizes actions with high potential impacts for efficient exploration.
Our proposed algorithm, ACE: Off-policy Actor-critic with Causality-aware Entropy regularization, demonstrates a substantial performance advantage across 29 diverse continuous control tasks.
arXiv Detail & Related papers (2024-02-22T13:22:06Z)
- Directly Attention Loss Adjusted Prioritized Experience Replay [0.07366405857677226]
Prioritized Experience Replay (PER) enables the model to learn more from relatively important samples by artificially changing how frequently they are accessed.
DALAP is proposed, which can directly quantify the changed extent of the shifted distribution through Parallel Self-Attention network.
arXiv Detail & Related papers (2023-11-24T10:14:05Z)
- Parameter-Efficient Learning for Text-to-Speech Accent Adaptation [58.356667204518985]
This paper presents a parameter-efficient learning (PEL) approach for low-resource accent adaptation in text-to-speech (TTS).
A resource-efficient adaptation from a frozen pre-trained TTS model is developed by using only 1.2% to 0.8% of original trainable parameters.
Experiment results show that the proposed methods can achieve competitive naturalness with parameter-efficient decoder fine-tuning.
arXiv Detail & Related papers (2023-05-18T22:02:59Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics by minimizing the population loss that are more suitable in active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Actor Prioritized Experience Replay [0.0]
Prioritized Experience Replay (PER) allows agents to learn from transitions sampled with non-uniform probability proportional to their temporal-difference (TD) error.
We introduce a novel experience replay sampling framework for actor-critic methods, which also regards issues with stability and recent findings behind the poor empirical performance of PER.
An extensive set of experiments verifies our theoretical claims and demonstrates that the introduced method significantly outperforms the competing approaches.
arXiv Detail & Related papers (2022-09-01T15:27:46Z)
- CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning [56.20123080771364]
We develop a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF) for reinforcement learning.
CCLF fully exploits sample importance and improves learning efficiency in a self-supervised manner.
We evaluate this approach on the DeepMind Control Suite, Atari, and MiniGrid benchmarks.
arXiv Detail & Related papers (2022-05-02T14:42:05Z)
- Improving Music Performance Assessment with Contrastive Learning [78.8942067357231]
This study investigates contrastive learning as a potential method to improve existing MPA systems.
We introduce a weighted contrastive loss suitable for regression tasks applied to a convolutional neural network.
Our results show that contrastive-based methods are able to match and exceed SoTA performance for MPA regression tasks.
arXiv Detail & Related papers (2021-08-03T19:24:25Z)
- Experience Replay with Likelihood-free Importance Weights [123.52005591531194]
We propose to reweight experiences based on their likelihood under the stationary distribution of the current policy.
We apply the proposed approach empirically to two competitive methods, Soft Actor Critic (SAC) and Twin Delayed Deep Deterministic policy gradient (TD3).
arXiv Detail & Related papers (2020-06-23T17:17:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.