Efficient Diversity-based Experience Replay for Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2410.20487v1
- Date: Sun, 27 Oct 2024 15:51:27 GMT
- Title: Efficient Diversity-based Experience Replay for Deep Reinforcement Learning
- Authors: Kaiyan Zhao, Yiming Wang, Yuyang Chen, Xiaoguang Niu, Yan Li, Leong Hou U
- Abstract summary: This paper proposes a novel approach, diversity-based experience replay (DBER), which leverages determinantal point processes to prioritize diverse samples in state realizations.
We conducted extensive experiments on Robotic Manipulation tasks in MuJoCo, Atari games, and realistic indoor environments in Habitat.
- Score: 14.96744975805832
- Abstract: Deep Reinforcement Learning (DRL) has achieved remarkable success in solving complex decision-making problems by combining the representation capabilities of deep learning with the decision-making power of reinforcement learning. However, learning in sparse reward environments remains challenging due to insufficient feedback to guide the optimization of agents, especially in real-life environments with high-dimensional states. To tackle this issue, experience replay is commonly introduced to enhance learning efficiency through past experiences. Nonetheless, current methods of experience replay, whether based on uniform or prioritized sampling, frequently struggle with suboptimal learning efficiency and insufficient utilization of samples. This paper proposes a novel approach, diversity-based experience replay (DBER), which leverages determinantal point processes to prioritize diverse samples in state realizations. We conducted extensive experiments on Robotic Manipulation tasks in MuJoCo, Atari games, and realistic indoor environments in Habitat. The results show that our method not only significantly improves learning efficiency but also demonstrates superior performance in sparse reward environments with high-dimensional states, providing a simple yet effective solution for this field.
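The abstract describes the selection mechanism only at a high level. Below is a minimal, self-contained sketch of how determinantal-point-process (DPP) style diversity selection over replay samples can work; the RBF kernel over state embeddings and the greedy log-determinant selection are illustrative assumptions, not necessarily the paper's exact construction.

```python
import numpy as np

def greedy_dpp_select(features, k):
    """Greedily pick k mutually diverse items under an RBF similarity
    kernel L, approximately maximizing det(L[S, S]) -- greedy MAP
    inference for a determinantal point process."""
    # Similarity kernel: L[i, j] = exp(-||x_i - x_j||^2 / 2).
    sq_dists = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    L = np.exp(-0.5 * sq_dists)

    selected = []
    for _ in range(k):
        best, best_logdet = None, -np.inf
        for i in range(len(features)):
            if i in selected:
                continue
            idx = selected + [i]
            # Volume spanned by the candidate subset; bigger = more diverse.
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            if sign > 0 and logdet > best_logdet:
                best, best_logdet = i, logdet
        selected.append(best)
    return selected

# Usage: pick 32 diverse samples out of a 256-sample candidate batch.
rng = np.random.default_rng(0)
state_embeddings = rng.normal(size=(256, 16))  # stand-in for state features
diverse_idx = greedy_dpp_select(state_embeddings, 32)
```

Greedy MAP inference is a common approximation here because exact DPP sampling scales poorly with the size of the candidate set.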
Related papers
- Adaptive teachers for amortized samplers [76.88721198565861]
Amortized inference is the task of training a parametric model, such as a neural network, to approximate a distribution with a given unnormalized density where exact sampling is intractable.
Off-policy RL training facilitates the discovery of diverse, high-reward candidates, but existing methods still face challenges in efficient exploration.
We propose an adaptive training distribution (the Teacher) to guide the training of the primary amortized sampler (the Student) by prioritizing high-loss regions; a toy sketch of this prioritization follows the entry.
arXiv Detail & Related papers (2024-10-02T11:33:13Z)
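The entry above mentions prioritizing high-loss regions. The snippet below is a toy sketch of a loss-prioritized training distribution; the region discretization, EMA update rule, and probability floor are assumptions for illustration, not the paper's actual Teacher.

```python
import numpy as np

class LossPrioritizedTeacher:
    """Toy teacher: proposes training regions with probability proportional
    to an exponential moving average of the student's loss per region."""

    def __init__(self, n_regions, decay=0.9, eps=1e-3):
        self.ema_loss = np.ones(n_regions)  # optimistic init: visit everything
        self.decay = decay
        self.eps = eps  # floor so no region's probability hits zero

    def propose(self, rng):
        p = self.ema_loss + self.eps
        return rng.choice(len(p), p=p / p.sum())

    def update(self, region, student_loss):
        self.ema_loss[region] = (self.decay * self.ema_loss[region]
                                 + (1 - self.decay) * student_loss)

# Usage: proposals drift toward the region where the student keeps failing.
rng = np.random.default_rng(0)
teacher = LossPrioritizedTeacher(n_regions=10)
for _ in range(200):
    region = teacher.propose(rng)
    fake_loss = 1.0 if region == 3 else 0.1  # pretend region 3 is hard
    teacher.update(region, fake_loss)
```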
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE).
RLE combines the strengths of bonus-based and noise-based exploration, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE achieves higher overall scores than other approaches across all tasks.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- Iterative Experience Refinement of Software-Developing Agents [81.09737243969758]
Large language models (LLMs) can leverage past experiences to reduce errors and enhance efficiency.
This paper introduces the Iterative Experience Refinement framework, enabling LLM agents to refine experiences iteratively during task execution.
arXiv Detail & Related papers (2024-05-07T11:33:49Z)
- Towards Improving Exploration in Self-Imitation Learning using Intrinsic Motivation [7.489793155793319]
Reinforcement Learning has emerged as a strong alternative for solving optimization tasks efficiently.
The use of these algorithms depends heavily on the feedback signals the environment provides to indicate how good (or bad) the agent's decisions are.
In this work, intrinsic motivation is used to encourage the agent to explore the environment out of curiosity, while imitation learning repeats the most promising experiences to accelerate learning; a toy curiosity bonus is sketched after this entry.
arXiv Detail & Related papers (2022-11-30T09:18:59Z)
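The entry above pairs curiosity-driven exploration with self-imitation. The snippet below sketches one common form of curiosity bonus (forward-model prediction error, ICM-style); the linear model and online update rule are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class CuriosityBonus:
    """Toy curiosity signal: intrinsic reward is the prediction error of a
    learned forward model on the next state (ICM-style sketch)."""

    def __init__(self, state_dim, action_dim, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # Linear forward model: predicts s' from the concatenation [s, a].
        self.W = rng.normal(scale=0.1, size=(state_dim, state_dim + action_dim))
        self.lr = lr

    def reward(self, s, a, s_next):
        x = np.concatenate([s, a])
        err = s_next - self.W @ x
        # Online update: familiar transitions stop being rewarding.
        self.W += self.lr * np.outer(err, x)
        return float(err @ err)  # squared prediction error as the bonus

# Usage: the bonus shrinks as the same transition is revisited.
bonus = CuriosityBonus(state_dim=4, action_dim=2)
s, a, s_next = np.zeros(4), np.ones(2), 0.5 * np.ones(4)
print([round(bonus.reward(s, a, s_next), 4) for _ in range(3)])  # decreasing
```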
- Cluster-based Sampling in Hindsight Experience Replay for Robotic Tasks (Student Abstract) [3.4616343332323596]
This paper investigates the impact of exploiting the properties of achieved goals when generating successful experiences.
The proposed sampling strategy groups episodes with different achieved goals using a cluster model and samples experiences in the manner of HER; a minimal sketch of this idea follows the entry.
The experimental results demonstrate that the proposed method is substantially more sample-efficient and achieves better performance than baseline approaches.
arXiv Detail & Related papers (2022-08-31T09:45:30Z)
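As referenced in the entry above, here is a minimal sketch of cluster-based episode sampling: episodes are grouped by their final achieved goal with k-means, and the batch is drawn evenly across clusters before HER-style relabeling would occur. The episode format and even per-cluster allocation are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_balanced_sample(episodes, batch_size, n_clusters=8, rng=None):
    """Group episodes by their final achieved goal with k-means, then draw
    the batch evenly across clusters (HER relabeling would follow)."""
    rng = rng or np.random.default_rng()
    final_goals = np.stack([ep["achieved_goals"][-1] for ep in episodes])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(final_goals)

    batch, per_cluster = [], max(1, batch_size // n_clusters)
    for c in range(n_clusters):
        members = np.flatnonzero(labels == c)
        if members.size == 0:
            continue  # empty cluster: skip rather than crash
        picks = rng.choice(members, size=per_cluster, replace=True)
        batch.extend(episodes[i] for i in picks)
    return batch

# Usage: 64 episodes with random 3-D achieved goals, sample a batch of 16.
rng = np.random.default_rng(0)
episodes = [{"achieved_goals": rng.normal(size=(50, 3))} for _ in range(64)]
batch = cluster_balanced_sample(episodes, batch_size=16, rng=rng)
```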
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL): inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- MHER: Model-based Hindsight Experience Replay [33.00149668905828]
We propose Model-based Hindsight Experience Replay (MHER) to solve multi-goal reinforcement learning problems.
Replacing original goals with virtual goals generated from interaction with a trained dynamics model yields a novel relabeling method.
MHER exploits experiences more efficiently by leveraging environmental dynamics to generate virtual achieved goals; a minimal sketch of this relabeling follows the entry.
arXiv Detail & Related papers (2021-07-01T08:52:45Z)
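As referenced in the entry above, here is a minimal sketch of model-based goal relabeling: a learned dynamics model is rolled forward from the next state under the current policy, and the achieved goal of the imagined future state becomes the virtual goal. All function signatures and the toy stand-in models are illustrative assumptions.

```python
import numpy as np

def model_relabel(transition, dynamics_model, policy,
                  achieved_goal_fn, reward_fn, horizon=5):
    """Relabel one (s, a, s', g) transition with a virtual goal obtained by
    rolling a learned dynamics model forward from s' under the policy."""
    s, a, s_next, _ = transition
    state = s_next
    for _ in range(horizon):
        state = dynamics_model(state, policy(state))  # imagined step
    virtual_goal = achieved_goal_fn(state)            # goal the rollout reaches
    reward = reward_fn(achieved_goal_fn(s_next), virtual_goal)
    return (s, a, s_next, virtual_goal), reward

# Toy usage with stand-in models (all hypothetical):
dyn = lambda s, a: s + 0.1 * a          # fake dynamics model
pi = lambda s: -s                       # fake policy
ag = lambda s: s[:2]                    # achieved goal = first two dims
rew = lambda g1, g2: float(np.linalg.norm(g1 - g2) < 0.05) - 1.0  # sparse

t = (np.zeros(4), np.ones(4), 0.1 * np.ones(4), np.array([1.0, 1.0]))
relabeled, r = model_relabel(t, dyn, pi, ag, rew)
```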
- Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations [78.94386823185724]
Imitation learning performs effectively in sparse-reward tasks by leveraging existing expert demonstrations.
In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive.
We propose Self-Adaptive Imitation Learning (SAIL), which can achieve (near-)optimal performance given only a limited number of sub-optimal demonstrations.
arXiv Detail & Related papers (2020-04-01T15:57:15Z)
- Soft Hindsight Experience Replay [77.99182201815763]
Soft Hindsight Experience Replay (SHER) is a novel approach based on HER and Maximum Entropy Reinforcement Learning (MERL).
We evaluate SHER on OpenAI robotic manipulation tasks with sparse rewards.
arXiv Detail & Related papers (2020-02-06T03:57:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.