Efficient Diversity-based Experience Replay for Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2410.20487v3
- Date: Thu, 23 Jan 2025 07:39:58 GMT
- Title: Efficient Diversity-based Experience Replay for Deep Reinforcement Learning
- Authors: Kaiyan Zhao, Yiming Wang, Yuyang Chen, Yan Li, Leong Hou U, Xiaoguang Niu
- Abstract summary: We propose a novel approach, Efficient Diversity-based Experience Replay (EDER).
EDER employs a determinantal point process to model the diversity between samples and prioritizes replay based on that diversity.
Experiments are conducted on robotic manipulation tasks in MuJoCo, Atari games, and realistic indoor environments in Habitat.
- Score: 14.96744975805832
- Abstract: Experience replay is widely used to improve learning efficiency in reinforcement learning by leveraging past experiences. However, existing experience replay methods, whether based on uniform or prioritized sampling, often suffer from low efficiency, particularly in real-world scenarios with high-dimensional state spaces. To address this limitation, we propose a novel approach, Efficient Diversity-based Experience Replay (EDER). EDER employs a determinantal point process to model the diversity between samples and prioritizes replay based on that diversity. To further enhance learning efficiency, we incorporate Cholesky decomposition for handling large state spaces in realistic environments. Additionally, rejection sampling is applied to select samples with higher diversity, thereby improving overall learning efficacy. Extensive experiments are conducted on robotic manipulation tasks in MuJoCo, Atari games, and realistic indoor environments in Habitat. The results demonstrate that our approach not only significantly improves learning efficiency but also achieves superior performance in high-dimensional, realistic environments.
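The mechanism the abstract describes (diversity scoring with a determinantal point process, Cholesky decomposition for tractability, and rejection-style sampling of candidate batches) can be sketched roughly as follows. This is a minimal illustration under assumed choices: an RBF similarity kernel over transition features and uniform candidate draws. None of the function names or kernel choices are taken from the paper.

```python
import numpy as np

def diversity_score(features):
    """Log-determinant of an RBF L-ensemble kernel over a batch.

    A determinantal point process assigns higher probability to
    diverse subsets, so log det(L) serves as a diversity score;
    it is computed via Cholesky decomposition for stability.
    """
    sq = np.sum(features ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    L = np.exp(-0.5 * d2)                      # RBF similarity kernel
    L += 1e-6 * np.eye(len(features))          # jitter keeps L positive definite
    chol = np.linalg.cholesky(L)               # O(k^3) for batch size k
    return 2.0 * np.sum(np.log(np.diag(chol)))

def sample_diverse_batch(buffer, batch_size, n_proposals=8, rng=None):
    """Rejection-style sampling: draw several candidate batches
    uniformly and keep the one with the highest diversity score."""
    rng = rng or np.random.default_rng()
    best_idx, best_score = None, -np.inf
    for _ in range(n_proposals):
        idx = rng.choice(len(buffer), size=batch_size, replace=False)
        score = diversity_score(buffer[idx])
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx
```

In a real replay buffer the features would typically be learned state embeddings rather than raw states, and the number of proposals trades compute for diversity.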
Related papers
- Adaptive teachers for amortized samplers [76.88721198565861]
Amortized inference is the task of training a parametric model, such as a neural network, to approximate a distribution with a given unnormalized density where exact sampling is intractable.
Off-policy RL training facilitates the discovery of diverse, high-reward candidates, but existing methods still face challenges in efficient exploration.
We propose an adaptive training distribution (the Teacher) to guide the training of the primary amortized sampler (the Student) by prioritizing high-loss regions.
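The Teacher/Student prioritization idea above can be sketched as a proposal distribution that upweights high-loss regions. This is an illustrative reduction under assumed definitions (a fixed set of regions, each with a scalar loss estimate for the Student); the function names and the softmax form are assumptions, not from the paper.

```python
import numpy as np

def teacher_weights(region_losses, temperature=1.0):
    """Softmax over per-region Student losses: regions where the
    Student currently does poorly are proposed more often."""
    z = np.asarray(region_losses, dtype=float) / temperature
    z = z - z.max()                  # shift for numerical stability
    w = np.exp(z)
    return w / w.sum()

def teacher_propose(region_losses, n, rng=None):
    """Sample n training regions from the Teacher's distribution."""
    rng = rng or np.random.default_rng()
    p = teacher_weights(region_losses)
    return rng.choice(len(p), size=n, p=p)
```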
arXiv Detail & Related papers (2024-10-02T11:33:13Z)
- SVDE: Scalable Value-Decomposition Exploration for Cooperative Multi-Agent Reinforcement Learning [22.389803019100423]
We propose a scalable value-decomposition exploration (SVDE) method, which includes a scalable training mechanism, intrinsic reward design, and explorative experience replay.
Our method achieves the best performance on almost all maps compared to other popular algorithms in a set of StarCraft II micromanagement games.
arXiv Detail & Related papers (2023-03-16T03:17:20Z)
- Environment Design for Inverse Reinforcement Learning [3.085995273374333]
Current inverse reinforcement learning methods that focus on learning from a single environment can fail to handle slight changes in the environment dynamics.
In our framework, the learner repeatedly interacts with the expert, with the former selecting environments to identify the reward function.
This results in improvements in both sample-efficiency and robustness, as we show experimentally, for both exact and approximate inference.
arXiv Detail & Related papers (2022-10-26T18:31:17Z)
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of IRL -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
- Learning Dense Reward with Temporal Variant Self-Supervision [5.131840233837565]
Complex real-world robotic applications lack explicit and informative descriptions that can directly be used as rewards.
Previous effort has shown that it is possible to algorithmically extract dense rewards directly from multimodal observations.
This paper proposes a more efficient and robust way of sampling and learning.
arXiv Detail & Related papers (2022-05-20T20:30:57Z)
- Sequential Bayesian experimental designs via reinforcement learning [0.0]
We provide a new approach, Sequential Experimental Design via Reinforcement Learning, to construct Bayesian experimental designs (BED) in a sequential manner.
By proposing a new real-world-oriented experimental environment, our approach aims to maximize the expected information gain.
It is confirmed that our method outperforms the existing methods in various indices such as the EIG and sampling efficiency.
arXiv Detail & Related papers (2022-02-14T04:29:04Z)
- TRAIL: Near-Optimal Imitation Learning with Suboptimal Data [100.83688818427915]
We present training objectives that use offline datasets to learn a factored transition model.
Our theoretical analysis shows that the learned latent action space can boost the sample-efficiency of downstream imitation learning.
To learn the latent action space in practice, we propose TRAIL (Transition-Reparametrized Actions for Imitation Learning), an algorithm that learns an energy-based transition model.
arXiv Detail & Related papers (2021-10-27T21:05:00Z)
- MHER: Model-based Hindsight Experience Replay [33.00149668905828]
We propose Model-based Hindsight Experience Replay (MHER) to solve multi-goal reinforcement learning problems.
Replacing original goals with virtual goals generated from interaction with a trained dynamics model leads to a novel relabeling method.
MHER exploits experiences more efficiently by leveraging environmental dynamics to generate virtual achieved goals.
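The model-based relabeling step described above can be sketched as follows: roll a learned dynamics model forward from each transition and treat the imagined outcome as a virtual achieved goal. The callables `dynamics_model`, `policy`, and `achieved_goal` are hypothetical placeholders, and the sparse reward here is an assumption for illustration, not the paper's exact formulation.

```python
import numpy as np

def relabel_with_model(transitions, dynamics_model, policy, horizon=3,
                       achieved_goal=lambda s: s):
    """MHER-style sketch: for each (state, action, goal, next_state)
    tuple, imagine a short rollout with the learned dynamics and
    relabel the transition with the goal achieved at its end."""
    relabeled = []
    for (s, a, g, s_next) in transitions:
        sim = s_next
        for _ in range(horizon):
            sim = dynamics_model(sim, policy(sim, g))  # imagined step
        virtual_goal = achieved_goal(sim)
        # Sparse reward: 0 if the next state already achieves the
        # virtual goal, -1 otherwise (an assumed convention).
        dist = np.linalg.norm(achieved_goal(s_next) - virtual_goal)
        reward = 0.0 if dist <= 1e-3 else -1.0
        relabeled.append((s, a, virtual_goal, s_next, reward))
    return relabeled
```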
arXiv Detail & Related papers (2021-07-01T08:52:45Z)
- Learning to Sample with Local and Global Contexts in Experience Replay Buffer [135.94190624087355]
We propose a new learning-based sampling method that can compute the relative importance of each transition.
We show that our framework can significantly improve the performance of various off-policy reinforcement learning methods.
arXiv Detail & Related papers (2020-07-14T21:12:56Z)
- Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations [78.94386823185724]
Imitation learning is effective in sparse-reward tasks because it leverages existing expert demonstrations.
In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive.
We propose Self-Adaptive Imitation Learning (SAIL), which can achieve (near-)optimal performance given only a limited number of sub-optimal demonstrations.
arXiv Detail & Related papers (2020-04-01T15:57:15Z)
- Soft Hindsight Experience Replay [77.99182201815763]
Soft Hindsight Experience Replay (SHER) is a novel approach based on HER and Maximum Entropy Reinforcement Learning (MERL).
We evaluate SHER on Open AI Robotic manipulation tasks with sparse rewards.
arXiv Detail & Related papers (2020-02-06T03:57:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.