Generalized Back-Stepping Experience Replay in Sparse-Reward Environments
- URL: http://arxiv.org/abs/2412.15525v1
- Date: Fri, 20 Dec 2024 03:31:23 GMT
- Title: Generalized Back-Stepping Experience Replay in Sparse-Reward Environments
- Authors: Guwen Lyu, Masahiro Sato,
- Abstract summary: Back-stepping experience replay (BER) is a reinforcement learning technique that can accelerate learning efficiency in reversible environments.<n>The original algorithm is designed for a dense-reward environment that does not require complex exploration.<n>We propose an enhanced version of BER called Generalized BER (GBER), which extends the original algorithm to sparse-reward environments.
- Score: 2.6887381380521878
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Back-stepping experience replay (BER) is a reinforcement learning technique that can accelerate learning efficiency in reversible environments. BER trains an agent with generated back-stepping transitions of collected experiences and normal forward transitions. However, the original algorithm is designed for a dense-reward environment that does not require complex exploration, limiting the BER technique to demonstrate its full potential. Herein, we propose an enhanced version of BER called Generalized BER (GBER), which extends the original algorithm to sparse-reward environments, particularly those with complex structures that require the agent to explore. GBER improves the performance of BER by introducing relabeling mechanism and applying diverse sampling strategies. We evaluate our modified version, which is based on a goal-conditioned deep deterministic policy gradient offline learning algorithm, across various maze navigation environments. The experimental results indicate that the GBER algorithm can significantly boost the performance and stability of the baseline algorithm in various sparse-reward environments, especially those with highly structural symmetricity.
Related papers
- Variance Reduction Based Experience Replay for Policy Optimization [3.7128732378843394]
Variance Reduction Experience Replay (VRER) is a principled framework that selectively reuses informative samples to reduce variance in policy gradient estimation.<n>VRER is algorithm-agnostic and integrates seamlessly with existing policy optimization methods.<n>We show that VRER consistently accelerates policy learning and improves performance over state-of-the-art policy optimization algorithms.
arXiv Detail & Related papers (2026-02-05T06:58:28Z) - Mixture of Ranks with Degradation-Aware Routing for One-Step Real-World Image Super-Resolution [76.66229730098759]
In real-world image super-resolution (Real-ISR), existing approaches mainly rely on fine-tuning pre-trained diffusion models.<n>We propose a Mixture-of-Ranks (MoR) architecture for single-step image super-resolution.<n>We introduce a fine-grained expert partitioning strategy that treats each rank in LoRA as an independent expert.
arXiv Detail & Related papers (2025-11-20T04:11:44Z) - RGE-GS: Reward-Guided Expansive Driving Scene Reconstruction via Diffusion Priors [54.81109375939306]
RGE-GS is a novel expansive reconstruction framework that synergizes diffusion-based generation with reward-guided Gaussian integration.<n>We propose a reward network that learns to identify and prioritize consistently generated patterns prior to reconstruction phases.<n>During the reconstruction process, we devise a differentiated training strategy that automatically adjust Gaussian optimization progress according to scene converge metrics.
arXiv Detail & Related papers (2025-06-28T08:02:54Z) - Embed Progressive Implicit Preference in Unified Space for Deep Collaborative Filtering [13.24227546548424]
Generalized Neural Ordinal Logistic Regression (GNOLR) is proposed to capture the structured progression of user engagement.<n>GNOLR enhances predictive accuracy, captures the progression of user engagement, and simplifies the retrieval process.<n>Experiments on ten real-world datasets show that GNOLR significantly outperforms state-of-the-art methods in efficiency and adaptability.
arXiv Detail & Related papers (2025-05-27T08:43:35Z) - Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control [52.405085773954596]
Retrieval-Augmented Generation (RAG) has emerged as a powerful approach to mitigate large language model hallucinations.
Existing RAG frameworks often apply retrieval indiscriminately,leading to inefficiencies-over-retrieving.
We introduce a novel user-controllable RAG framework that enables dynamic adjustment of the accuracy-cost trade-off.
arXiv Detail & Related papers (2025-02-17T18:56:20Z) - DRED: Zero-Shot Transfer in Reinforcement Learning via Data-Regularised Environment Design [11.922951794283168]
In this work, we investigate how the sampling of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents.
We discover that for deep actor-critic architectures sharing their base layers, prioritising levels according to their value loss minimises the mutual information between the agent's internal representation and the set of training levels in the generated training data.
We find that existing UED methods can significantly shift the training distribution, which translates to low ZSG performance.
To prevent both overfitting and distributional shift, we introduce data-regularised environment design (D
arXiv Detail & Related papers (2024-02-05T19:47:45Z) - GIFD: A Generative Gradient Inversion Method with Feature Domain
Optimization [52.55628139825667]
Federated Learning (FL) has emerged as a promising distributed machine learning framework to preserve clients' privacy.
Recent studies find that an attacker can invert the shared gradients and recover sensitive data against an FL system by leveraging pre-trained generative adversarial networks (GAN) as prior knowledge.
We propose textbfGradient textbfInversion over textbfFeature textbfDomains (GIFD), which disassembles the GAN model and searches the feature domains of the intermediate layers.
arXiv Detail & Related papers (2023-08-09T04:34:21Z) - Adaptive Experimentation at Scale: A Computational Framework for
Flexible Batches [7.390918770007728]
Motivated by practical instances involving a handful of reallocations in which outcomes are measured in batches, we develop an adaptive-driven experimentation framework.
Our main observation is that normal approximations, which are universal in statistical inference, can also guide the design of adaptive algorithms.
arXiv Detail & Related papers (2023-03-21T04:17:03Z) - CoDEPS: Online Continual Learning for Depth Estimation and Panoptic
Segmentation [28.782231314289174]
We introduce continual learning for deep learning-based monocular depth estimation and panoptic segmentation in new environments in an online manner.
We propose a novel domain-mixing strategy to generate pseudo-labels to adapt panoptic segmentation.
We explicitly address the limited storage capacity of robotic systems by leveraging sampling strategies for constructing a fixed-size replay buffer.
arXiv Detail & Related papers (2023-03-17T17:31:55Z) - Near-optimal Policy Identification in Active Reinforcement Learning [84.27592560211909]
AE-LSVI is a novel variant of the kernelized least-squares value RL (LSVI) algorithm that combines optimism with pessimism for active exploration.
We show that AE-LSVI outperforms other algorithms in a variety of environments when robustness to the initial state is required.
arXiv Detail & Related papers (2022-12-19T14:46:57Z) - Revisiting GANs by Best-Response Constraint: Perspective, Methodology,
and Application [49.66088514485446]
Best-Response Constraint (BRC) is a general learning framework to explicitly formulate the potential dependency of the generator on the discriminator.
We show that even with different motivations and formulations, a variety of existing GANs ALL can be uniformly improved by our flexible BRC methodology.
arXiv Detail & Related papers (2022-05-20T12:42:41Z) - Exploiting Explainable Metrics for Augmented SGD [43.00691899858408]
There are several unanswered questions about how learning under optimization really works and why certain strategies are better than others.
We propose new explainability metrics that measure the redundant information in a network's layers.
We then exploit these metrics to augment the Gradient Descent (SGD) by adaptively adjusting the learning rate in each layer to improve generalization performance.
arXiv Detail & Related papers (2022-03-31T00:16:44Z) - AutoBERT-Zero: Evolving BERT Backbone from Scratch [94.89102524181986]
We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures.
We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS.
Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks.
arXiv Detail & Related papers (2021-07-15T16:46:01Z) - Phase Retrieval using Expectation Consistent Signal Recovery Algorithm
based on Hypernetwork [73.94896986868146]
Phase retrieval is an important component in modern computational imaging systems.
Recent advances in deep learning have opened up a new possibility for robust and fast PR.
We develop a novel framework for deep unfolding to overcome the existing limitations.
arXiv Detail & Related papers (2021-01-12T08:36:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.