RAD: Retrieval High-quality Demonstrations to Enhance Decision-making
- URL: http://arxiv.org/abs/2507.15356v1
- Date: Mon, 21 Jul 2025 08:08:18 GMT
- Title: RAD: Retrieval High-quality Demonstrations to Enhance Decision-making
- Authors: Lu Guo, Yixiang Shan, Zhengbang Zhu, Qifan Liang, Lichang Song, Ting Long, Weinan Zhang, Yi Chang,
- Abstract summary: offline reinforcement learning (RL) enables agents to learn policies from fixed datasets.<n>RL is often limited by dataset sparsity and the lack of transition overlap between suboptimal and expert trajectories.<n>We propose Retrieval High-quAlity Demonstrations (RAD) for decision-making, which combines non-parametric retrieval with diffusion-based generative modeling.
- Score: 23.136426643341462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) enables agents to learn policies from fixed datasets, avoiding costly or unsafe environment interactions. However, its effectiveness is often limited by dataset sparsity and the lack of transition overlap between suboptimal and expert trajectories, which makes long-horizon planning particularly challenging. Prior solutions based on synthetic data augmentation or trajectory stitching often fail to generalize to novel states and rely on heuristic stitching points. To address these challenges, we propose Retrieval High-quAlity Demonstrations (RAD) for decision-making, which combines non-parametric retrieval with diffusion-based generative modeling. RAD dynamically retrieves high-return states from the offline dataset as target states based on state similarity and return estimation, and plans toward them using a condition-guided diffusion model. Such retrieval-guided generation enables flexible trajectory stitching and improves generalization when encountered with underrepresented or out-of-distribution states. Extensive experiments confirm that RAD achieves competitive or superior performance compared to baselines across diverse benchmarks, validating its effectiveness.
Related papers
- TopoCurate:Modeling Interaction Topology for Tool-Use Agent Training [53.93696896939915]
Training tool-use agents typically rely on Supervised Fine-Tuning (SFT) on successful trajectories and Reinforcement Learning (RL) on pass-rate-selected tasks.<n>We propose TopoCurate, an interaction-aware framework that projects multi-trial rollouts from the same task into a unified semantic quotient topology.<n>TopoCurate achieves consistent gains of 4.2% (SFT) and 6.9% (RL) over state-of-the-art baselines.
arXiv Detail & Related papers (2026-03-02T10:38:54Z) - Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration [56.074760766965085]
PRISM achieves a dynamics-aware framework that arbitrates data based on its degree of cognitive conflict with the model's existing knowledge.<n>Our findings suggest that disentangling data based on internal optimization regimes is crucial for scalable and robust agent alignment.
arXiv Detail & Related papers (2026-01-12T05:43:20Z) - A Comparison Between Decision Transformers and Traditional Offline Reinforcement Learning Algorithms [0.0]
This paper presents a study evaluating the performance of DT against traditional offline RL algorithms in dense and sparse reward settings.<n>We found that DTs showed less sensitivity to varying reward density compared to other methods.<n>Traditional value-based methods like IQL showed improved performance in dense reward settings with high-quality data, while CQL offered balanced performance across different data qualities.
arXiv Detail & Related papers (2025-11-20T15:44:11Z) - Improving Deepfake Detection with Reinforcement Learning-Based Adaptive Data Augmentation [60.04281435591454]
CRDA (Curriculum Reinforcement-Learning Data Augmentation) is a novel framework guiding detectors to progressively master multi-domain forgery features.<n>Central to our approach is integrating reinforcement learning and causal inference.<n>Our method significantly improves detector generalizability, outperforming SOTA methods across multiple cross-domain datasets.
arXiv Detail & Related papers (2025-11-10T12:45:52Z) - Structural Information-based Hierarchical Diffusion for Offline Reinforcement Learning [13.839214658191038]
We propose Structural Information-based Hierarchical Diffusion framework for effective and stable offline policy learning.<n>We analyze structural information embedded in offline trajectories to construct the diffusion hierarchy adaptively.<n>We show that SIHD significantly outperforms state-of-the-art baselines in decision-making performance.
arXiv Detail & Related papers (2025-09-26T06:24:06Z) - RealDrive: Retrieval-Augmented Driving with Diffusion Models [42.6467760755688]
Learning-based planners generate human-like driving behaviors by learning to reason about nuanced interactions from data.<n>Data-driven approaches often struggle with rare, safety-critical scenarios and offer limited controllability over the generated trajectories.<n>We propose RealDrive, a Retrieval-Augmented Generation framework that initializes a diffusion-based planning policy by retrieving the most relevant expert demonstrations from the training dataset.
arXiv Detail & Related papers (2025-05-30T17:15:03Z) - Adaptive Robust Optimization with Data-Driven Uncertainty for Enhancing Distribution System Resilience [6.325705102716997]
Extreme weather events are placing strain on electric power systems, exposing the limitations of purely reactive responses.<n>This paper proposes a novel tri-level optimization framework that integrates proactive infrastructure investment and reactive response.<n> Experiments on both real and synthetic data demonstrate that our approach consistently outperforms conventional two-stage methods.
arXiv Detail & Related papers (2025-05-16T18:43:31Z) - iEBAKER: Improved Remote Sensing Image-Text Retrieval Framework via Eliminate Before Align and Keyword Explicit Reasoning [80.44805667907612]
iEBAKER is an innovative strategy to filter weakly correlated sample pairs.<n>We introduce an alternative Sort After Reversed Retrieval (SAR) strategy.<n>We incorporate a Keyword Explicit Reasoning (KER) module to facilitate the beneficial impact of subtle key concept distinctions.
arXiv Detail & Related papers (2025-04-08T03:40:19Z) - Model-Based Offline Reinforcement Learning with Adversarial Data Augmentation [36.9134885948595]
We introduce Model-based Offline Reinforcement learning with AdversariaL data augmentation.<n>In MORAL, we replace the fixed horizon rollout by employing adversaria data augmentation to execute alternating sampling with ensemble models.<n>Experiments on D4RL benchmark demonstrate that MORAL outperforms other model-based offline RL methods in terms of policy learning and sample efficiency.
arXiv Detail & Related papers (2025-03-26T07:24:34Z) - Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models [79.2162092822111]
We systematically evaluate reinforcement learning (RL) and control-based methods on a suite of navigation tasks.<n>We employ a latent dynamics model using the Joint Embedding Predictive Architecture (JEPA) and employ it for planning.<n>Our results show that model-free RL benefits most from large amounts of high-quality data, whereas model-based planning generalizes better to unseen layouts.
arXiv Detail & Related papers (2025-02-20T18:39:41Z) - ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z) - Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL [57.202733701029594]
We propose Decision Mamba, a novel multi-grained state space model (SSM) with a self-evolving policy learning strategy.<n>To address these challenges, we propose Decision Mamba, a novel multi-grained state space model (SSM) with a self-evolving policy learning strategy.<n>To mitigate the overfitting issue on noisy trajectories, a self-evolving policy is proposed by using progressive regularization.
arXiv Detail & Related papers (2024-06-08T10:12:00Z) - Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning [50.84938730450622]
We propose a trajectory-based method TV score, which uses trajectory volatility for OOD detection in mathematical reasoning.
Our method outperforms all traditional algorithms on GLMs under mathematical reasoning scenarios.
Our method can be extended to more applications with high-density features in output spaces, such as multiple-choice questions.
arXiv Detail & Related papers (2024-05-22T22:22:25Z) - REX: Rapid Exploration and eXploitation for AI Agents [103.68453326880456]
We propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX.
REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance.
arXiv Detail & Related papers (2023-07-18T04:26:33Z) - Robust Reinforcement Learning Objectives for Sequential Recommender Systems [7.44049827436013]
We develop recommender systems that incorporate direct user feedback in the form of rewards, enhancing personalization for users.
employing RL algorithms presents challenges, including off-policy training, expansive action spaces, and the scarcity of datasets with sufficient reward signals.
We introduce an enhanced methodology aimed at providing a more effective solution to these challenges.
arXiv Detail & Related papers (2023-05-30T08:09:08Z) - When Demonstrations Meet Generative World Models: A Maximum Likelihood
Framework for Offline Inverse Reinforcement Learning [62.00672284480755]
This paper aims to recover the structure of rewards and environment dynamics that underlie observed actions in a fixed, finite set of demonstrations from an expert agent.
Accurate models of expertise in executing a task has applications in safety-sensitive applications such as clinical decision making and autonomous driving.
arXiv Detail & Related papers (2023-02-15T04:14:20Z) - False Correlation Reduction for Offline Reinforcement Learning [115.11954432080749]
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
We empirically show that SCORE achieves the SoTA performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL)
arXiv Detail & Related papers (2021-10-24T15:34:03Z) - PerSim: Data-Efficient Offline Reinforcement Learning with Heterogeneous
Agents via Personalized Simulators [19.026312915461553]
We propose a model-based offline reinforcement learning (RL) approach called PerSim.
We first learn a personalized simulator for each agent by collectively using the historical trajectories across all agents prior to learning a policy.
This representation suggests a simple, regularized neural network architecture to effectively learn the transition dynamics per agent, even with scarce, offline data.
arXiv Detail & Related papers (2021-02-13T17:16:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.