Off-policy Reinforcement Learning with Model-based Exploration Augmentation
- URL: http://arxiv.org/abs/2510.25529v1
- Date: Wed, 29 Oct 2025 13:53:52 GMT
- Title: Off-policy Reinforcement Learning with Model-based Exploration Augmentation
- Authors: Likun Wang, Xiangteng Zhang, Yinuo Wang, Guojian Zhan, Wenxuan Wang, Haoyu Gao, Jingliang Duan, Shengbo Eben Li,
- Abstract summary: We propose Modelic Generative Exploration (MoGE), which augments exploration through the generation of under-explored critical states.<n>MoGE is composed of two components: (1) a diffusion-based generator that synthesizes critical states under the guidance of a utility function evaluating each state's potential influence on policy exploration, and (2) a one-step imagination world model for constructing critical transitions based on the critical states for agent learning.<n>Our method adopts a modular formulation that aligns with the principles of off-policy learning, allowing seamless integration with existing algorithms to improve exploration without altering their core structures.
- Score: 29.61835214523957
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Exploration is fundamental to reinforcement learning (RL), as it determines how effectively an agent discovers and exploits the underlying structure of its environment to achieve optimal performance. Existing exploration methods generally fall into two categories: active exploration and passive exploration. The former introduces stochasticity into the policy but struggles in high-dimensional environments, while the latter adaptively prioritizes transitions in the replay buffer to enhance exploration, yet remains constrained by limited sample diversity. To address the limitation in passive exploration, we propose Modelic Generative Exploration (MoGE), which augments exploration through the generation of under-explored critical states and synthesis of dynamics-consistent experiences through transition models. MoGE is composed of two components: (1) a diffusion-based generator that synthesizes critical states under the guidance of a utility function evaluating each state's potential influence on policy exploration, and (2) a one-step imagination world model for constructing critical transitions based on the critical states for agent learning. Our method adopts a modular formulation that aligns with the principles of off-policy learning, allowing seamless integration with existing algorithms to improve exploration without altering their core structures. Empirical results on OpenAI Gym and DeepMind Control Suite reveal that MoGE effectively bridges exploration and policy learning, leading to remarkable gains in both sample efficiency and performance across complex control tasks.
Related papers
- Search Inspired Exploration in Reinforcement Learning [5.411688702405822]
We propose a novel method that actively guides exploration by setting sub-goals based on the agent's learning progress.<n>Inspired by search, sub-goals are prioritized from the frontier based on estimates of cost-to-come and cost-to-go.<n>In experiments on challenging sparse-reward environments, SIERL outperforms dominant baselines in both achieving the main task goal and generalizing to reach arbitrary states in the environment.
arXiv Detail & Related papers (2026-01-31T02:24:22Z) - Improving Deepfake Detection with Reinforcement Learning-Based Adaptive Data Augmentation [60.04281435591454]
CRDA (Curriculum Reinforcement-Learning Data Augmentation) is a novel framework guiding detectors to progressively master multi-domain forgery features.<n>Central to our approach is integrating reinforcement learning and causal inference.<n>Our method significantly improves detector generalizability, outperforming SOTA methods across multiple cross-domain datasets.
arXiv Detail & Related papers (2025-11-10T12:45:52Z) - SOE: Sample-Efficient Robot Policy Self-Improvement via On-Manifold Exploration [58.05143960563826]
On-Manifold Exploration (SOE) is a framework that enhances policy exploration and improvement in robotic manipulation.<n>SOE learns a compact latent representation of task-relevant factors and constrains exploration to the manifold of valid actions.<n>It can be seamlessly integrated with arbitrary policy models as a plug-in module, augmenting exploration without degrading the base policy performance.
arXiv Detail & Related papers (2025-09-23T17:54:47Z) - The Landscape of Agentic Reinforcement Learning for LLMs: A Survey [103.78397717362797]
The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL)<n>This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov Decision Processes (MDPs) of LLM-RL with the temporally extended, partially observable Markov decision processes (POMDPs) that define Agentic RL.
arXiv Detail & Related papers (2025-09-02T17:46:26Z) - Action abstractions for amortized sampling [49.384037138511246]
We propose an approach to incorporate the discovery of action abstractions, or high-level actions, into the policy optimization process.
Our approach involves iteratively extracting action subsequences commonly used across many high-reward trajectories and chunking' them into a single action that is added to the action space.
arXiv Detail & Related papers (2024-10-19T19:22:50Z) - Variable-Agnostic Causal Exploration for Reinforcement Learning [56.52768265734155]
We introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL)
Our approach automatically identifies crucial observation-action steps associated with key variables using attention mechanisms.
It constructs the causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion.
arXiv Detail & Related papers (2024-07-17T09:45:27Z) - Imagine, Initialize, and Explore: An Effective Exploration Method in
Multi-Agent Reinforcement Learning [27.81925751697255]
We propose a novel method for efficient multi-agent exploration in complex scenarios.
We formulate the imagination as a sequence modeling problem, where the states, observations, prompts, actions, and rewards are predicted autoregressively.
By initializing agents at the critical states, IIE significantly increases the likelihood of discovering potentially important underexplored regions.
arXiv Detail & Related papers (2024-02-28T01:45:01Z) - Efficient Reinforcement Learning via Decoupling Exploration and Utilization [6.305976803910899]
Reinforcement Learning (RL) has achieved remarkable success across multiple fields and applications, including gaming, robotics, and autonomous vehicles.
In this work, our aim is to train agent with efficient learning by decoupling exploration and utilization, so that agent can escaping the conundrum of suboptimal Solutions.
The above idea is implemented in the proposed OPARL (Optimistic and Pessimistic Actor Reinforcement Learning) algorithm.
arXiv Detail & Related papers (2023-12-26T09:03:23Z) - Dealing with uncertainty: balancing exploration and exploitation in deep
recurrent reinforcement learning [0.0]
Incomplete knowledge of the environment leads an agent to make decisions under uncertainty.
One of the major dilemmas in Reinforcement Learning (RL) where an autonomous agent has to balance two contrasting needs in making its decisions.
We show that adaptive methods better approximate the trade-off between exploration and exploitation.
arXiv Detail & Related papers (2023-10-12T13:45:33Z) - On the Importance of Exploration for Generalization in Reinforcement
Learning [89.63074327328765]
We propose EDE: Exploration via Distributional Ensemble, a method that encourages exploration of states with high uncertainty.
Our algorithm is the first value-based approach to achieve state-of-the-art on both Procgen and Crafter.
arXiv Detail & Related papers (2023-06-08T18:07:02Z) - Deep Intrinsically Motivated Exploration in Continuous Control [0.0]
In continuous systems, exploration is often performed through undirected strategies in which parameters of the networks or selected actions are perturbed by random noise.
We adapt existing theories on animal motivational systems into the reinforcement learning paradigm and introduce a novel directed exploration strategy.
Our framework extends to larger and more diverse state spaces, dramatically improves the baselines, and outperforms the undirected strategies significantly.
arXiv Detail & Related papers (2022-10-01T14:52:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.