Exploration and Adaptation in Non-Stationary Tasks with Diffusion Policies
- URL: http://arxiv.org/abs/2504.00280v1
- Date: Mon, 31 Mar 2025 23:00:07 GMT
- Title: Exploration and Adaptation in Non-Stationary Tasks with Diffusion Policies
- Authors: Gunbir Singh Baveja
- Abstract summary: This paper investigates the application of Diffusion Policy in non-stationary, vision-based RL settings, specifically targeting environments where task dynamics and objectives evolve over time. We apply Diffusion Policy -- which leverages iterative denoising to refine latent action representations -- to benchmark environments including Procgen and PointMaze. Our experiments demonstrate that, despite increased computational demands, Diffusion Policy consistently outperforms standard RL methods such as PPO and DQN, achieving higher mean and maximum rewards with reduced variability.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates the application of Diffusion Policy in non-stationary, vision-based RL settings, specifically targeting environments where task dynamics and objectives evolve over time. Our work is grounded in practical challenges encountered in dynamic real-world scenarios such as robotics assembly lines and autonomous navigation, where agents must adapt control strategies from high-dimensional visual inputs. We apply Diffusion Policy -- which leverages iterative stochastic denoising to refine latent action representations -- to benchmark environments including Procgen and PointMaze. Our experiments demonstrate that, despite increased computational demands, Diffusion Policy consistently outperforms standard RL methods such as PPO and DQN, achieving higher mean and maximum rewards with reduced variability. These findings underscore the approach's capability to generate coherent, contextually relevant action sequences in continuously shifting conditions, while also highlighting areas for further improvement in handling extreme non-stationarity.
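To make the core mechanism concrete, below is a minimal sketch of the iterative stochastic denoising loop a diffusion policy uses to turn Gaussian noise into an action, conditioned on an encoded visual observation. The noise-prediction network `eps_model`, the DDPM-style linear schedule, and all hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import torch

def sample_action(eps_model, obs_embedding, action_dim, n_steps=50):
    """Iteratively denoise Gaussian noise into an action (DDPM-style sketch).

    `eps_model(a_t, t, obs_embedding)` is assumed to predict the noise added at
    step t; the linear beta schedule and step count are placeholders.
    """
    betas = torch.linspace(1e-4, 0.02, n_steps)        # noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)          # cumulative signal fractions

    a_t = torch.randn(action_dim)                      # start from pure noise
    for t in reversed(range(n_steps)):
        eps_hat = eps_model(a_t, t, obs_embedding)     # predicted noise at step t
        # standard DDPM posterior-mean update
        a_t = (a_t - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps_hat) / torch.sqrt(alphas[t])
        if t > 0:                                      # inject noise except on the final step
            a_t = a_t + torch.sqrt(betas[t]) * torch.randn_like(a_t)
    return a_t                                         # refined action
```

In a non-stationary task this loop runs at every control step, so each sampled action is conditioned on the most recent observation encoding rather than on a fixed policy mode.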
Related papers
- Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models [71.34520793462069]
Unsupervised reinforcement learning (RL) aims at pre-training agents that can solve a wide range of downstream tasks in complex environments.
We introduce a novel algorithm regularizing unsupervised RL towards imitating trajectories from unlabeled behavior datasets.
We demonstrate the effectiveness of this new approach in a challenging humanoid control problem.
arXiv Detail & Related papers (2025-04-15T10:41:11Z) - Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction [7.918703013303246]
We present Latent Macro Action Planner (L-MAP), which addresses the challenge of learning to make decisions in high-dimensional continuous action spaces.
L-MAP learns a set of temporally extended macro-actions through a state-conditional Vector Quantized Variational Autoencoder (VQ-VAE).
In offline RL settings, including continuous control tasks, L-MAP efficiently searches over discrete latent actions to yield high expected returns.
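As a rough illustration of the discretization step such VQ-VAE-based methods rely on, the sketch below quantizes a continuous macro-action embedding against a learnable codebook; shapes and names are assumptions, not L-MAP's actual interface.

```python
import torch

def quantize(z_e, codebook):
    """Nearest-neighbour vector quantization (VQ-VAE-style sketch).

    z_e: (batch, dim) continuous encodings of macro-action segments.
    codebook: (num_codes, dim) learnable embedding table.
    Returns quantized embeddings plus the discrete codes a planner can search over.
    """
    dists = torch.cdist(z_e, codebook)            # distance to every codebook entry
    codes = dists.argmin(dim=1)                   # discrete latent macro-actions
    z_q = codebook[codes]                         # quantized embeddings
    z_q = z_e + (z_q - z_e).detach()              # straight-through gradient estimator
    return z_q, codes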
arXiv Detail & Related papers (2025-02-28T16:02:23Z) - Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization [55.97310586039358]
Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality.
We propose a novel model-free diffusion-based online RL algorithm, Q-weighted Variational Policy Optimization (QVPO).
Specifically, we introduce the Q-weighted variational loss, which can be proved to be a tight lower bound of the policy objective in online RL under certain conditions.
We also develop an efficient behavior policy to enhance sample efficiency by reducing the variance of the diffusion policy during online interactions.
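A hedged sketch of what a Q-weighted denoising objective can look like: the usual diffusion noise-prediction error is reweighted per sample by a non-negative transform of the critic's Q-value, so high-value actions dominate the policy update. The weight transform and interfaces are assumptions, not QVPO's exact formulation.

```python
import torch

def q_weighted_diffusion_loss(eps_model, q_net, states, actions, alpha_bars):
    """Sketch of a Q-weighted variational (denoising) policy loss."""
    batch_size = actions.shape[0]
    t = torch.randint(0, len(alpha_bars), (batch_size,))          # random diffusion steps
    noise = torch.randn_like(actions)
    a_bar = alpha_bars[t].unsqueeze(1)
    noisy_actions = torch.sqrt(a_bar) * actions + torch.sqrt(1.0 - a_bar) * noise

    eps_hat = eps_model(noisy_actions, t, states)                 # predicted noise
    per_sample = ((eps_hat - noise) ** 2).mean(dim=1)             # denoising error per action

    with torch.no_grad():
        weights = torch.relu(q_net(states, actions)).squeeze(-1)  # non-negative Q weights
    return (weights * per_sample).mean()
```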
arXiv Detail & Related papers (2024-05-25T10:45:46Z) - Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z) - A Conservative Approach for Few-Shot Transfer in Off-Dynamics Reinforcement Learning [3.1515473193934778]
Off-dynamics Reinforcement Learning seeks to transfer a policy from a source environment to a target environment characterized by distinct yet similar dynamics.
We propose an innovative approach inspired by recent advancements in Imitation Learning and conservative RL algorithms.
arXiv Detail & Related papers (2023-12-24T13:09:08Z) - Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z) - Robust Policy Optimization in Deep Reinforcement Learning [16.999444076456268]
In continuous action domains, a parameterized action distribution allows easy control of exploration.
In particular, we propose an algorithm called Robust Policy Optimization (RPO), which leverages a perturbed distribution.
We evaluated our methods on various continuous control tasks from DeepMind Control, OpenAI Gym, Pybullet, and IsaacGym.
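A minimal sketch of the perturbed Gaussian that RPO-style methods sample from: uniform noise shifts the mean before the action is drawn, which keeps the policy's entropy from collapsing during training. The value of `alpha` is illustrative, not the paper's setting.

```python
import torch

def rpo_action_distribution(mean, std, alpha=0.5):
    """Gaussian policy with a uniformly perturbed mean (RPO-style sketch)."""
    perturbation = (torch.rand_like(mean) * 2.0 - 1.0) * alpha   # U(-alpha, alpha)
    return torch.distributions.Normal(mean + perturbation, std)
```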
arXiv Detail & Related papers (2022-12-14T22:43:56Z) - Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
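Roughly, Diffusion-QL-style training couples a behavior-cloning denoising loss with a Q-maximization term over actions sampled from the diffusion policy. The sketch below shows that combination; `eta` and the normalization are assumptions rather than the paper's exact recipe.

```python
import torch

def diffusion_ql_policy_loss(bc_denoising_loss, q_net, states, policy_actions, eta=1.0):
    """Sketch of a Diffusion-QL-style policy objective."""
    q_values = q_net(states, policy_actions)
    # normalize the Q term so it stays on a scale comparable to the BC loss
    q_loss = -q_values.mean() / q_values.abs().mean().detach()
    return bc_denoising_loss + eta * q_loss
```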
arXiv Detail & Related papers (2022-08-12T09:54:11Z) - Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning [12.76337275628074]
In this work, we propose a variational dynamic model based on conditional variational inference to model the multimodality and stochasticity of environment transitions.
We derive an upper bound of the negative log-likelihood of the environmental transition and use such an upper bound as the intrinsic reward for exploration.
Our method outperforms several state-of-the-art environment model-based exploration approaches.
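As a sketch of how such an upper bound can serve as an exploration bonus, the snippet below scores a transition by the conditional VAE's reconstruction error plus its KL term; the `cvae` interface is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def intrinsic_reward(cvae, obs, action, next_obs):
    """Variational upper bound on -log p(next_obs | obs, action) as an intrinsic reward."""
    recon, mu, log_var = cvae(obs, action, next_obs)                   # assumed interface
    recon_err = F.mse_loss(recon, next_obs, reduction="none").flatten(1).sum(dim=1)
    kl = -0.5 * (1.0 + log_var - mu.pow(2) - log_var.exp()).flatten(1).sum(dim=1)
    return recon_err + kl                                              # higher = more surprising
```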
arXiv Detail & Related papers (2020-10-17T09:54:51Z) - Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
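A generic sketch of variance reduction with an action-value critic in the discrete case: the critic's Q-estimates over all actions provide a baseline (the policy's expected Q), so only the advantage of the chosen action drives the gradient. This illustrates the general idea, not the paper's specific estimator.

```python
import torch

def critic_baselined_pg_loss(logits, q_values, taken_actions):
    """Discrete-action policy-gradient loss with a critic-based baseline (sketch).

    logits: (batch, n_actions) policy logits; q_values: (batch, n_actions) critic
    estimates; taken_actions: (batch,) sampled action indices.
    """
    log_probs = torch.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    baseline = (probs * q_values).sum(dim=-1, keepdim=True)     # E_pi[Q(s, .)]
    advantages = q_values - baseline                            # centred action values
    chosen_logp = log_probs.gather(1, taken_actions.unsqueeze(1)).squeeze(1)
    chosen_adv = advantages.gather(1, taken_actions.unsqueeze(1)).squeeze(1)
    return -(chosen_logp * chosen_adv.detach()).mean()
```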
arXiv Detail & Related papers (2020-02-10T04:23:09Z)