Adversarial Diffusion for Robust Reinforcement Learning
- URL: http://arxiv.org/abs/2509.23846v1
- Date: Sun, 28 Sep 2025 12:34:35 GMT
- Title: Adversarial Diffusion for Robust Reinforcement Learning
- Authors: Daniele Foffano, Alessio Russo, Alexandre Proutiere,
- Abstract summary: We introduce Adversarial Diffusion for Robust Reinforcement Learning (AD-RRL)<n>AD-RRL guides the diffusion process to generate worst-case trajectories during training, effectively optimizing the Conditional Value at Risk (CVaR) of the cumulative return.<n> Empirical results across standard benchmarks show that AD-RRL achieves superior robustness and performance compared to existing robust RL methods.
- Score: 46.44328012099217
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robustness to modeling errors and uncertainties remains a central challenge in reinforcement learning (RL). In this work, we address this challenge by leveraging diffusion models to train robust RL policies. Diffusion models have recently gained popularity in model-based RL due to their ability to generate full trajectories "all at once", mitigating the compounding errors typical of step-by-step transition models. Moreover, they can be conditioned to sample from specific distributions, making them highly flexible. We leverage conditional sampling to learn policies that are robust to uncertainty in environment dynamics. Building on the established connection between Conditional Value at Risk (CVaR) optimization and robust RL, we introduce Adversarial Diffusion for Robust Reinforcement Learning (AD-RRL). AD-RRL guides the diffusion process to generate worst-case trajectories during training, effectively optimizing the CVaR of the cumulative return. Empirical results across standard benchmarks show that AD-RRL achieves superior robustness and performance compared to existing robust RL methods.
Related papers
- Data-regularized Reinforcement Learning for Diffusion Models at Scale [99.01056178660538]
We introduce Data-regularized Diffusion Reinforcement Learning ( DDRL), a novel framework that uses the forward KL divergence to anchor the policy to an off-policy data distribution.<n>With over a million GPU hours of experiments and ten thousand double-blind evaluations, we demonstrate that DDRL significantly improves rewards while alleviating the reward hacking seen in RLs.
arXiv Detail & Related papers (2025-12-03T23:45:07Z) - Enhancing Diffusion-based Restoration Models via Difficulty-Adaptive Reinforcement Learning with IQA Reward [93.04811239892852]
Reinforcement Learning (RL) has recently been incorporated into diffusion models.<n>In this paper, we investigate how to effectively integrate RL into diffusion-based restoration models.
arXiv Detail & Related papers (2025-11-03T14:57:57Z) - Towards Stable and Effective Reinforcement Learning for Mixture-of-Experts [113.0656076371565]
We propose a novel router-aware approach to optimize importance sampling weights in off-policy reinforcement learning (RL)<n>Specifically, we design a rescaling strategy guided by router logits, which effectively reduces gradient variance and mitigates training divergence.<n> Experimental results demonstrate that our method significantly improves both the convergence stability and the final performance of MoE models.
arXiv Detail & Related papers (2025-10-27T05:47:48Z) - Towards Robust Zero-Shot Reinforcement Learning [22.262048244005296]
Recent development of zero-shot reinforcement learning (RL) has opened a new avenue for learning pre-trained generalist policies that can adapt to arbitrary new tasks in a zero-shot manner.<n>While the popular Forward-Backward representations (FB) and related methods have shown promise in zero-shot RL, we empirically found that their modeling lacks expressivity and that extrapolation errors caused suboptimal performance.<n>We propose an upgraded FB-based framework that simultaneously enhances learning stability, policy extraction capability, and representation learning quality.
arXiv Detail & Related papers (2025-10-17T07:33:19Z) - RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization [86.30192066451256]
We propose RL-PLUS, a novel hybrid-policy optimization approach for Large Language Models (LLMs)<n> RL-PLUS synergizes internal exploitation with external data to achieve stronger reasoning capabilities and surpass the boundaries of base models.<n>We provide both theoretical analysis and extensive experiments to demonstrate the superiority and generalizability of our approach.
arXiv Detail & Related papers (2025-07-31T23:55:29Z) - Distributionally Robust Learning in Survival Analysis [6.946903076677842]
We introduce an innovative approach that incorporates a Distributionally Robust Learning (DRL) approach into Cox regression.<n>By formulating a DRL framework with a Wasserstein distance-based ambiguity set, we develop a variant Cox model that is less sensitive to assumptions about the underlying data distribution.<n>We demonstrate that our regression model achieves superior performance in terms of prediction accuracy and robustness compared with traditional methods.
arXiv Detail & Related papers (2025-06-02T06:11:22Z) - VARD: Efficient and Dense Fine-Tuning for Diffusion Models with Value-based RL [28.95582264086289]
VAlue-based Reinforced Diffusion (VARD) is a novel approach that first learns a value function predicting expection of rewards from intermediate states.<n>Our method maintains proximity to the pretrained model while enabling effective and stable training via backpropagation.
arXiv Detail & Related papers (2025-05-21T17:44:37Z) - Score as Action: Fine-Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning [15.789898162610529]
Reinforcement learning from human feedback (RLHF) has become a crucial step in building reliable generative AI models.<n>This study is to develop a disciplined approach to fine-tune diffusion models using continuous-time RL.
arXiv Detail & Related papers (2025-02-03T20:50:05Z) - Avoiding mode collapse in diffusion models fine-tuned with reinforcement learning [0.0]
Fine-tuning foundation models via reinforcement learning (RL) has proven promising for aligning to downstream objectives.
We exploit the hierarchical nature of diffusion models (DMs) and train them dynamically at each epoch with a tailored RL method.
We show that models trained with HRF achieve better preservation of diversity in downstream tasks, thus enhancing the fine-tuning robustness and at uncompromising mean rewards.
arXiv Detail & Related papers (2024-10-10T19:06:23Z) - Take the Bull by the Horns: Hard Sample-Reweighted Continual Training
Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z) - Hybrid Reinforcement Learning for Optimizing Pump Sustainability in
Real-World Water Distribution Networks [55.591662978280894]
This article addresses the pump-scheduling optimization problem to enhance real-time control of real-world water distribution networks (WDNs)
Our primary objectives are to adhere to physical operational constraints while reducing energy consumption and operational costs.
Traditional optimization techniques, such as evolution-based and genetic algorithms, often fall short due to their lack of convergence guarantees.
arXiv Detail & Related papers (2023-10-13T21:26:16Z) - Combining Pessimism with Optimism for Robust and Efficient Model-Based
Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.