Distributional Reinforcement Learning with Diffusion Bridge Critics
- URL: http://arxiv.org/abs/2602.05783v1
- Date: Thu, 05 Feb 2026 15:40:14 GMT
- Title: Distributional Reinforcement Learning with Diffusion Bridge Critics
- Authors: Shutong Ding, Yimiao Zhou, Ke Hu, Mokai Pan, Shan Zhong, Yanwei Fu, Jingya Wang, Ye Shi,
- Abstract summary: We propose a novel distributional reinforcement learning method with Diffusion Bridge Critics (DBC)<n>DBC directly models the inverse cumulative distribution function (CDF) of the Q value.<n>We derive an analytic integral formula to address discretization errors in DBC.
- Score: 57.70134665595571
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in diffusion-based reinforcement learning (RL) methods have demonstrated promising results in a wide range of continuous control tasks. However, existing works in this field focus on the application of diffusion policies while leaving the diffusion critics unexplored. In fact, since policy optimization fundamentally relies on the critic, accurate value estimation is far more important than policy expressiveness. Furthermore, given the stochasticity of most reinforcement learning tasks, it has been confirmed that the critic is more appropriately depicted with a distributional model. Motivated by these points, we propose a novel distributional RL method with Diffusion Bridge Critics (DBC). DBC directly models the inverse cumulative distribution function (CDF) of the Q value. This allows us to accurately capture the value distribution and prevents it from collapsing into a trivial Gaussian distribution owing to the strong distribution-matching capability of the diffusion bridge. Moreover, we further derive an analytic integral formula to address discretization errors in DBC, which is essential in value estimation. To our knowledge, DBC is the first work to employ the diffusion bridge model as the critic. Notably, DBC is also a plug-and-play component and can be integrated into most existing RL frameworks. Experimental results on MuJoCo robot control benchmarks demonstrate the superiority of DBC compared with previous distributional critic models.
Related papers
- Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies [4.249024052507976]
We propose a unified framework, reverse flow matching (RFM), which rigorously addresses the problem of training diffusion and flow models without direct target samples.<n>By adopting a reverse inferential perspective, we formulate the training target as a posterior mean estimation problem given an intermediate noisy sample.<n>We show that existing noise-expectation and gradient-expectation methods are two specific instances within this broader class.
arXiv Detail & Related papers (2026-01-13T01:58:24Z) - Score-based Membership Inference on Diffusion Models [3.742113529511043]
Membership inference attacks (MIAs) against diffusion models have emerged as a pressing privacy concern.<n>We present a theoretical and empirical study of score-based MIAs, focusing on the predicted noise vectors that diffusion models learn to approximate.<n>We show that the expected denoiser output points toward a kernel-weighted local mean of nearby training samples, such that its norm encodes proximity to the training set and thereby reveals membership.
arXiv Detail & Related papers (2025-09-29T16:28:55Z) - Prior-Guided Residual Diffusion: Calibrated and Efficient Medical Image Segmentation [11.375625987308927]
Prior-Guided Residual Diffusion (PGRD) is a diffusion-based framework that learns voxel-wise distributions.<n> evaluated on representative MRI and CT datasets.
arXiv Detail & Related papers (2025-09-01T10:13:15Z) - Duality and Policy Evaluation in Distributionally Robust Bayesian Diffusion Control [8.863520091178335]
We consider a diffusion control problem of expected terminal numerical utility.<n>The controller imposes a prior distribution on the unknown drift of an underlying diffusion.<n>In practice, the prior will generally be incorrectly specified, and the degree of model misspecification can have a significant impact on policy performance.<n>We introduce a distributionally robust Bayesian control (DRBC) formulation in which the controller plays a game against an adversary who selects a prior in divergence neighborhood of a baseline prior.
arXiv Detail & Related papers (2025-06-24T03:58:49Z) - Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantee with explicit dimensional dependencies for general score-mismatched diffusion samplers.<n>We show that score mismatches result in an distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions.<n>This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z) - ADBM: Adversarial diffusion bridge model for reliable adversarial purification [21.2538921336578]
Recently Diffusion-based Purification (DiffPure) has been recognized as an effective defense method against adversarial examples.<n>We find DiffPure which directly employs the original pre-trained diffusion models for adversarial purification to be suboptimal.<n>We propose a novel Adrialversa Diffusion Bridge Model, termed ADBM, which constructs a reverse bridge from diffused adversarial data back to its original clean examples.
arXiv Detail & Related papers (2024-08-01T06:26:05Z) - Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective [65.10019978876863]
Diffusion-Based Purification (DBP) has emerged as an effective defense mechanism against adversarial attacks.<n>In this paper, we propose that the intrinsicity in the DBP process is the primary factor driving robustness.
arXiv Detail & Related papers (2024-04-22T16:10:38Z) - Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z) - DiffCPS: Diffusion Model based Constrained Policy Search for Offline
Reinforcement Learning [11.678012836760967]
Constrained policy search is a fundamental problem in offline reinforcement learning.
We propose a novel approach, $textbfDiffusion-based Constrained Policy Search$ (dubbed DiffCPS)
arXiv Detail & Related papers (2023-10-09T01:29:17Z) - How Much is Enough? A Study on Diffusion Times in Score-based Generative
Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z) - Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.