Real-Time Diffusion Policies for Games: Enhancing Consistency Policies with Q-Ensembles
- URL: http://arxiv.org/abs/2503.16978v1
- Date: Fri, 21 Mar 2025 09:45:59 GMT
- Title: Real-Time Diffusion Policies for Games: Enhancing Consistency Policies with Q-Ensembles
- Authors: Ruoqi Zhang, Ziwei Luo, Jens Sjölund, Per Mattsson, Linus Gisslén, Alessandro Sestini
- Abstract summary: We present CPQE (Consistency Policy with Q-ensembles), which combines consistency models with Q-ensembles to address policy learning challenges. CPQE achieves inference speeds of up to 60 Hz, a significant improvement over state-of-the-art diffusion policies that operate at only 20 Hz. These results indicate that CPQE offers a practical solution for deploying diffusion-based policies in games and other real-time applications.
- Score: 43.11094512130205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have shown impressive performance in capturing complex and multi-modal action distributions for game agents, but their slow inference speed prevents practical deployment in real-time game environments. While consistency models offer a promising approach for one-step generation, they often suffer from training instability and performance degradation when applied to policy learning. In this paper, we present CPQE (Consistency Policy with Q-Ensembles), which combines consistency models with Q-ensembles to address these challenges. CPQE leverages uncertainty estimation through Q-ensembles to provide more reliable value function approximations, resulting in better training stability and improved performance compared to classic double Q-network methods. Our extensive experiments across multiple game scenarios demonstrate that CPQE achieves inference speeds of up to 60 Hz -- a significant improvement over state-of-the-art diffusion policies that operate at only 20 Hz -- while maintaining comparable performance to multi-step diffusion approaches. CPQE consistently outperforms state-of-the-art consistency model approaches, showing both higher rewards and enhanced training stability throughout the learning process. These results indicate that CPQE offers a practical solution for deploying diffusion-based policies in games and other real-time applications where both multi-modal behavior modeling and rapid inference are critical requirements.
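As a rough illustration of the Q-ensemble idea in the abstract, the sketch below penalizes value estimates that the critics disagree on. The abstract only states that Q-ensembles supply uncertainty estimates for more reliable value approximations; the ensemble size, network widths, and the mean-minus-std aggregation with coefficient `beta` are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class QEnsemble(nn.Module):
    """An ensemble of independent Q-networks evaluated on (state, action) pairs."""

    def __init__(self, state_dim: int, action_dim: int, n_critics: int = 5, hidden: int = 256):
        super().__init__()
        self.critics = nn.ModuleList(
            nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for _ in range(n_critics)
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        x = torch.cat([state, action], dim=-1)
        return torch.stack([critic(x) for critic in self.critics])  # (n_critics, batch, 1)


def uncertainty_penalized_q(ensemble: QEnsemble, state, action, beta: float = 1.0):
    """Value estimate that discounts actions the critics disagree on:
    ensemble mean minus beta times the ensemble standard deviation."""
    qs = ensemble(state, action)
    return qs.mean(dim=0) - beta * qs.std(dim=0)
```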
Related papers
- Score and Distribution Matching Policy: Advanced Accelerated Visuomotor Policies via Matched Distillation [29.90613565503628]
We propose the Score and Distribution Matching Policy (SDM Policy) for visuomotor policy learning. SDM Policy transforms diffusion-based policies into single-step generators through a two-stage optimization process. It achieves a 6x inference speedup while maintaining state-of-the-art action quality.
arXiv Detail & Related papers (2024-12-12T13:22:02Z)
- Distributing Quantum Computations, Shot-wise [1.2061873132374783]
NISQ-era constraints, namely high sensitivity to noise and limited qubit counts, impose significant barriers on the usability of QPUs.
We propose a methodological framework, termed shot-wise, which enables the distribution of shots for a single circuit across multiple QPUs.
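To make the shot-wise idea concrete, here is a minimal sketch of splitting one circuit's shot budget across several QPUs and merging the measurement histograms. The `QpuRunner` callable is a hypothetical stand-in for a real backend executor, and the even split is just one simple policy assumed here; the paper's framework is more general.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical executor: runs `circuit` on one QPU for `shots` shots and
# returns a histogram of measured bitstrings, e.g. {"010": 412, ...}.
QpuRunner = Callable[[str, int], Dict[str, int]]


def run_shot_wise(circuit: str, runners: List[QpuRunner], total_shots: int) -> Dict[str, int]:
    """Split a circuit's shot budget evenly across several QPUs and merge the counts."""
    per_qpu, remainder = divmod(total_shots, len(runners))
    merged: Counter = Counter()
    for i, run in enumerate(runners):
        shots = per_qpu + (1 if i < remainder else 0)  # spread leftover shots
        merged.update(run(circuit, shots))
    return dict(merged)
```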
arXiv Detail & Related papers (2024-11-25T16:16:54Z)
- Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization [55.97310586039358]
Diffusion models have garnered widespread attention in Reinforcement Learning (RL) for their powerful expressiveness and multimodality. We propose a novel model-free diffusion-based online RL algorithm, Q-weighted Variational Policy Optimization (QVPO). Specifically, we introduce the Q-weighted variational loss, which can be shown to be a tight lower bound of the policy objective in online RL under certain conditions. We also develop an efficient behavior policy to enhance sample efficiency by reducing the variance of the diffusion policy during online interactions.
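A minimal sketch of what a Q-weighted loss could look like: each sampled action's per-sample diffusion denoising loss is weighted by its (clipped, detached) Q-value, so advantageous actions dominate the update. QVPO's exact weighting scheme and the conditions for the lower bound are in the paper; treat the clipping here as an assumption.

```python
import torch


def q_weighted_diffusion_loss(denoise_loss: torch.Tensor, q_values: torch.Tensor) -> torch.Tensor:
    """Q-weighted variational loss (sketch).

    denoise_loss: per-sample diffusion reconstruction loss, shape (batch,)
    q_values:     critic estimates Q(s, a) for the sampled actions, shape (batch,)
    """
    # Clip at zero so only advantageous actions contribute; one of several
    # plausible weighting schemes consistent with the abstract.
    weights = torch.clamp(q_values, min=0.0).detach()
    return (weights * denoise_loss).mean()
```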
arXiv Detail & Related papers (2024-05-25T10:45:46Z)
- Learning a Diffusion Model Policy from Rewards via Q-Score Matching [93.0191910132874]
We present a theoretical framework linking the structure of diffusion model policies to a learned Q-function. We propose a new policy update method from this theory, which we denote Q-score matching.
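The relation underlying score-based policy updates can be stated compactly. Assuming a Boltzmann-style policy (a common choice, not necessarily the paper's exact construction), matching the policy's action-score to the critic's action-gradient couples the two objects:

```latex
% Assume an energy-style policy proportional to exp(Q); then the action-score
% of the policy equals the action-gradient of the Q-function:
\pi(a \mid s) \propto \exp\big(Q(s, a)\big)
\quad\Longrightarrow\quad
\nabla_a \log \pi(a \mid s) = \nabla_a Q(s, a)
```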
arXiv Detail & Related papers (2023-12-18T23:31:01Z)
- Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning [57.83919813698673]
Projected Off-Policy Q-Learning (POP-QL) is a novel actor-critic algorithm that simultaneously reweights off-policy samples and constrains the policy to prevent divergence and reduce value-approximation error.
In our experiments, POP-QL not only shows competitive performance on standard benchmarks, but also outperforms competing methods in tasks where the data-collection policy is significantly sub-optimal.
arXiv Detail & Related papers (2023-11-25T00:30:58Z)
- Boosting Continuous Control with Consistency Policy [14.78980095597872]
We propose a novel time-efficient method named Consistency Policy with Q-Learning (CPQL).
By establishing a mapping from the reverse diffusion trajectories to the desired policy, we simultaneously address the issues of time efficiency and inaccurate guidance.
CPQL achieves new state-of-the-art performance on 11 offline and 21 online tasks, significantly improving inference speed by nearly 45 times compared to Diffusion-QL.
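For intuition, a consistency policy replaces the multi-step reverse diffusion chain with a single network evaluation from noise to action, which is where the inference speedup comes from. The sketch below omits the noise-level conditioning and boundary conditions a real consistency model needs, and the architecture is an illustrative assumption:

```python
import torch
import torch.nn as nn


class OneStepConsistencyPolicy(nn.Module):
    """Maps a Gaussian noise sample (conditioned on the state) directly to an
    action in one forward pass, instead of iterating a reverse diffusion chain."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.action_dim = action_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # actions scaled to [-1, 1]
        )

    @torch.no_grad()
    def act(self, state: torch.Tensor) -> torch.Tensor:
        noise = torch.randn(state.shape[0], self.action_dim, device=state.device)
        return self.net(torch.cat([state, noise], dim=-1))
```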
arXiv Detail & Related papers (2023-10-10T06:26:05Z)
- Benchmarking the Reliability of Post-training Quantization: a Particular Focus on Worst-case Performance [53.45700148820669]
Post-training quantization (PTQ) is a popular method for compressing deep neural networks (DNNs) without modifying their original architecture or training procedures.
Despite its effectiveness and convenience, the reliability of PTQ methods in extreme cases such as distribution shift and data noise remains largely unexplored.
This paper first investigates this problem on various commonly-used PTQ methods.
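For reference, the PTQ mechanism being benchmarked is typically a one-shot conversion such as PyTorch's dynamic quantization, which rewrites linear layers to int8 weights with no retraining; the toy model below is only a placeholder for a trained network.

```python
import torch
import torch.nn as nn

# Toy float32 model standing in for a trained network.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).eval()

# Post-training dynamic quantization: nn.Linear weights are stored in int8,
# with no change to the architecture or training procedure.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
```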
arXiv Detail & Related papers (2023-03-23T02:55:50Z)
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
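The core of a Diffusion-QL-style policy objective can be sketched as a diffusion behavior-cloning loss regularized by the critic. The Q-normalization below follows a common convention and should be read as an assumption of this sketch rather than the paper's exact formula:

```python
import torch


def diffusion_ql_policy_loss(bc_loss: torch.Tensor, q_values: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Policy objective (sketch): the diffusion model's denoising (behavior-cloning)
    loss keeps the policy close to the dataset, while the critic term pushes
    sampled actions toward high Q-values.

    bc_loss:  per-sample diffusion denoising loss, shape (batch,)
    q_values: critic estimates Q(s, a) for actions sampled from the policy, shape (batch,)
    """
    # Normalize the Q term by its batch magnitude so alpha is scale-free
    # (an assumed convention for this sketch).
    q_term = q_values.mean() / q_values.abs().mean().detach()
    return bc_loss.mean() - alpha * q_term
```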
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
- Residual Q-Networks for Value Function Factorizing in Multi-Agent Reinforcement Learning [0.0]
We propose a novel concept of Residual Q-Networks (RQNs) for Multi-Agent Reinforcement Learning (MARL).
The RQN learns to transform the individual Q-value trajectories in a way that preserves the Individual-Global-Max (IGM) criterion.
The proposed method converges faster with increased stability, and shows robust performance across a wider family of environments.
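The Individual-Global-Max (IGM) criterion referenced above is standard in value factorization: greedy action selection on each agent's individual utility must recover the greedy joint action on the team value.

```latex
% IGM criterion for n agents with joint history \tau and joint action u:
\arg\max_{\mathbf{u}} Q_{tot}(\boldsymbol{\tau}, \mathbf{u})
  = \Big( \arg\max_{u_1} Q_1(\tau_1, u_1), \ \dots, \ \arg\max_{u_n} Q_n(\tau_n, u_n) \Big)
```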
arXiv Detail & Related papers (2022-05-30T16:56:06Z)
- Aggressive Q-Learning with Ensembles: Achieving Both High Sample Efficiency and High Asymptotic Performance [12.871109549160389]
We propose a novel model-free algorithm, Aggressive Q-Learning with Ensembles (AQE), which improves upon the sample efficiency of REDQ and the asymptotic performance of TQC.
AQE is very simple, requiring neither distributional representation of critics nor target randomization.
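A minimal sketch of an aggressive ensemble target, assuming the target averages the `keep` smallest critic predictions rather than taking a hard minimum, which trades pessimism for sample efficiency; the exact aggregation rule and hyperparameters in AQE may differ.

```python
import torch


def aggressive_ensemble_target(q_preds: torch.Tensor, keep: int = 2) -> torch.Tensor:
    """Average the `keep` lowest critic predictions from the ensemble.

    q_preds: stacked critic outputs, shape (n_critics, batch)
    """
    lowest, _ = torch.topk(q_preds, k=keep, dim=0, largest=False)
    return lowest.mean(dim=0)
```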
arXiv Detail & Related papers (2021-11-17T14:48:52Z)