Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning
- URL: http://arxiv.org/abs/2309.16984v2
- Date: Thu, 14 Mar 2024 19:28:48 GMT
- Title: Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning
- Authors: Zihan Ding, Chi Jin
- Abstract summary: Score-based generative models like the diffusion model have been shown to be effective in modeling multi-modal data, from image generation to reinforcement learning (RL).
We propose to apply the consistency model as an efficient yet expressive policy representation, namely the consistency policy, with an actor-critic-style algorithm for three typical RL settings.
- Score: 25.81859481634996
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Score-based generative models like the diffusion model have been shown to be effective in modeling multi-modal data, from image generation to reinforcement learning (RL). However, the inference process of a diffusion model can be slow, which hinders its use in RL with iterative sampling. We propose to apply the consistency model as an efficient yet expressive policy representation, namely the consistency policy, with an actor-critic-style algorithm for three typical RL settings: offline, offline-to-online, and online. For offline RL, we demonstrate the expressiveness of generative models as policies on multi-modal data. For offline-to-online RL, the consistency policy is shown to be more computationally efficient than the diffusion policy, with comparable performance. For online RL, the consistency policy demonstrates a significant speedup and even higher average performance than the diffusion policy.
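The consistency policy above can be read as a one-step generative actor: a single evaluation of a state-conditioned consistency function maps Gaussian noise to an action, and an actor-critic-style update trains that actor against a learned Q-function. Below is a minimal, hypothetical sketch of this interface; the network sizes, noise scale, and the `ConsistencyPolicy`/`Critic` names are illustrative assumptions rather than the authors' implementation, and the consistency-training (self-consistency/distillation) loss is omitted.

```python
import torch
import torch.nn as nn


class ConsistencyPolicy(nn.Module):
    """One-step consistency-model actor: a = f_theta(noise, sigma_max | s)."""

    def __init__(self, state_dim, action_dim, hidden=256, sigma_max=80.0):
        super().__init__()
        self.action_dim = action_dim
        self.sigma_max = sigma_max
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.Mish(),
            nn.Linear(hidden, hidden), nn.Mish(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, noisy_action, sigma):
        # State-conditioned consistency function: denoise in a single evaluation.
        x = torch.cat([state, noisy_action, sigma], dim=-1)
        return torch.tanh(self.net(x))  # keep actions bounded in [-1, 1]

    def act(self, state):
        # One network call maps Gaussian noise straight to an action,
        # with no iterative denoising chain as in a diffusion policy.
        noise = self.sigma_max * torch.randn(state.shape[0], self.action_dim)
        sigma = torch.full((state.shape[0], 1), self.sigma_max)
        return self.forward(state, noise, sigma)


class Critic(nn.Module):
    """Q(s, a) network used for the actor-critic-style policy improvement."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.q = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Mish(),
            nn.Linear(hidden, hidden), nn.Mish(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.q(torch.cat([state, action], dim=-1))


def actor_loss(policy, critic, states):
    # Policy-improvement term: push one-step samples toward high Q-values.
    return -critic(states, policy.act(states)).mean()
```

The efficiency argument sits in `act`: sampling an action costs a single forward pass, whereas a diffusion policy must run its reverse denoising chain for every action it takes.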
Related papers
- Learning from Random Demonstrations: Offline Reinforcement Learning with Importance-Sampled Diffusion Models [19.05224410249602]
We propose a novel approach for offline reinforcement learning with closed-loop policy evaluation and world-model adaptation.
We analyze the performance of the proposed method and provide an upper bound on the return gap between our method and the real environment under an optimal policy.
arXiv Detail & Related papers (2024-05-30T09:34:31Z)
- Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning [19.533619091287676]
We propose a novel preferred-action-optimized diffusion policy for offline reinforcement learning.
In particular, an expressive conditional diffusion model is utilized to represent the diverse distribution of a behavior policy.
Experiments demonstrate that the proposed method provides competitive or superior performance compared to previous state-of-the-art offline RL methods.
arXiv Detail & Related papers (2024-05-29T03:19:59Z)
- Policy Representation via Diffusion Probability Model for Reinforcement Learning [67.56363353547775]
We build a theoretical foundation for policy representation via the diffusion probability model.
We present a convergence guarantee for the diffusion policy, which provides a theoretical basis for understanding its multimodality.
We propose DIPO, an implementation of model-free online RL with DIffusion POlicy.
arXiv Detail & Related papers (2023-05-22T15:23:41Z)
- A Unified Framework for Alternating Offline Model Training and Policy Learning [62.19209005400561]
In offline model-based reinforcement learning, we learn a dynamics model from historically collected data and use the learned model together with the fixed dataset for policy learning.
We develop an iterative offline MBRL framework, where we maximize a lower bound of the true expected return.
With the proposed unified model-policy learning framework, we achieve competitive performance on a wide range of continuous-control offline reinforcement learning datasets.
arXiv Detail & Related papers (2022-10-12T04:58:51Z)
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
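Diffusion-QL's actor objective combines a denoising behavior-cloning loss on dataset actions with a Q-maximization term on actions sampled from the conditional diffusion model, so the policy stays close to the data while being steered toward high values. The sketch below illustrates that combination under a plain DDPM parameterization; the step count, noise schedule, and the weighting `eta` are assumptions for illustration, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

T = 5                                   # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.2, T)    # assumed noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)


def q_sample(actions, noise, t):
    """Forward process q(a_t | a_0): noise dataset actions at step t (closed form)."""
    ab = alpha_bars[t].unsqueeze(-1)
    return ab.sqrt() * actions + (1.0 - ab).sqrt() * noise


def sample_actions(eps_model, states, action_dim):
    """Reverse process: denoise Gaussian noise into actions, conditioned on states."""
    a = torch.randn(states.shape[0], action_dim)
    for t in reversed(range(T)):
        t_batch = torch.full((states.shape[0],), t, dtype=torch.long)
        eps = eps_model(a, t_batch, states)
        a = (a - betas[t] / (1.0 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            a = a + betas[t].sqrt() * torch.randn_like(a)
    return a.clamp(-1.0, 1.0)


def policy_loss(eps_model, critic, states, actions, eta=1.0):
    """Behaviour cloning anchors the policy to the data; the Q term improves it."""
    t = torch.randint(0, T, (actions.shape[0],))
    noise = torch.randn_like(actions)
    bc_loss = F.mse_loss(eps_model(q_sample(actions, noise, t), t, states), noise)
    q_loss = -critic(states, sample_actions(eps_model, states, actions.shape[-1])).mean()
    return bc_loss + eta * q_loss
```

This also makes the contrast with the consistency policy above concrete: every call to `sample_actions` runs `T` network evaluations, which is exactly the inference cost the one-step consistency policy removes.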
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
- MOPO: Model-based Offline Policy Optimization [183.6449600580806]
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a large batch of previously collected data.
We show that an existing model-based RL algorithm already produces significant gains in the offline setting.
We propose to modify existing model-based RL methods by training the policy on rewards that are artificially penalized by the uncertainty of the learned dynamics.
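Concretely, the penalized reward takes the form r_tilde(s, a) = r_hat(s, a) - lambda * u(s, a), where u estimates the model's error at (s, a). A small sketch, assuming the common heuristic of using the predicted standard deviations of a probabilistic dynamics ensemble as the uncertainty signal:

```python
import torch


def mopo_penalized_reward(reward_pred, next_state_stds, lam=1.0):
    """Uncertainty-penalized reward in the spirit of MOPO:
        r_tilde(s, a) = r_hat(s, a) - lam * u(s, a).
    Here u(s, a) is the largest predicted next-state standard-deviation norm
    across a probabilistic dynamics ensemble, one common heuristic; the exact
    error estimator and the penalty weight lam are design choices.

    reward_pred:     (batch,) rewards predicted by the learned model
    next_state_stds: (ensemble, batch, state_dim) predicted standard deviations
    """
    uncertainty = next_state_stds.norm(dim=-1).max(dim=0).values  # (batch,)
    return reward_pred - lam * uncertainty
```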
arXiv Detail & Related papers (2020-05-27T08:46:41Z)
- MOReL: Model-Based Offline Reinforcement Learning [49.30091375141527]
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment.
We present MOReL, an algorithmic framework for model-based offline RL.
We show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.
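MOReL's form of pessimism is to plan in a pessimistic MDP: state-action pairs where a learned dynamics ensemble disagrees strongly are treated as unknown and redirected to a low-reward absorbing state, which discourages the policy from leaving the data support. The gating step might look like the following sketch, where the disagreement measure and the constants are illustrative assumptions.

```python
import torch


def pessimistic_mdp_step(ensemble_next_preds, reward_pred, threshold=0.1, penalty=100.0):
    """MOReL-style pessimism: if the dynamics ensemble disagrees too much about
    the next state, treat the transition as entering an 'unknown' absorbing
    state with a large negative reward. The disagreement measure (maximum
    pairwise distance between ensemble predictions) and the constants here are
    illustrative assumptions.

    ensemble_next_preds: (ensemble, batch, state_dim) predicted next states
    reward_pred:         (batch,) rewards predicted by the learned model
    """
    diffs = ensemble_next_preds.unsqueeze(0) - ensemble_next_preds.unsqueeze(1)
    disagreement = diffs.norm(dim=-1).amax(dim=(0, 1))  # (batch,) max pairwise gap
    unknown = disagreement > threshold                   # flags the 'unknown' region
    reward = torch.where(unknown, torch.full_like(reward_pred, -penalty), reward_pred)
    return reward, unknown  # `unknown` also marks transitions to terminate
```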
arXiv Detail & Related papers (2020-05-12T17:52:43Z)