Related papers: Flow Q-Learning

Flow Q-Learning

URL: http://arxiv.org/abs/2502.02538v1
Date: Tue, 04 Feb 2025 18:04:05 GMT
Title: Flow Q-Learning
Authors: Seohong Park, Qiyang Li, Sergey Levine,
Abstract summary: We present flow Q-learning (FQL), a simple and performant offline reinforcement learning (RL) method.<n>FQL trains an expressive one-step policy with RL, rather than directly guiding an iterative flow policy to maximize values.<n>We experimentally show that FQL leads to strong performance across 73 challenging state- and pixel-based OGBench and D4RL tasks.
Score: 61.60383927357656
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present flow Q-learning (FQL), a simple and performant offline reinforcement learning (RL) method that leverages an expressive flow-matching policy to model arbitrarily complex action distributions in data. Training a flow policy with RL is a tricky problem, due to the iterative nature of the action generation process. We address this challenge by training an expressive one-step policy with RL, rather than directly guiding an iterative flow policy to maximize values. This way, we can completely avoid unstable recursive backpropagation, eliminate costly iterative action generation at test time, yet still mostly maintain expressivity. We experimentally show that FQL leads to strong performance across 73 challenging state- and pixel-based OGBench and D4RL tasks in offline RL and offline-to-online RL. Project page: https://seohong.me/projects/fql/

Related papers

Scaling Offline RL via Efficient and Expressive Shortcut Models [13.050231036248338]
offline reinforcement learning (RL) remains challenging due to the iterative nature of their noise sampling processes.<n>We introduce Scalable Offline Reinforcement Learning (SORL), a new offline RL algorithm that leverages shortcut models to scale both training and inference.<n>We demonstrate that SORL achieves strong performance across a range of offline RL tasks and exhibits positive scaling behavior with increased test-time compute.
arXiv Detail & Related papers (2025-05-28T20:59:22Z)
Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
offline reinforcement learning (RL) paradigm provides recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data. In this paper, we propose an adaptive scheme for action quantization. We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
Efficient Diffusion Policies for Offline Reinforcement Learning [85.73757789282212]
Diffsuion-QL significantly boosts the performance of offline RL by representing a policy with a diffusion model. We propose efficient diffusion policy (EDP) to overcome these two challenges. EDP constructs actions from corrupted ones at training to avoid running the sampling chain.
arXiv Detail & Related papers (2023-05-31T17:55:21Z)
IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies [72.4573167739712]
Implicit Q-learning (IQL) trains a Q-function using only dataset actions through a modified Bellman backup. It is unclear which policy actually attains the values represented by this trained Q-function. We introduce Implicit Q-learning (IDQL), combining our general IQL critic with the policy extraction method.
arXiv Detail & Related papers (2023-04-20T18:04:09Z)
Boosting Offline Reinforcement Learning via Data Rebalancing [104.3767045977716]
offline reinforcement learning (RL) is challenged by the distributional shift between learning policies and datasets. We propose a simple yet effective method to boost offline RL algorithms based on the observation that resampling a dataset keeps the distribution support unchanged. We dub our method ReD (Return-based Data Rebalance), which can be implemented with less than 10 lines of code change and adds negligible running time.
arXiv Detail & Related papers (2022-10-17T16:34:01Z)
Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL [0.0]
Decision Transformer (DT) combines the conditional policy approach and a transformer architecture. DT lacks stitching ability -- one of the critical abilities for offline RL to learn the optimal policy. We propose the Q-learning Decision Transformer (QDT) to address the shortcomings of DT.
arXiv Detail & Related papers (2022-09-08T18:26:39Z)
Boosting Offline Reinforcement Learning with Residual Generative Modeling [27.50950972741753]
offline reinforcement learning (RL) tries to learn the near-optimal policy with recorded offline experience without online exploration. We show that our method can learn more accurate policy approximations in different benchmark datasets. In addition, we show that the proposed offline RL method can learn more competitive AI agents in complex control tasks under the multiplayer online battle arena (MOBA) game Honor of Kings.
arXiv Detail & Related papers (2021-06-19T03:41:14Z)
Conservative Q-Learning for Offline Reinforcement Learning [106.05582605650932]
We show that CQL substantially outperforms existing offline RL methods, often learning policies that attain 2-5 times higher final return. We theoretically show that CQL produces a lower bound on the value of the current policy and that it can be incorporated into a policy learning procedure with theoretical improvement guarantees.
arXiv Detail & Related papers (2020-06-08T17:53:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.