Quantile Filtered Imitation Learning
- URL: http://arxiv.org/abs/2112.00950v1
- Date: Thu, 2 Dec 2021 03:08:23 GMT
- Title: Quantile Filtered Imitation Learning
- Authors: David Brandfonbrener, William F. Whitney, Rajesh Ranganath, Joan Bruna
- Abstract summary: quantile filtered imitation learning (QFIL) is a policy improvement operator designed for offline reinforcement learning.
We prove that QFIL gives us a safe policy improvement step with function approximation.
We see that QFIL performs well on the D4RL benchmark.
- Score: 49.11859771578969
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce quantile filtered imitation learning (QFIL), a novel policy
improvement operator designed for offline reinforcement learning. QFIL performs
policy improvement by running imitation learning on a filtered version of the
offline dataset. The filtering process removes $(s, a)$ pairs whose estimated Q
values fall below a given quantile of the pushforward distribution over values
induced by sampling actions from the behavior policy. The definitions of both
the pushforward Q distribution and resulting value function quantile are key
contributions of our method. We prove that QFIL gives us a safe policy
improvement step with function approximation and that the choice of quantile
provides a natural hyperparameter to trade off bias and variance of the
improvement step. Empirically, we perform a synthetic experiment illustrating
how QFIL effectively makes a bias-variance tradeoff and we see that QFIL
performs well on the D4RL benchmark.
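To make the filtering step concrete, here is a minimal sketch in the spirit of the abstract: for each state, actions are sampled from an estimated behavior policy, the empirical quantile of their estimated Q-values serves as the per-state threshold, and only dataset pairs at or above that threshold are kept for behavior cloning. The interfaces and names (q_net, behavior_policy.sample, tau, n_samples) are illustrative assumptions, and the empirical quantile here is only a stand-in for the paper's own pushforward-distribution and value-quantile definitions.

```python
# Hypothetical sketch of the QFIL filtering step (illustration only, not the
# authors' released code).
# Assumed interfaces: q_net(states, actions) -> estimated Q-values of shape
# (batch,); behavior_policy.sample(states, n) -> n candidate actions per state,
# shape (batch, n, action_dim).
import torch

def qfil_filter(states, actions, q_net, behavior_policy, tau=0.7, n_samples=32):
    """Keep (s, a) pairs whose estimated Q-value is at or above the tau-quantile
    of the pushforward distribution of Q(s, a'), a' ~ behavior_policy(. | s)."""
    with torch.no_grad():
        # Sample candidate actions from the (estimated) behavior policy.
        candidates = behavior_policy.sample(states, n_samples)        # (B, N, A)

        # Evaluate Q on every sampled action to approximate, per state, the
        # pushforward distribution over values.
        tiled_states = states.unsqueeze(1).expand(-1, n_samples, -1)  # (B, N, S)
        q_samples = q_net(
            tiled_states.reshape(-1, states.shape[-1]),
            candidates.reshape(-1, candidates.shape[-1]),
        ).reshape(states.shape[0], n_samples)                         # (B, N)

        # The empirical tau-quantile of those values is the per-state threshold.
        threshold = torch.quantile(q_samples, tau, dim=1)             # (B,)

        # Keep only dataset pairs whose estimated Q-value clears the threshold.
        keep = q_net(states, actions) >= threshold

    return states[keep], actions[keep]
```

The surviving pairs would then be passed to ordinary behavior cloning to obtain the improved policy; raising tau filters more aggressively, which is where the bias-variance tradeoff described in the abstract enters.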
Related papers
- Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline
Reinforcement Learning [57.83919813698673]
Projected Off-Policy Q-Learning (POP-QL) is a novel actor-critic algorithm that simultaneously reweights off-policy samples and constrains the policy to prevent divergence and reduce value-approximation error.
In our experiments, POP-QL not only shows competitive performance on standard benchmarks, but also outperforms competing methods in tasks where the data-collection policy is significantly sub-optimal.
arXiv Detail & Related papers (2023-11-25T00:30:58Z)
- IDQL: Implicit Q-Learning as an Actor-Critic Method with Diffusion Policies [72.4573167739712]
Implicit Q-learning (IQL) trains a Q-function using only dataset actions through a modified Bellman backup.
It is unclear which policy actually attains the values represented by this trained Q-function.
We introduce Implicit Diffusion Q-learning (IDQL), combining the general IQL critic with a diffusion-based policy extraction step.
arXiv Detail & Related papers (2023-04-20T18:04:09Z)
- Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization [90.9780151608281]
In-sample learning methods such as IQL improve the policy by quantile regression using only data samples.
We find that the in-sample learning paradigm arises under the Implicit Value Regularization (IVR) framework.
We propose two practical algorithms, Sparse $Q$-learning (SQL) and Exponential $Q$-learning (EQL), which adopt the same value regularization used in existing works.
arXiv Detail & Related papers (2023-03-28T08:30:01Z)
- Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshots [11.533449955841968]
We propose Q-Pensieve, a policy improvement scheme that stores a collection of Q-snapshots to jointly determine the policy update direction.
We show that Q-Pensieve can be naturally integrated with soft policy iteration with a convergence guarantee.
arXiv Detail & Related papers (2022-12-06T16:29:47Z)
- Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset.
This method enables the learned policy to improve substantially over the best behavior in the data through generalization (see the sketch after this list).
arXiv Detail & Related papers (2021-10-12T17:05:05Z)
- Sparse Feature Selection Makes Batch Reinforcement Learning More Sample Efficient [62.24615324523435]
This paper provides a statistical analysis of high-dimensional batch Reinforcement Learning (RL) using sparse linear function approximation.
When there is a large number of candidate features, our result shows that sparsity-aware methods can make batch RL more sample efficient.
arXiv Detail & Related papers (2020-11-08T16:48:02Z)
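As referenced in the "Offline Reinforcement Learning with Implicit Q-Learning" entry above, the following is a minimal sketch of the kind of in-sample update that never evaluates actions outside the dataset. It assumes the commonly described expectile-regression formulation of IQL; the names, shapes, and hyperparameters (q_net, v_net, tau, gamma) are illustrative rather than taken from the paper's code.

```python
# Illustrative IQL-style losses that never query out-of-dataset actions
# (assumed formulation; not the authors' code).
import torch
import torch.nn.functional as F

def expectile_loss(diff, tau=0.7):
    # Asymmetric squared loss |tau - 1(diff < 0)| * diff^2; for tau > 0.5 its
    # minimizer is an upper expectile of Q(s, a) over dataset actions.
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()

def iql_losses(batch, q_net, target_q_net, v_net, gamma=0.99, tau=0.7):
    s, a, r, s_next, done = batch  # every tensor comes from the offline dataset

    # V(s) is regressed toward an upper expectile of Q(s, a) evaluated only on
    # dataset actions, so no out-of-distribution action is ever needed.
    with torch.no_grad():
        q_sa = target_q_net(s, a)
    v_loss = expectile_loss(q_sa - v_net(s), tau)

    # Q(s, a) is trained with a backup through V(s'), again avoiding any
    # maximization over unseen actions.
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * v_net(s_next)
    q_loss = F.mse_loss(q_net(s, a), target)

    return v_loss, q_loss
```

Policy extraction would then typically reuse the learned V and Q (for example, advantage-weighted behavior cloning), which is how the learned policy can improve over the best behavior in the data without out-of-distribution queries.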