Fighting Uncertainty with Gradients: Offline Reinforcement Learning via
Diffusion Score Matching
- URL: http://arxiv.org/abs/2306.14079v2
- Date: Tue, 17 Oct 2023 03:17:56 GMT
- Title: Fighting Uncertainty with Gradients: Offline Reinforcement Learning via
Diffusion Score Matching
- Authors: H.J. Terry Suh, Glen Chou, Hongkai Dai, Lujie Yang, Abhishek Gupta,
Russ Tedrake
- Abstract summary: We study smoothed distance to data as an uncertainty metric, and claim that it has two beneficial properties.
We show these gradients can be efficiently learned with score-matching techniques.
We propose Score-Guided Planning (SGP) to enable first-order planning in high-dimensional problems.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gradient-based methods enable efficient search capabilities in high
dimensions. However, in order to apply them effectively in offline optimization
paradigms such as offline Reinforcement Learning (RL) or Imitation Learning
(IL), we require a more careful consideration of how uncertainty estimation
interplays with first-order methods that attempt to minimize it. We study
smoothed distance to data as an uncertainty metric, and claim that it has two
beneficial properties: (i) it allows gradient-based methods that attempt to
minimize uncertainty to drive iterates to data as smoothing is annealed, and
(ii) it facilitates analysis of model bias with Lipschitz constants. As
distance to data can be expensive to compute online, we consider settings where
we need to amortize this computation. Instead of learning the distance, however, we
propose to learn its gradients directly as an oracle for first-order
optimizers. We show these gradients can be efficiently learned with
score-matching techniques by leveraging the equivalence between distance to
data and data likelihood. Using this insight, we propose Score-Guided Planning
(SGP), a planning algorithm for offline RL that utilizes score-matching to
enable first-order planning in high-dimensional problems, where zeroth-order
methods were unable to scale, and ensembles were unable to overcome local
minima. Website: https://sites.google.com/view/score-guided-planning/home
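
The link between smoothed distance to data and the score of the noise-perturbed data distribution can be illustrated with a small sketch. It assumes one common softmin form of the smoothed distance, d_sigma(x) = -sigma^2 log sum_i exp(-||x - x_i||^2 / (2 sigma^2)), whose gradient equals -sigma^2 times the score of the data distribution convolved with Gaussian noise of standard deviation sigma; the paper's exact definition and constants may differ. The gradient is computed in closed form from a toy dataset rather than learned with score matching, and all function names, step sizes, and the annealing schedule are hypothetical choices for illustration, not the authors' SGP implementation.

```python
import numpy as np

def smoothed_distance(x, data, sigma):
    """Softmin-smoothed half squared distance from x to the dataset (assumed form)."""
    sq = np.sum((x - data) ** 2, axis=1) / (2.0 * sigma**2)
    m = sq.min()
    # Numerically stable -sigma^2 * log(sum_i exp(-sq_i)).
    return sigma**2 * (m - np.log(np.sum(np.exp(-(sq - m)))))

def smoothed_distance_grad(x, data, sigma):
    """Gradient of smoothed_distance with respect to x.

    Equals -sigma^2 times the score of the Gaussian-perturbed empirical
    distribution; here it is evaluated exactly instead of being learned.
    """
    diffs = x - data                                   # shape (N, d)
    logits = -np.sum(diffs**2, axis=1) / (2.0 * sigma**2)
    logits -= logits.max()                             # stabilize the softmax
    w = np.exp(logits)
    w /= w.sum()
    return w @ diffs                                   # softmax-weighted mean of (x - x_i)

def anneal_to_data(x0, data, sigmas, steps_per_sigma=50, step_size=0.5):
    """Gradient descent on the smoothed distance while sigma is decreased."""
    x = np.asarray(x0, dtype=float).copy()
    for sigma in sigmas:
        for _ in range(steps_per_sigma):
            x = x - step_size * smoothed_distance_grad(x, data, sigma)
    return x

# Toy dataset and an arbitrary starting point far from the data.
data = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
x_final = anneal_to_data(np.array([3.0, 2.5]), data, sigmas=[2.0, 1.0, 0.5, 0.1, 0.02])
dist_to_nearest = np.min(np.linalg.norm(data - x_final, axis=1))
```

In this toy setting the iterate ends up close to one of the data points once the smoothing is small, which mirrors property (i) in the abstract. In the setting the abstract describes, the closed-form gradient would be replaced by a learned score model so that the distance computation is amortized.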