Fighting Uncertainty with Gradients: Offline Reinforcement Learning via
Diffusion Score Matching
- URL: http://arxiv.org/abs/2306.14079v2
- Date: Tue, 17 Oct 2023 03:17:56 GMT
- Title: Fighting Uncertainty with Gradients: Offline Reinforcement Learning via
Diffusion Score Matching
- Authors: H.J. Terry Suh, Glen Chou, Hongkai Dai, Lujie Yang, Abhishek Gupta,
Russ Tedrake
- Abstract summary: We study smoothed distance to data as an uncertainty metric, and claim that it has two beneficial properties.
We show these gradients can be efficiently learned with score-matching techniques.
We propose Score-Guided Planning (SGP) to enable first-order planning in high-dimensional problems.
- Score: 22.461036967440723
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gradient-based methods enable efficient search capabilities in high
dimensions. However, in order to apply them effectively in offline optimization
paradigms such as offline Reinforcement Learning (RL) or Imitation Learning
(IL), we require a more careful consideration of how uncertainty estimation
interplays with first-order methods that attempt to minimize them. We study
smoothed distance to data as an uncertainty metric, and claim that it has two
beneficial properties: (i) it allows gradient-based methods that attempt to
minimize uncertainty to drive iterates to data as smoothing is annealed, and
(ii) it facilitates analysis of model bias with Lipschitz constants. As
distance to data can be expensive to compute online, we consider settings where
we need to amortize this computation. Instead of learning the distance, however, we
propose to learn its gradients directly as an oracle for first-order
optimizers. We show these gradients can be efficiently learned with
score-matching techniques by leveraging the equivalence between distance to
data and data likelihood. Using this insight, we propose Score-Guided Planning
(SGP), a planning algorithm for offline RL that utilizes score-matching to
enable first-order planning in high-dimensional problems, where zeroth-order
methods were unable to scale, and ensembles were unable to overcome local
minima. Website: https://sites.google.com/view/score-guided-planning/home
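The link the abstract draws between distance to data and data likelihood can be illustrated on a toy dataset. The following is a minimal sketch, not the authors' implementation: it assumes Gaussian smoothing of the empirical data distribution, under which the smoothed distance is a softmin of squared distances and its gradient equals the negative score scaled by sigma^2. The function names (`smoothed_distance`, `score`, `anneal_to_data`), the step size, and the annealing schedule are all hypothetical choices for illustration; the paper learns the score with score matching rather than computing it in closed form.

```python
import numpy as np

def _logits(x, data, sigma):
    # -0.5 * ||x - x_i||^2 / sigma^2 for every data point x_i
    return -0.5 * np.sum((data - x) ** 2, axis=1) / sigma**2

def smoothed_distance(x, data, sigma):
    """Softmin of 0.5 * ||x - x_i||^2 with temperature sigma^2.

    Equals -sigma^2 * log p_sigma(x) up to an additive constant, where
    p_sigma is the Gaussian-smoothed empirical distribution (assumption).
    """
    z = _logits(x, data, sigma)
    m = z.max()  # stabilize the log-sum-exp
    return -sigma**2 * (m + np.log(np.sum(np.exp(z - m))))

def score(x, data, sigma):
    """Closed-form score, grad_x log p_sigma(x), for the smoothed empirical distribution."""
    z = _logits(x, data, sigma)
    w = np.exp(z - z.max())
    w /= w.sum()  # softmax weights over data points
    return (w @ data - x) / sigma**2

def anneal_to_data(x0, data, sigmas, step=0.5, iters=50):
    """Gradient descent on the smoothed distance while sigma is annealed."""
    x = np.array(x0, dtype=float)
    for sigma in sigmas:
        for _ in range(iters):
            # grad of smoothed distance = -sigma^2 * score
            x = x + step * sigma**2 * score(x, data, sigma)
    return x

if __name__ == "__main__":
    data = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    x_final = anneal_to_data([3.0, 0.5], data, sigmas=[1.0, 0.3, 0.1, 0.03])
    print(x_final)
```

In this toy setup the iterate ends close to one of the data points once the smoothing is small, which is the behavior property (i) of the abstract describes. A learned score model would replace the closed-form `score` function in settings where the dataset is too large to sum over online.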
Related papers
- FLOPS: Forward Learning with OPtimal Sampling [1.694989793927645]
Gradient-based computation methods have recently gained focus for learning with only forward passes, also referred to as queries.
Conventional forward learning consumes enormous queries on each data point for accurate gradient estimation through Monte Carlo sampling.
We propose to allocate the optimal number of queries over each data in one batch during training to achieve a good balance between estimation accuracy and computational efficiency.
arXiv Detail & Related papers (2024-10-08T12:16:12Z)
- Linearized Wasserstein dimensionality reduction with approximation
guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z)
- sqSGD: Locally Private and Communication Efficient Federated Learning [14.60645909629309]
Federated learning (FL) is a technique that trains machine learning models from decentralized data sources.
We develop a gradient-based learning algorithm called sqSGD that addresses communication efficiency and high-dimensional compatibility.
Experiment results show sqSGD successfully learns large models like LeNet and ResNet with local privacy constraints.
arXiv Detail & Related papers (2022-06-21T17:45:35Z)
- Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation.
We develop a new adversarial learning based method, which is simple and efficient to apply.
We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z)
- Implicit Parameter-free Online Learning with Truncated Linear Models [51.71216912089413]
Parameter-free algorithms are online learning algorithms that do not require setting learning rates.
We propose new parameter-free algorithms that can take advantage of truncated linear models through a new update that has an "implicit" flavor.
Based on a novel decomposition of the regret, the new update is efficient, requires only one gradient at each step, never overshoots the minimum of the truncated model, and retains the favorable parameter-free properties.
arXiv Detail & Related papers (2022-03-19T13:39:49Z)
- Simple Stochastic and Online Gradient Descent Algorithms for Pairwise
Learning [65.54757265434465]
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances.
Online gradient descent (OGD) is a popular approach to handle streaming data in pairwise learning.
In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning.
arXiv Detail & Related papers (2021-11-23T18:10:48Z)
- Regret minimization in stochastic non-convex learning via a
proximal-gradient approach [80.59047515124198]
Motivated by applications in machine learning and operations, we study regret minimization with first-order oracle feedback in online constrained problems.
We develop a new prox-grad method with guarantees, together with proximal complexity reduction techniques.
arXiv Detail & Related papers (2020-10-13T09:22:21Z)
- Low-Rank Robust Online Distance/Similarity Learning based on the
Rescaled Hinge Loss [0.34376560669160383]
Existing online methods usually assume that training triplets or pairwise constraints exist in advance.
We formulate the online Distance-Similarity learning problem with the robust Rescaled hinge loss function.
The proposed model is rather general and can be applied to any PA-based online Distance-Similarity algorithm.
arXiv Detail & Related papers (2020-10-07T08:38:34Z)
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity
to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
- Resolving learning rates adaptively by locating Stochastic Non-Negative
Associated Gradient Projection Points using line searches [0.0]
Learning rates in neural network training are currently determined a priori to training using expensive manual or automated tuning.
This study proposes gradient-only line searches to resolve the learning rate for neural network training algorithms.
arXiv Detail & Related papers (2020-01-15T03:08:07Z)
- Adaptive Gradient Sparsification for Efficient Federated Learning: An
Online Learning Approach [11.986523531539165]
Federated learning (FL) is an emerging technique for training machine learning models using geographically dispersed data.
Gradient sparsification (GS) can be applied, where instead of the full gradient, only a small subset of important elements of the gradient is communicated.
We propose a novel online learning formulation and algorithm for automatically determining the near-optimal communication and computation trade-off.
arXiv Detail & Related papers (2020-01-14T13:09:23Z)