A Coupled Flow Approach to Imitation Learning
- URL: http://arxiv.org/abs/2305.00303v1
- Date: Sat, 29 Apr 2023 17:10:17 GMT
- Title: A Coupled Flow Approach to Imitation Learning
- Authors: Gideon Freund, Elad Sarafian, Sarit Kraus
- Abstract summary: In reinforcement learning and imitation learning, an object of central importance is the state distribution induced by the policy.
In this work, we investigate applications of a normalizing flow-based model for the aforementioned distributions.
Our algorithm, Coupled Flow Imitation Learning (CFIL), achieves state-of-the-art performance on benchmark tasks with a single expert trajectory.
- Score: 24.024918837659474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In reinforcement learning and imitation learning, an object of central
importance is the state distribution induced by the policy. It plays a crucial
role in the policy gradient theorem, and references to it--along with the
related state-action distribution--can be found all across the literature.
Despite its importance, the state distribution is mostly discussed indirectly
and theoretically, rather than being modeled explicitly. The reason is an
absence of appropriate density estimation tools. In this work, we investigate
applications of a normalizing flow-based model for the aforementioned
distributions. In particular, we use a pair of flows coupled through the
optimality point of the Donsker-Varadhan representation of the Kullback-Leibler
(KL) divergence, for distribution matching based imitation learning. Our
algorithm, Coupled Flow Imitation Learning (CFIL), achieves state-of-the-art
performance on benchmark tasks with a single expert trajectory and extends
naturally to a variety of other settings, including the subsampled and
state-only regimes.
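The coupling described above exploits a standard fact: the supremum in the Donsker-Varadhan bound KL(P||Q) = sup_T E_P[T] - log E_Q[e^T] is attained at the log density ratio T*(x) = log p(x) - log q(x), which is exactly the quantity a pair of tractable flow densities can supply. A minimal numerical sketch of that optimality point, with closed-form 1-D Gaussians standing in for the learned flows (all constants here are illustrative, not CFIL's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two 1-D Gaussians standing in for the two flow densities.
mu_p, s_p = 0.0, 1.0   # distribution P
mu_q, s_q = 1.0, 2.0   # distribution Q

def log_gauss(x, mu, s):
    return -0.5 * ((x - mu) / s) ** 2 - np.log(s * np.sqrt(2 * np.pi))

# Closed-form KL(P || Q) between the two Gaussians.
kl_exact = np.log(s_q / s_p) + (s_p**2 + (mu_p - mu_q)**2) / (2 * s_q**2) - 0.5

# Donsker-Varadhan: KL(P||Q) = sup_T E_P[T] - log E_Q[exp(T)],
# attained at the log density ratio T*(x) = log p(x) - log q(x).
xp = rng.normal(mu_p, s_p, 200_000)
xq = rng.normal(mu_q, s_q, 200_000)
T = lambda x: log_gauss(x, mu_p, s_p) - log_gauss(x, mu_q, s_q)
dv_bound = T(xp).mean() - np.log(np.mean(np.exp(T(xq))))

print(kl_exact, dv_bound)  # the bound is tight at the optimal witness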
Related papers
- Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantee with explicit dimensional dependencies for general score-mismatched diffusion samplers.
We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions.
This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z)
- Generative Assignment Flows for Representing and Learning Joint Distributions of Discrete Data [2.6499018693213316]
We introduce a novel generative model for the representation of joint probability distributions of a possibly large number of discrete random variables.
The embedding of the flow via the Segre map in the meta-simplex of all discrete joint distributions ensures that any target distribution can be represented in principle.
Our approach has strong motivation from first principles of modeling coupled discrete variables.
arXiv Detail & Related papers (2024-06-06T21:58:33Z)
- Harnessing the Power of Vicinity-Informed Analysis for Classification under Covariate Shift [9.530897053573186]
Transfer learning enhances prediction accuracy on a target distribution by leveraging data from a source distribution.
This paper introduces a novel dissimilarity measure that utilizes vicinity information, i.e., the local structure of data points.
We characterize the excess error using the proposed measure and demonstrate faster or competitive convergence rates compared to previous techniques.
arXiv Detail & Related papers (2024-05-27T07:55:27Z)
- Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models [59.331993845831946]
Diffusion models benefit from instillation of task-specific information into the score function to steer the sample generation towards desired properties.
This paper provides the first theoretical study towards understanding the influence of guidance on diffusion models in the context of Gaussian mixture models.
arXiv Detail & Related papers (2024-03-03T23:15:48Z)
- A Distributional Analogue to the Successor Representation [54.99439648059807]
This paper contributes a new approach for distributional reinforcement learning.
It elucidates a clean separation of transition structure and reward in the learning process.
As an illustration, we show that it enables zero-shot risk-sensitive policy evaluation.
arXiv Detail & Related papers (2024-02-13T15:35:24Z)
- Deep conditional distribution learning via conditional Föllmer flow [3.227277661633986]
We introduce an ordinary differential equation (ODE)-based deep generative method for learning conditional distributions, named Conditional Föllmer Flow.
For effective implementation, we discretize the flow with Euler's method, estimating the velocity field nonparametrically using a deep neural network.
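In that paper the velocity field is a learned network; as a toy stand-in, the Gaussian-to-Gaussian probability path below has a closed-form velocity, which makes the Euler discretization step concrete (the path, constants, and target are illustrative assumptions, not the paper's conditional setting):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: transport N(0,1) to N(m, s^2) along the linear path
# x_t = (1-t)*z + t*(m + s*z), whose marginal is N(t*m, (1-t+t*s)^2).
# The closed-form velocity below plays the role of the learned network.
m, s = 2.0, 0.5

def velocity(x, t):
    sigma_t = 1.0 - t + t * s
    mu_t = t * m
    return m + (s - 1.0) * (x - mu_t) / sigma_t

# Euler discretization of dx/dt = v(x, t) from t=0 to t=1.
x = rng.normal(size=50_000)   # samples from the reference N(0,1)
n_steps = 1000
dt = 1.0 / n_steps
for k in range(n_steps):
    x = x + dt * velocity(x, k * dt)

print(x.mean(), x.std())  # close to (2.0, 0.5)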
arXiv Detail & Related papers (2024-02-02T14:52:10Z)
- Learning a Diffusion Model Policy from Rewards via Q-Score Matching [93.0191910132874]
We present a theoretical framework linking the structure of diffusion model policies to a learned Q-function.
We propose a new policy update method from this theory, which we denote Q-score matching.
arXiv Detail & Related papers (2023-12-18T23:31:01Z)
- Enhancing Robustness of Foundation Model Representations under Provenance-related Distribution Shifts [8.298173603769063]
We examine the stability of models based on foundation models under distribution shift.
We focus on confounding by provenance, a form of distribution shift that emerges in the context of multi-institutional datasets.
Results indicate that while foundation models do show some out-of-the-box robustness to confounding-by-provenance related distribution shifts, this can be improved through adjustment.
arXiv Detail & Related papers (2023-12-09T02:02:45Z)
- Variance-Preserving-Based Interpolation Diffusion Models for Speech Enhancement [53.2171981279647]
We present a framework that encapsulates both the VP- and variance-exploding (VE)-based diffusion methods.
To improve performance and ease model training, we analyze the common difficulties encountered in diffusion models.
We evaluate our model against several methods using a public benchmark to showcase the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-14T14:22:22Z)
- Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
arXiv Detail & Related papers (2022-08-12T09:54:11Z)
- Decentralized Event-Triggered Federated Learning with Heterogeneous Communication Thresholds [12.513477328344255]
We propose a novel methodology for distributed model aggregations via asynchronous, event-triggered consensus iterations over a network graph topology.
We demonstrate that our methodology achieves the globally optimal learning model under standard assumptions in distributed learning and graph consensus literature.
arXiv Detail & Related papers (2022-04-07T20:35:37Z)
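The event-triggered consensus idea in the last entry can be sketched in a few lines: each node rebroadcasts its state only when it has drifted past a local threshold, and symmetric updates over the graph preserve the network average exactly. The ring topology, threshold, and step size below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 8                     # nodes on a ring graph (illustrative topology)
x = rng.normal(size=n)    # local model parameters (scalars for simplicity)
x_hat = x.copy()          # last value each node broadcast to its neighbors
threshold = 1e-3          # per-node thresholds could be heterogeneous
step = 0.2                # consensus step size
target = x.mean()         # preserved exactly by the symmetric updates

for _ in range(500):
    # Event trigger: broadcast only when the state drifted far enough.
    send = np.abs(x - x_hat) > threshold
    x_hat[send] = x[send]
    # Consensus update using the *broadcast* values of both neighbors.
    left, right = np.roll(x_hat, 1), np.roll(x_hat, -1)
    x = x + step * (left - x_hat) + step * (right - x_hat)

print(np.max(np.abs(x - target)))  # all nodes near the initial average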
This list is automatically generated from the titles and abstracts of the papers in this site.