Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching
- URL: http://arxiv.org/abs/2311.01331v3
- Date: Sun, 9 Jun 2024 18:43:27 GMT
- Title: Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching
- Authors: Kai Yan, Alexander G. Schwing, Yu-xiong Wang,
- Abstract summary: We propose Primal Wasserstein DICE to minimize the primal Wasserstein distance between the learner and expert state occupancies.
Our framework is a generalization of SMODICE, and is the first work that unifies $f$-divergence and Wasserstein minimization.
- Score: 111.78179839856293
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In real-world scenarios, arbitrary interactions with the environment can often be costly, and actions of expert demonstrations are not always available. To reduce the need for both, offline Learning from Observations (LfO) is extensively studied: the agent learns to solve a task given only expert states and task-agnostic non-expert state-action pairs. The state-of-the-art DIstribution Correction Estimation (DICE) methods, as exemplified by SMODICE, minimize the state occupancy divergence between the learner's and empirical expert policies. However, such methods are limited to either $f$-divergences (KL and $chi^2$) or Wasserstein distance with Rubinstein duality, the latter of which constrains the underlying distance metric crucial to the performance of Wasserstein-based solutions. To enable more flexible distance metrics, we propose Primal Wasserstein DICE (PW-DICE). It minimizes the primal Wasserstein distance between the learner and expert state occupancies and leverages a contrastively learned distance metric. Theoretically, our framework is a generalization of SMODICE, and is the first work that unifies $f$-divergence and Wasserstein minimization. Empirically, we find that PW-DICE improves upon several state-of-the-art methods. The code is available at https://github.com/KaiYan289/PW-DICE.
Related papers
- Estimating Barycenters of Distributions with Neural Optimal Transport [93.28746685008093]
We propose a new scalable approach for solving the Wasserstein barycenter problem.
Our methodology is based on the recent Neural OT solver.
We also establish theoretical error bounds for our proposed approach.
arXiv Detail & Related papers (2024-02-06T09:17:07Z) - A Simple Solution for Offline Imitation from Observations and Examples
with Possibly Incomplete Trajectories [122.11358440078581]
offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable.
We propose Trajectory-Aware Learning from Observations (TAILO) to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available.
arXiv Detail & Related papers (2023-11-02T15:41:09Z) - Wasserstein Adversarial Examples on Univariant Time Series Data [23.15675721397447]
We propose adversarial examples in the Wasserstein space for time series data.
We use Wasserstein distance to bound the perturbation between normal examples and adversarial examples.
We empirically evaluate the proposed attack on several time series datasets in the healthcare domain.
arXiv Detail & Related papers (2023-03-22T07:50:15Z) - Robust Estimation under the Wasserstein Distance [27.382258439576606]
Given $n$ samples, of which $varepsilon n$ are adversarially corrupted, we seek an estimate for $mu$ with minimal Wasserstein error.
We prove new structural properties for POT and use them to show that MDE under a partial Wasserstein distance achieves the minimax-optimal robust estimation risk.
Since the popular Wasserstein generative adversarial network (WGAN) framework implements Wasserstein MDE via Kantorovich duality, our penalized dual enables large-scale generative modeling with contaminated datasets.
arXiv Detail & Related papers (2023-02-02T17:20:25Z) - Approximative Algorithms for Multi-Marginal Optimal Transport and
Free-Support Wasserstein Barycenters [0.0]
We present two algorithms to approximate the solution of multi-marginal optimal transport (MOT) with squared Euclidean costs for $N$ discrete probability measures.
They are fast, memory-efficient and easy to implement and can be used with any sparse OT solver as a black box.
arXiv Detail & Related papers (2022-02-02T10:59:54Z) - Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.
The goal is to find a policy that matches the expert's performance on some predefined set of cost functions.
We show that the OAL problem can be effectively solved by combining two mirror descent based no-regret algorithms.
arXiv Detail & Related papers (2021-02-13T12:57:51Z) - On Projection Robust Optimal Transport: Sample Complexity and Model
Misspecification [101.0377583883137]
Projection robust (PR) OT seeks to maximize the OT cost between two measures by choosing a $k$-dimensional subspace onto which they can be projected.
Our first contribution is to establish several fundamental statistical properties of PR Wasserstein distances.
Next, we propose the integral PR Wasserstein (IPRW) distance as an alternative to the PRW distance, by averaging rather than optimizing on subspaces.
arXiv Detail & Related papers (2020-06-22T14:35:33Z) - Primal Wasserstein Imitation Learning [44.87651595571687]
We propose a new Imitation Learning (IL) method based on a conceptually simple algorithm: Primal Wasserstein Imitation Learning (PWIL)
We show that we can recover expert behavior on a variety of continuous control tasks of the MuJoCo domain in a sample efficient manner in terms of agent interactions and of expert interactions with the environment.
arXiv Detail & Related papers (2020-06-08T15:30:11Z) - Fast and Robust Comparison of Probability Measures in Heterogeneous
Spaces [62.35667646858558]
We introduce the Anchor Energy (AE) and Anchor Wasserstein (AW) distances, which are respectively the energy and Wasserstein distances instantiated on such representations.
Our main contribution is to propose a sweep line algorithm to compute AE emphexactly in log-quadratic time, where a naive implementation would be cubic.
We show that AE and AW perform well in various experimental settings at a fraction of the computational cost of popular GW approximations.
arXiv Detail & Related papers (2020-02-05T03:09:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.