Offline Imitation Learning upon Arbitrary Demonstrations by Pre-Training Dynamics Representations
- URL: http://arxiv.org/abs/2508.14383v1
- Date: Wed, 20 Aug 2025 03:23:20 GMT
- Title: Offline Imitation Learning upon Arbitrary Demonstrations by Pre-Training Dynamics Representations
- Authors: Haitong Ma, Bo Dai, Zhaolin Ren, Yebin Wang, Na Li
- Abstract summary: We introduce a pre-training stage that learns dynamics representations, derived from factorizations of the transition dynamics. We show that our proposed algorithm can mimic expert policies with as few as a single trajectory.
- Score: 16.363455701286696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Limited data has become a major bottleneck in scaling up offline imitation learning (IL). In this paper, we propose enhancing IL performance under limited expert data by introducing a pre-training stage that learns dynamics representations, derived from factorizations of the transition dynamics. We first theoretically justify that the optimal decision variable of offline IL lies in the representation space, significantly reducing the parameters to learn in the downstream IL. Moreover, the dynamics representations can be learned from arbitrary data collected with the same dynamics, allowing the reuse of massive non-expert data and mitigating the limited data issues. We present a tractable loss function inspired by noise contrastive estimation to learn the dynamics representations at the pre-training stage. Experiments on MuJoCo demonstrate that our proposed algorithm can mimic expert policies with as few as a single trajectory. Experiments on real quadrupeds show that we can leverage pre-trained dynamics representations from simulator data to learn to walk from a few real-world demonstrations.
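The abstract mentions a tractable loss inspired by noise contrastive estimation for learning dynamics representations from a factorization of the transition dynamics. As a rough illustrative sketch only (not the authors' algorithm), the idea can be approximated by an InfoNCE-style contrastive objective in which a (state, action) embedding should score its true next-state embedding above the other next states in the batch; all function and variable names below are hypothetical:

```python
import numpy as np

def infonce_dynamics_loss(phi_sa, mu_next, temperature=1.0):
    """Contrastive sketch of dynamics-representation pre-training.

    phi_sa:  (B, d) embeddings of (state, action) pairs
    mu_next: (B, d) embeddings of the corresponding next states
    Row i of phi_sa and row i of mu_next form the positive pair;
    the other B-1 rows act as negatives (standard InfoNCE setup).
    """
    logits = phi_sa @ mu_next.T / temperature      # (B, B) similarity scores
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # positives sit on the diagonal

# Toy check: perfectly aligned embeddings give near-zero loss,
# while mismatched pairings give a large loss.
phi = np.eye(4) * 5.0
aligned = infonce_dynamics_loss(phi, phi)
mismatched = infonce_dynamics_loss(phi, phi[::-1])
```

Because any transition data collected under the same dynamics provides valid (s, a, s') triples, such a contrastive objective can in principle be trained on large non-expert datasets before the imitation stage, which is the reuse property the abstract emphasizes.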
Related papers
- FANoise: Singular Value-Adaptive Noise Modulation for Robust Multimodal Representation Learning [24.94576263410761]
We study the role of noise in representation learning from both gradient-based and feature-distribution perspectives. We propose FANoise, a novel feature-adaptive noise injection strategy. Under this framework, experiments demonstrate that FANoise consistently improves overall performance on multimodal tasks.
arXiv Detail & Related papers (2025-11-26T02:50:29Z)
- Latent Diffusion Planning for Imitation Learning [78.56207566743154]
Latent Diffusion Planning (LDP) is a modular approach consisting of a planner and an inverse dynamics model. By separating planning from action prediction, LDP can benefit from the denser supervision signals of suboptimal and action-free data. On simulated visual robotic manipulation tasks, LDP outperforms state-of-the-art imitation learning approaches.
arXiv Detail & Related papers (2025-04-23T17:53:34Z)
- DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control [18.737628473949048]
Imitation learning has proven to be a powerful tool for training complex visuomotor policies.
Current methods often require hundreds to thousands of expert demonstrations to handle high-dimensional visual observations.
We present DynaMo, a new in-domain, self-supervised method for learning visual representations.
arXiv Detail & Related papers (2024-09-18T17:59:43Z)
- Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation [66.86987509942607]
We evaluate how such a paradigm should be done in imitation learning.
We consider a setting where the pretraining corpus consists of multitask demonstrations.
We argue that inverse dynamics modeling is well-suited to this setting.
arXiv Detail & Related papers (2023-05-26T14:40:46Z)
- Accelerating exploration and representation learning with offline pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
arXiv Detail & Related papers (2023-03-31T18:03:30Z)
- Leveraging Neural Koopman Operators to Learn Continuous Representations of Dynamical Systems from Scarce Data [0.0]
We propose a new deep Koopman framework that represents dynamics in an intrinsically continuous way.
This framework leads to better performance on limited training data.
arXiv Detail & Related papers (2023-03-13T10:16:19Z)
- TRAIL: Near-Optimal Imitation Learning with Suboptimal Data [100.83688818427915]
We present training objectives that use offline datasets to learn a factored transition model.
Our theoretical analysis shows that the learned latent action space can boost the sample-efficiency of downstream imitation learning.
To learn the latent action space in practice, we propose TRAIL (Transition-Reparametrized Actions for Imitation Learning), an algorithm that learns an energy-based transition model.
arXiv Detail & Related papers (2021-10-27T21:05:00Z)
- DiffSRL: Learning Dynamic-aware State Representation for Deformable Object Control with Differentiable Simulator [26.280021036447213]
Latent spaces that capture dynamics-related information have wide application in areas such as accelerating model-free reinforcement learning.
We propose DiffSRL, a dynamic state representation learning pipeline utilizing differentiable simulation.
Our model demonstrates superior performance in terms of capturing long-term dynamics as well as reward prediction.
arXiv Detail & Related papers (2021-10-24T04:53:58Z)
- PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning [84.30765628008207]
We propose a novel method, dubbed PlayVirtual, which augments cycle-consistent virtual trajectories to enhance the data efficiency for RL feature representation learning.
Our method outperforms the current state-of-the-art methods by a large margin on both benchmarks.
arXiv Detail & Related papers (2021-06-08T07:37:37Z)
- Understanding Learning Dynamics for Neural Machine Translation [53.23463279153577]
We propose to understand the learning dynamics of NMT by using Loss Change Allocation (LCA) [Lan et al., 2019]. As LCA requires calculating the gradient on an entire dataset for each update, we instead present an approximation to make it practical in the NMT scenario. Our simulated experiments show that this approximate calculation is efficient and empirically delivers consistent results.
arXiv Detail & Related papers (2020-04-05T13:32:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.