Imitation from Observations with Trajectory-Level Generative Embeddings
- URL: http://arxiv.org/abs/2601.00452v1
- Date: Thu, 01 Jan 2026 19:38:37 GMT
- Title: Imitation from Observations with Trajectory-Level Generative Embeddings
- Authors: Yongtao Qu, Shangzhe Li, Weitong Zhang
- Abstract summary: We consider offline imitation learning from observations (LfO), where expert demonstrations are scarce and the available offline suboptimal data are far from the expert behavior. We propose TGE, a trajectory-level generative embedding for offline LfO that constructs a dense, smooth surrogate reward by estimating expert state density in the latent space of a temporal diffusion model trained on offline trajectory data.
- Score: 18.63253047959276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider offline imitation learning from observations (LfO), where expert demonstrations are scarce and the available offline suboptimal data are far from the expert behavior. Many existing distribution-matching approaches struggle in this regime because they impose strict support constraints and rely on brittle one-step models, making it hard to extract useful signal from imperfect data. To tackle this challenge, we propose TGE, a trajectory-level generative embedding for offline LfO that constructs a dense, smooth surrogate reward by estimating expert state density in the latent space of a temporal diffusion model trained on offline trajectory data. By leveraging the smooth geometry of the learned diffusion embedding, TGE captures long-horizon temporal dynamics and effectively bridges the gap between disjoint supports, ensuring a robust learning signal even when offline data is distributionally distinct from the expert. Empirically, the proposed approach consistently matches or outperforms prior offline LfO methods across a range of D4RL locomotion and manipulation benchmarks.
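The surrogate-reward construction described in the abstract can be illustrated with a toy sketch. Everything below is a hypothetical stand-in, not the paper's implementation: a fixed random linear projection plays the role of the learned temporal diffusion embedding, and a simple Gaussian kernel density estimate (with an arbitrary bandwidth) plays the role of the expert state-density model in latent space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the learned temporal diffusion encoder: a fixed
# random linear projection of a short window of states into a latent space.
state_dim, window, latent_dim = 6, 4, 3
proj = rng.normal(size=(state_dim * window, latent_dim))

def embed(states):
    """Map a (window, state_dim) slice of a trajectory to a latent vector."""
    return states.reshape(-1) @ proj

# A few expert state-only windows; their latent codes define the target density.
expert_windows = rng.normal(loc=1.0, size=(200, window, state_dim))
expert_latents = np.stack([embed(w) for w in expert_windows])

def surrogate_reward(states, bandwidth=2.0):
    """Dense surrogate reward: log of an isotropic Gaussian kernel density
    estimate over the expert latents, evaluated at this window's latent code.
    The bandwidth is a free smoothing parameter, not a value from the paper."""
    d2 = ((expert_latents - embed(states)) ** 2).sum(axis=1)
    return float(np.log(np.exp(-d2 / (2 * bandwidth**2)).mean() + 1e-300))

near = surrogate_reward(rng.normal(loc=1.0, size=(window, state_dim)))
far = surrogate_reward(rng.normal(loc=5.0, size=(window, state_dim)))
print(f"near-expert reward: {near:.2f}, far-from-expert reward: {far:.2f}")
```

Because the kernel density is smooth everywhere, windows outside the expert support still receive a graded (if low) reward rather than an undefined one, which is the property the abstract credits with bridging disjoint supports.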
Related papers
- The Offline-Frontier Shift: Diagnosing Distributional Limits in Generative Multi-Objective Optimization [56.39938641873341]
We show that generative methods systematically underperform evolutionary alternatives with respect to other metrics, such as generational distance. We argue that overcoming this limitation requires out-of-distribution sampling in objective space. Our results position offline MOO as a distribution-shift-limited problem and provide a diagnostic lens for understanding when and why generative optimization methods fail.
arXiv Detail & Related papers (2026-02-11T18:38:40Z)
- VMF-GOS: Geometry-guided virtual Outlier Synthesis for Long-Tailed OOD Detection [10.895746797423223]
We introduce a Geometry-guided virtual Outlier Synthesis (GOS) strategy that models statistical properties using the von Mises-Fisher (vMF) distribution on a hypersphere. Specifically, we locate a low-likelihood annulus in the feature space and perform directional sampling of virtual outliers in this region. Experiments on benchmarks such as CIFAR-LT demonstrate that our method outperforms state-of-the-art approaches that utilize external real images.
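The "low-likelihood annulus" idea can be sketched geometrically. This is a toy illustration under stated assumptions, not the paper's procedure: a synthetic 4-D feature cloud stands in for real class features, the annulus bounds come from empirical cosine-similarity quantiles, and virtual outliers are drawn by plain rejection sampling.

```python
import numpy as np

rng = np.random.default_rng(1)

# In-distribution features for one class, normalized to the unit hypersphere
# (the vMF distribution lives on this sphere).
feats = rng.normal(loc=[3.0, 1.0, 0.0, 0.0], size=(500, 4))
feats /= np.linalg.norm(feats, axis=1, keepdims=True)

# The vMF maximum-likelihood mean direction is the normalized feature mean.
mu = feats.mean(axis=0)
mu /= np.linalg.norm(mu)

# vMF density is monotone in cosine similarity to mu, so a "low-likelihood
# annulus" is a band of directions less aligned with mu than almost all
# in-distribution samples, but not arbitrary noise.
cos_id = feats @ mu
lo, hi = np.quantile(cos_id, [0.001, 0.02])

# Rejection-sample virtual outliers: uniform random directions are kept
# only if they fall inside the annulus.
cand = rng.normal(size=(20000, 4))
cand /= np.linalg.norm(cand, axis=1, keepdims=True)
cos_c = cand @ mu
outliers = cand[(cos_c >= lo) & (cos_c <= hi)]
print(len(outliers), "virtual outliers sampled in the annulus")
```

Since the vMF density depends only on the angle to the mean direction, thresholding cosine similarity here is equivalent to thresholding likelihood, which is why a similarity band serves as a likelihood annulus.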
arXiv Detail & Related papers (2026-02-05T07:58:12Z)
- Parallel Test-Time Scaling for Latent Reasoning Models [58.428340345068214]
Parallel test-time scaling (TTS) is a pivotal approach for enhancing large language models (LLMs). Recent advances in latent reasoning, where intermediate reasoning unfolds in continuous vector spaces, offer a more efficient alternative to explicit Chain-of-Thought. This work enables parallel TTS for latent reasoning models.
arXiv Detail & Related papers (2025-10-09T03:33:00Z)
- TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling [53.61290359948953]
Tangential Amplifying Guidance (TAG) operates solely on trajectory signals without modifying the underlying diffusion model. We formalize this guidance process by leveraging a first-order Taylor expansion. TAG is a plug-and-play, architecture-agnostic module that improves diffusion sampling fidelity with minimal computational overhead.
arXiv Detail & Related papers (2025-10-06T06:53:29Z)
- Just-in-time Episodic Feedback Hinter: Leveraging Offline Knowledge to Improve LLM Agents Adaptation [77.90555621662345]
We present JEF Hinter, an agentic system that distills offline traces into compact, context-aware hints. A zooming mechanism highlights decisive steps in long trajectories, capturing both strategies and pitfalls. Experiments on MiniWoB++, WorkArena-L1, and WebArena-Lite show that JEF Hinter consistently outperforms strong baselines.
arXiv Detail & Related papers (2025-10-05T21:34:42Z)
- Unsupervised Online 3D Instance Segmentation with Synthetic Sequences and Dynamic Loss [52.28880405119483]
Unsupervised online 3D instance segmentation is a fundamental yet challenging task. Existing methods, such as UNIT, have made progress in this direction but remain constrained by limited training diversity. We propose a new framework that enriches the training distribution through synthetic point cloud sequence generation.
arXiv Detail & Related papers (2025-09-27T08:53:27Z)
- Structural Information-based Hierarchical Diffusion for Offline Reinforcement Learning [13.839214658191038]
We propose a Structural Information-based Hierarchical Diffusion (SIHD) framework for effective and stable offline policy learning. We analyze structural information embedded in offline trajectories to construct the diffusion hierarchy adaptively. We show that SIHD significantly outperforms state-of-the-art baselines in decision-making performance.
arXiv Detail & Related papers (2025-09-26T06:24:06Z)
- RAD: Retrieval High-quality Demonstrations to Enhance Decision-making [23.136426643341462]
Offline reinforcement learning (RL) enables agents to learn policies from fixed datasets. It is often limited by dataset sparsity and the lack of transition overlap between suboptimal and expert trajectories. We propose Retrieval High-quAlity Demonstrations (RAD) for decision-making, which combines non-parametric retrieval with diffusion-based generative modeling.
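The retrieval half of this combination can be sketched with plain nearest-neighbor search (the diffusion-based generative half is omitted). The dataset, the return-based re-ranking heuristic, and all names below are illustrative assumptions, not RAD's actual design.

```python
import numpy as np

rng = np.random.default_rng(3)

# Offline dataset: states with associated returns; a few high-return
# "expert-like" transitions mixed into mostly suboptimal data.
states = rng.normal(size=(1000, 5))
returns = rng.normal(size=1000)
returns[:50] += 5.0  # pretend the first 50 came from near-expert behavior

def retrieve_demonstrations(query, k=10):
    """Non-parametric retrieval: find the k stored states nearest to the
    query, then re-rank them by return so high-quality transitions surface."""
    dists = np.linalg.norm(states - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return nearest[np.argsort(-returns[nearest])]

idx = retrieve_demonstrations(states[0])
print(f"top retrieved return: {returns[idx][0]:.2f}")
```

The re-ranking step is what distinguishes this from vanilla k-NN: proximity decides the candidate pool, but demonstration quality decides the final ordering.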
arXiv Detail & Related papers (2025-07-21T08:08:18Z)
- Offline Reinforcement Learning with Wasserstein Regularization via Optimal Transport Maps [47.57615889991631]
Offline reinforcement learning (RL) aims to learn an optimal policy from a static dataset. We propose an approach that utilizes the Wasserstein distance, which is robust to out-of-distribution data. Our approach demonstrates comparable or superior performance to widely used methods on the D4RL benchmark dataset.
arXiv Detail & Related papers (2025-07-14T22:28:36Z)
- Local Manifold Approximation and Projection for Manifold-Aware Diffusion Planning [23.945423041112036]
Local Manifold Approximation and Projection (LoMAP) is a training-free method that projects the guided sample onto a low-rank subspace approximated from offline datasets. We show that LoMAP can be incorporated into the hierarchical diffusion planner, providing further performance enhancements.
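The projection step is straightforward to sketch: approximate the data manifold with a low-rank PCA subspace of the offline dataset and project a guidance-perturbed sample back onto it. The rank, the use of a single global subspace (LoMAP's approximation is local), and all names are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Offline samples lying on a rank-2 subspace of R^8 (a crude stand-in
# for a low-dimensional data manifold).
data = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 8))
mean = data.mean(axis=0)
_, _, vt = np.linalg.svd(data - mean, full_matrices=False)
basis = vt[:2]  # top two principal directions span the subspace

def project(sample):
    """Project a (possibly guidance-perturbed) sample onto the low-rank
    subspace estimated from the offline dataset."""
    return mean + (sample - mean) @ basis.T @ basis

perturbed = data[0] + rng.normal(scale=3.0, size=8)  # pushed off the manifold
projected = project(perturbed)
print(f"distance moved back toward the subspace: "
      f"{np.linalg.norm(perturbed - projected):.2f}")
```

Orthogonal projection is idempotent, so repeated applications are a no-op, which is the property that lets the step be inserted into a sampler without accumulating drift.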
arXiv Detail & Related papers (2025-06-01T07:16:39Z)
- Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning [50.84938730450622]
We propose a trajectory-based method, TV score, which uses trajectory volatility for OOD detection in mathematical reasoning.
Our method outperforms all traditional algorithms on GLMs under mathematical reasoning scenarios.
Our method can be extended to more applications with high-density features in output spaces, such as multiple-choice questions.
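The volatility idea can be sketched with a simple stand-in score: treat a sequence of embeddings (e.g., hidden states across model layers) as a trajectory and measure its average step-to-step movement. The exact score, smoothing, and embedding choice in the paper differ; the names and data here are illustrative.

```python
import numpy as np

def tv_score(traj):
    """Trajectory volatility: mean norm of successive differences along an
    embedding trajectory. A higher score is read as a sign the input is
    out of distribution. Simplified stand-in for the paper's TV score."""
    traj = np.asarray(traj, dtype=float)
    steps = np.diff(traj, axis=0)
    return float(np.linalg.norm(steps, axis=1).mean())

# A steady-drift trajectory versus the same trajectory with added jitter.
smooth = np.cumsum(np.full((8, 4), 0.1), axis=0)
jittery = smooth + np.random.default_rng(2).normal(scale=0.5, size=(8, 4))
print(f"smooth: {tv_score(smooth):.3f}, jittery: {tv_score(jittery):.3f}")
```

The score is invariant to where the trajectory starts and only penalizes how erratically it moves, which is what makes it usable without labeled OOD examples.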
arXiv Detail & Related papers (2024-05-22T22:22:25Z)
- Reasoning with Latent Diffusion in Offline Reinforcement Learning [11.349356866928547]
Offline reinforcement learning holds promise as a means to learn high-reward policies from a static dataset.
A key challenge in offline RL lies in effectively stitching together portions of suboptimal trajectories from the static dataset.
We propose a novel approach that leverages the expressiveness of latent diffusion to model in-support trajectory sequences as compressed latent skills.
arXiv Detail & Related papers (2023-09-12T20:58:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.