Latent Action World Models for Control with Unlabeled Trajectories
- URL: http://arxiv.org/abs/2512.10016v1
- Date: Wed, 10 Dec 2025 19:09:45 GMT
- Title: Latent Action World Models for Control with Unlabeled Trajectories
- Authors: Marvin Alles, Xingyuan Zhang, Patrick van der Smagt, Philip Becker-Ehmck
- Abstract summary: We study world models that learn from heterogeneous data. We introduce a family of latent-action world models that jointly use action-conditioned and action-free data.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inspired by how humans combine direct interaction with action-free experience (e.g., videos), we study world models that learn from heterogeneous data. Standard world models typically rely on action-conditioned trajectories, which limits effectiveness when action labels are scarce. We introduce a family of latent-action world models that jointly use action-conditioned and action-free data by learning a shared latent action representation. This latent space aligns observed control signals with actions inferred from passive observations, enabling a single dynamics model to train on large-scale unlabeled trajectories while requiring only a small set of action-labeled ones. We use the latent-action world model to learn a latent-action policy through offline reinforcement learning (RL), thereby bridging two traditionally separate domains: offline RL, which typically relies on action-conditioned data, and action-free training, which is rarely used with subsequent RL. On the DeepMind Control Suite, our approach achieves strong performance while using about an order of magnitude fewer action-labeled samples than purely action-conditioned baselines. These results show that latent actions enable training on both passive and interactive data, which makes world models learn more efficiently.
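The shared latent action space described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify architectures or losses, so the linear modules, the dimensions, the mean-squared-error objectives, and every name below (`encode_action`, `infer_latent_action`, `predict_next_obs`, `world_model_loss`) are assumptions chosen only to show how one dynamics model could consume both labeled and unlabeled transitions.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM, LATENT_DIM = 4, 2, 3  # hypothetical sizes

# Hypothetical linear modules standing in for learned networks.
W_act = rng.normal(size=(ACT_DIM, LATENT_DIM))            # action encoder
W_idm = rng.normal(size=(2 * OBS_DIM, LATENT_DIM))        # inverse dynamics model
W_dyn = rng.normal(size=(OBS_DIM + LATENT_DIM, OBS_DIM))  # shared dynamics model


def encode_action(action):
    """Map an observed control signal into the latent action space."""
    return action @ W_act


def infer_latent_action(obs, next_obs):
    """Infer a latent action from a passive (action-free) transition."""
    return np.concatenate([obs, next_obs], axis=-1) @ W_idm


def predict_next_obs(obs, latent_action):
    """Single dynamics model conditioned on a latent action."""
    return np.concatenate([obs, latent_action], axis=-1) @ W_dyn


def world_model_loss(obs, next_obs, action=None):
    """Prediction loss, plus an alignment term when an action label exists."""
    z_inferred = infer_latent_action(obs, next_obs)
    if action is None:
        # Action-free data: rely on the inferred latent action alone.
        z, align = z_inferred, 0.0
    else:
        # Labeled data: use the encoded action and pull the inferred
        # latent toward it so both sources share one latent space.
        z = encode_action(action)
        align = np.mean((z - z_inferred) ** 2)
    pred = predict_next_obs(obs, z)
    return float(np.mean((pred - next_obs) ** 2) + align)


obs = rng.normal(size=(5, OBS_DIM))
next_obs = rng.normal(size=(5, OBS_DIM))
action = rng.normal(size=(5, ACT_DIM))

loss_unlabeled = world_model_loss(obs, next_obs)
loss_labeled = world_model_loss(obs, next_obs, action)
```

In this sketch the same `predict_next_obs` function serves both data sources; only the origin of the latent action differs, which is the property the abstract attributes to the shared representation.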
Related papers
- Olaf-World: Orienting Latent Actions for Video World Modeling [100.96069208914957]
Scaling action-controllable world models is limited by the scarcity of action labels. We present Olaf-World, a pipeline that pretrains action-conditioned video world models from large-scale passive video.
arXiv Detail & Related papers (2026-02-10T18:58:41Z) - Coupled Local and Global World Models for Efficient First Order RL [10.305209288475817]
This paper introduces a method that bypasses simulators entirely, training RL policies inside world models learned from robots' interactions with real environments. At its core, our approach enables policy training with large-scale diffusion models via a novel decoupled first-order gradient (FoG) method. We demonstrate the efficacy of our method on the Push-T manipulation task, where it significantly outperforms PPO in sample efficiency.
arXiv Detail & Related papers (2026-02-05T21:57:41Z) - Latent Action Pretraining Through World Modeling [1.988007188564225]
We propose LAWM, a model-agnostic framework to pretrain imitation learning models in a self-supervised way. Our framework is designed to be effective for transferring across tasks, environments, and embodiments.
arXiv Detail & Related papers (2025-09-22T21:19:10Z) - AXIOM: Learning to Play Games in Minutes with Expanding Object-Centric Models [41.429595107023125]
AXIOM is a novel architecture that integrates a minimal yet expressive set of core priors about object-centric dynamics and interactions. It combines the usual data efficiency and interpretability of Bayesian approaches with the across-task generalization usually associated with DRL. AXIOM masters various games within only 10,000 interaction steps, with both a small number of parameters compared to DRL, and without the computational expense of gradient-based optimization.
arXiv Detail & Related papers (2025-05-30T16:46:20Z) - CLAM: Continuous Latent Action Models for Robot Learning from Unlabeled Demonstrations [11.604546089466734]
Learning robot policies using imitation learning requires collecting large amounts of costly action-labeled expert demonstrations. A promising approach is to harness the abundance of unlabeled observations (e.g., from video demonstrations) to learn latent action labels in an unsupervised way. We design continuous latent action models (CLAM) which incorporate two key ingredients we find necessary for learning to solve complex continuous control tasks from unlabeled observation data.
arXiv Detail & Related papers (2025-05-08T07:07:58Z) - AdaWorld: Learning Adaptable World Models with Latent Actions [76.50869178593733]
We propose AdaWorld, an innovative world model learning approach that enables efficient adaptation. The key idea is to incorporate action information during the pretraining of world models. We then develop an autoregressive world model that conditions on these latent actions.
arXiv Detail & Related papers (2025-03-24T17:58:15Z) - Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning [65.85335291827086]
This paper tries to learn and understand underlying semantic variations from distracting videos via offline-to-online latent distillation and flexible disentanglement constraints. We pretrain the action-free video prediction model offline with disentanglement regularization to extract semantic knowledge from distracting videos. For finetuning in the online environment, we exploit the knowledge from the pretrained model and introduce a disentanglement constraint to the world model.
arXiv Detail & Related papers (2025-03-11T13:50:22Z) - Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task.
The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance.
We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z) - Contrastive Value Learning: Implicit Models for Simple Offline RL [40.95632543012637]
We propose Contrastive Value Learning (CVL), which learns an implicit, multi-step model of the environment dynamics.
CVL can be learned without access to reward functions, but nonetheless can be used to directly estimate the value of each action.
Our experiments demonstrate that CVL outperforms prior offline RL methods on complex continuous control benchmarks.
arXiv Detail & Related papers (2022-11-03T19:10:05Z) - Curious Exploration via Structured World Models Yields Zero-Shot Object Manipulation [19.840186443344]
We propose to use structured world models to incorporate inductive biases in the control loop to achieve sample-efficient exploration.
Our method generates free-play behavior that starts to interact with objects early on and develops more complex behavior over time.
arXiv Detail & Related papers (2022-06-22T22:08:50Z) - Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation [102.24108167002252]
We propose a novel attention network, named self-modulating attention, that models the complex and non-linearly evolving dynamic user preferences.
We empirically demonstrate the effectiveness of our method on top-N sequential recommendation tasks, and the results on three large-scale real-world datasets show that our model can achieve state-of-the-art performance.
arXiv Detail & Related papers (2022-03-30T03:54:11Z) - Learning to drive from a world on rails [78.28647825246472]
We learn an interactive vision-based driving policy from pre-recorded driving logs via a model-based approach.
A forward model of the world supervises a driving policy that predicts the outcome of any potential driving trajectory.
Our method ranks first on the CARLA leaderboard, attaining a 25% higher driving score while using 40 times less data.
arXiv Detail & Related papers (2021-05-03T05:55:30Z)