Self-Supervised JEPA-based World Models for LiDAR Occupancy Completion and Forecasting
- URL: http://arxiv.org/abs/2602.12540v1
- Date: Fri, 13 Feb 2026 02:42:21 GMT
- Title: Self-Supervised JEPA-based World Models for LiDAR Occupancy Completion and Forecasting
- Authors: Haoran Zhu, Anna Choromanska
- Abstract summary: We propose AD-LiST-JEPA, a self-supervised world model for autonomous driving that predicts future spatiotemporal evolution from LiDAR data. We evaluate the quality of the learned representations through a downstream LiDAR-based occupancy completion and forecasting task.
- Score: 11.278785857643575
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Autonomous driving, as an agent operating in the physical world, requires the fundamental capability to build world models that capture how the environment evolves spatiotemporally in order to support long-term planning. At the same time, scalability demands learning such models in a self-supervised manner; the joint-embedding predictive architecture (JEPA) enables learning world models by leveraging large volumes of unlabeled data without relying on expensive human annotations. In this paper, we propose AD-LiST-JEPA, a self-supervised world model for autonomous driving that predicts future spatiotemporal evolution from LiDAR data using a JEPA framework. We evaluate the quality of the learned representations through a downstream LiDAR-based occupancy completion and forecasting (OCF) task, which jointly assesses perception and prediction. Proof-of-concept experiments show improved OCF performance when the encoder is pretrained via JEPA-based world-model learning.
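The core JEPA idea in the abstract — predict future states in latent space rather than reconstructing raw sensor data — can be sketched as follows. This is a minimal, hypothetical toy (linear encoders, EMA target update); the module names, shapes, and update rule are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the encoders: one linear map each.
# In the paper's setting the input would be a LiDAR-derived tensor, not a vector.
D_IN, D_LAT = 8, 4
W_ctx = rng.normal(size=(D_LAT, D_IN)) * 0.1   # context (online) encoder
W_tgt = W_ctx.copy()                           # target encoder (EMA copy)
W_pred = np.eye(D_LAT)                         # latent predictor

def jepa_loss(x_t, x_next):
    """L2 distance between predicted and target latents (no pixel/point reconstruction)."""
    z_ctx = W_ctx @ x_t      # encode the current frame
    z_hat = W_pred @ z_ctx   # predict the future latent
    z_tgt = W_tgt @ x_next   # target latent, treated as a constant (stop-gradient)
    return float(np.mean((z_hat - z_tgt) ** 2))

def ema_update(tau=0.99):
    """Target encoder slowly tracks the context encoder (exponential moving average)."""
    global W_tgt
    W_tgt = tau * W_tgt + (1.0 - tau) * W_ctx

x_t, x_next = rng.normal(size=D_IN), rng.normal(size=D_IN)
loss = jepa_loss(x_t, x_next)
ema_update()
```

After pretraining with such an objective, the context encoder would be reused (as the abstract describes) for the downstream OCF task.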
Related papers
- Demystifying Data-Driven Probabilistic Medium-Range Weather Forecasting [63.8116386935854]
We demonstrate that state-of-the-art probabilistic skill requires neither intricate architectural constraints nor specialized training. We introduce a scalable framework for learning multi-scale atmospheric dynamics by combining a directly downsampled latent space with a history-conditioned local projector. We find that our framework design is robust to the choice of probabilistic estimators, seamlessly supporting interpolants, diffusion models, and CRPS-based ensemble training.
arXiv Detail & Related papers (2026-01-26T03:52:16Z) - VJEPA: Variational Joint Embedding Predictive Architectures as Probabilistic World Models [0.0]
We introduce Variational JEPA (VJEPA), a probabilistic generalization that learns a predictive distribution over future latent states via a variational objective. VJEPA representations can serve as sufficient information states for optimal control without pixel reconstruction, while providing formal guarantees for collapse avoidance. We propose Bayesian JEPA (BJEPA), an extension that factorizes the predictive belief into a learned dynamics expert and a modular prior expert.
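A predictive distribution over future latents, as summarized above, can be illustrated with a diagonal-Gaussian head trained by negative log-likelihood. This is an assumed sketch of the general idea; the paper's actual parameterization and objective may differ:

```python
import numpy as np

def gaussian_nll(z_next, mu, log_var):
    """Negative log-likelihood of the target latent under a diagonal Gaussian
    predictive distribution q(z_{t+1} | z_t) = N(mu, diag(exp(log_var)))."""
    var = np.exp(log_var)
    return float(0.5 * np.sum(log_var + (z_next - mu) ** 2 / var + np.log(2 * np.pi)))

rng = np.random.default_rng(1)
z_next = rng.normal(size=4)   # target latent of the next frame
mu = z_next + 0.1             # predicted mean, close to the target
log_var = np.zeros(4)         # unit predicted variance

nll_close = gaussian_nll(z_next, mu, log_var)
nll_far = gaussian_nll(z_next, mu + 5.0, log_var)
```

Minimizing such an NLL rewards both accurate means and calibrated variances, which is what distinguishes a probabilistic world model from a point-prediction JEPA.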
arXiv Detail & Related papers (2026-01-20T18:04:16Z) - Value-guided action planning with JEPA world models [44.84158001773079]
Building deep learning models that can reason about their environment requires capturing its underlying dynamics. Joint-Embedding Predictive Architectures (JEPA) provide a promising framework to model such dynamics. We propose an approach to enhance planning with JEPA world models by shaping their representation space.
arXiv Detail & Related papers (2025-12-28T20:17:49Z) - Koopman Invariants as Drivers of Emergent Time-Series Clustering in Joint-Embedding Predictive Architectures [0.03499870393443267]
Joint-Embedding Predictive Architectures (JEPAs) exhibit an unexplained ability to cluster time-series data by their underlying dynamical regimes. We propose a novel theoretical explanation for this phenomenon, hypothesizing that JEPA's predictive objective implicitly drives it to learn the invariant subspace of the system's Koopman operator. This work demystifies a key behavior of JEPAs, provides a principled connection between modern self-supervised learning and dynamical systems theory, and informs the design of more robust and interpretable time-series models.
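For context, the Koopman operator referenced above has a one-line standard definition (this is textbook material, not specific to the paper): for a discrete-time system $x_{t+1} = F(x_t)$ and an observable $g$,

```latex
(\mathcal{K} g)(x) = g\bigl(F(x)\bigr), \qquad x_{t+1} = F(x_t).
```

A subspace of observables $\mathcal{G}$ is invariant when $\mathcal{K}\mathcal{G} \subseteq \mathcal{G}$; restricted to such a subspace, the nonlinear dynamics act linearly, which is the structure the paper hypothesizes JEPA's latent predictor recovers.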
arXiv Detail & Related papers (2025-11-12T22:33:56Z) - LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics [53.247652209132376]
Joint-Embedding Predictive Architectures (JEPAs) offer a promising blueprint, but a lack of practical guidance and theory has led to ad-hoc R&D. We present a comprehensive theory of JEPAs and instantiate it in LeJEPA, a lean, scalable, and theoretically grounded training objective.
arXiv Detail & Related papers (2025-11-11T18:21:55Z) - EmbodiedBrain: Expanding Performance Boundaries of Task Planning for Embodied Intelligence [17.644658293987955]
Embodied AI agents are capable of robust spatial perception, effective task planning, and adaptive execution in physical environments. Current large language models (LLMs) and multimodal LLMs (MLLMs) for embodied tasks suffer from key limitations. We propose EmbodiedBrain, a novel vision-language foundation model available in both 7B and 32B parameter sizes.
arXiv Detail & Related papers (2025-10-23T14:05:55Z) - AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability [84.52205243353761]
Recent work proposes using world models to generate controlled virtual environments in which AI agents can be tested before deployment. We investigate ways of simplifying world models that remain agnostic to the AI agent under evaluation.
arXiv Detail & Related papers (2025-04-06T20:35:44Z) - A Survey of World Models for Autonomous Driving [55.520179689933904]
Recent breakthroughs in autonomous driving have been propelled by advances in robust world modeling. World models offer high-fidelity representations of the driving environment that integrate multi-sensor data, semantic cues, and temporal dynamics. Future research must address key challenges in self-supervised representation learning, multimodal fusion, and advanced simulation.
arXiv Detail & Related papers (2025-01-20T04:00:02Z) - Self-Supervised Representation Learning with Joint Embedding Predictive Architecture for Automotive LiDAR Object Detection [10.19369242630191]
We present AD-L-JEPA, a novel self-supervised pre-training framework for autonomous driving. Unlike existing methods, AD-L-JEPA is neither generative nor contrastive. It offers higher-quality, faster, and more GPU-memory-efficient self-supervised representation learning.
arXiv Detail & Related papers (2025-01-09T04:47:51Z) - DOME: Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model [14.996395953240699]
DOME is a diffusion-based world model that predicts future occupancy frames based on past occupancy observations.
The ability of this world model to capture the evolution of the environment is crucial for planning in autonomous driving.
arXiv Detail & Related papers (2024-10-14T12:24:32Z) - Predictive World Models from Real-World Partial Observations [66.80340484148931]
We present a framework for learning a probabilistic predictive world model for real-world road environments.
While prior methods require complete states as ground truth for learning, we present a novel sequential training method to allow HVAEs to learn to predict complete states from partially observed states only.
arXiv Detail & Related papers (2023-01-12T02:07:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.