VJEPA: Variational Joint Embedding Predictive Architectures as Probabilistic World Models
- URL: http://arxiv.org/abs/2601.14354v1
- Date: Tue, 20 Jan 2026 18:04:16 GMT
- Title: VJEPA: Variational Joint Embedding Predictive Architectures as Probabilistic World Models
- Authors: Yongchao Huang
- Abstract summary: We introduce Variational JEPA (VJEPA), a probabilistic generalization that learns a predictive distribution over future latent states via a variational objective. VJEPA representations can serve as sufficient information states for optimal control without pixel reconstruction, while providing formal guarantees for collapse avoidance. We propose Bayesian JEPA (BJEPA), an extension that factorizes the predictive belief into a learned dynamics expert and a modular prior expert.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Joint Embedding Predictive Architectures (JEPA) offer a scalable paradigm for self-supervised learning by predicting latent representations rather than reconstructing high-entropy observations. However, existing formulations rely on deterministic regression objectives, which mask probabilistic semantics and limit their applicability in stochastic control. In this work, we introduce Variational JEPA (VJEPA), a probabilistic generalization that learns a predictive distribution over future latent states via a variational objective. We show that VJEPA unifies representation learning with Predictive State Representations (PSRs) and Bayesian filtering, establishing that sequential modeling does not require autoregressive observation likelihoods. Theoretically, we prove that VJEPA representations can serve as sufficient information states for optimal control without pixel reconstruction, while providing formal guarantees for collapse avoidance. We further propose Bayesian JEPA (BJEPA), an extension that factorizes the predictive belief into a learned dynamics expert and a modular prior expert, enabling zero-shot task transfer and constraint (e.g., goal, physics) satisfaction via a Product of Experts. Empirically, through a noisy environment experiment, we demonstrate that VJEPA and BJEPA successfully filter out high-variance nuisance distractors that cause representation collapse in generative baselines. By enabling principled uncertainty estimation (e.g., constructing credible intervals via sampling) while remaining likelihood-free regarding observations, VJEPA provides a foundational framework for scalable, robust, uncertainty-aware planning in high-dimensional, noisy environments.
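The abstract mentions two mechanisms that a small example can make concrete: combining a dynamics expert with a prior expert via a Product of Experts, and building credible intervals by sampling from the predictive belief. The sketch below is illustrative only. It assumes both experts are diagonal Gaussians over a latent state, which the abstract does not state, and the function names and numbers are hypothetical rather than taken from the paper.

```python
import numpy as np


def product_of_gaussian_experts(mu_a, var_a, mu_b, var_b):
    """Fuse two diagonal-Gaussian experts (assumed form, not the paper's code).

    The product of two Gaussian densities is Gaussian up to normalisation:
    precisions add, and the mean is the precision-weighted average.
    """
    prec_a = 1.0 / var_a
    prec_b = 1.0 / var_b
    var = 1.0 / (prec_a + prec_b)
    mu = var * (prec_a * mu_a + prec_b * mu_b)
    return mu, var


def credible_interval(mu, var, level=0.95, n_samples=10_000, seed=0):
    """Estimate an equal-tailed credible interval per dimension by sampling."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mu, np.sqrt(var), size=(n_samples,) + np.shape(mu))
    tail = (1.0 - level) / 2.0
    lower = np.quantile(samples, tail, axis=0)
    upper = np.quantile(samples, 1.0 - tail, axis=0)
    return lower, upper


# Hypothetical two-dimensional latent state.
mu_dyn, var_dyn = np.array([0.0, 2.0]), np.array([1.0, 4.0])      # dynamics expert
mu_prior, var_prior = np.array([1.0, 0.0]), np.array([1.0, 4.0])  # prior expert

mu, var = product_of_gaussian_experts(mu_dyn, var_dyn, mu_prior, var_prior)
lower, upper = credible_interval(mu, var)
print("fused mean:", mu)
print("fused variance:", var)
print("95% interval:", lower, upper)
```

Because precisions add, the fused belief is never wider than either expert, which is one way a prior expert could encode a goal or constraint without retraining the dynamics expert.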
Related papers
- Self-Supervised JEPA-based World Models for LiDAR Occupancy Completion and Forecasting [11.278785857643575]
We propose AD-LiST-JEPA, a self-supervised world model for autonomous driving that predicts future temporal evolution from LiDAR data. We evaluate the quality of the learned representations through a downstream-based occupancy completion and forecasting task.
arXiv Detail & Related papers (2026-02-13T02:42:21Z)
- Causal-JEPA: Learning World Models through Object-Level Latent Interventions [46.562961546550895]
C-JEPA is a simple and flexible object-centric world model that extends masked joint embedding prediction from image patches to object-centric representations. By applying object-level masking that requires an object's state to be inferred from other objects, C-JEPA induces latent interventions with counterfactual-like effects.
arXiv Detail & Related papers (2026-02-11T21:47:26Z)
- Bridging the Gap Between Bayesian Deep Learning and Ensemble Weather Forecasts [100.26854618129039]
Weather forecasting is fundamentally challenged by the chaotic nature of the atmosphere. Recent advances in Bayesian Deep Learning (BDL) offer a promising but often disconnected alternative. We bridge these paradigms through a unified hybrid BDL framework for ensemble weather forecasting.
arXiv Detail & Related papers (2025-11-18T07:49:52Z)
- LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics [53.247652209132376]
Joint-Embedding Predictive Architectures (JEPAs) offer a promising blueprint, but lack of practical guidance and theory has led to ad-hoc R&D. We present a comprehensive theory of JEPAs and instantiate it in LeJEPA, a lean, scalable, and theoretically grounded training objective.
arXiv Detail & Related papers (2025-11-11T18:21:55Z)
- ScenGAN: Attention-Intensive Generative Model for Uncertainty-Aware Renewable Scenario Forecasting [11.600987173982107]
This paper explores uncertainties in the realms of renewable power and deep learning. An uncertainty-aware model is meticulously designed for renewable scenario forecasting. The integration of meteorological information, forecasts, and historical trajectories in the processing layer improves the synergistic forecasting capability.
arXiv Detail & Related papers (2025-09-21T15:18:51Z)
- Deep Active Inference Agents for Delayed and Long-Horizon Environments [1.693200946453174]
AIF agents rely on accurate immediate predictions and exhaustive planning, a limitation that is exacerbated in delayed environments. We propose a generative-policy architecture featuring a multi-step latent transition that lets the generative model predict an entire horizon in a single look-ahead. We evaluate our agent in an environment that mimics a realistic industrial scenario with delayed and long-horizon settings.
arXiv Detail & Related papers (2025-05-26T11:50:22Z)
- ACT-JEPA: Novel Joint-Embedding Predictive Architecture for Efficient Policy Representation Learning [90.41852663775086]
ACT-JEPA is a novel architecture that integrates imitation learning and self-supervised learning. We train a policy to predict action sequences and abstract observation sequences. Our experiments show that ACT-JEPA improves the quality of representations by learning temporal environment dynamics.
arXiv Detail & Related papers (2025-01-24T16:41:41Z)
- Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning [14.869908713261227]
Contrastive-JEPA integrates the Image-based Joint-Embedding Predictive Architecture with the Variance-Invariance-Covariance Regularization (VICReg) strategy.
C-JEPA significantly enhances the stability and quality of visual representation learning.
When pre-trained on the ImageNet-1K dataset, C-JEPA exhibits rapid and improved convergence in both linear probing and fine-tuning performance metrics.
arXiv Detail & Related papers (2024-10-25T13:48:12Z)
- Denoising with a Joint-Embedding Predictive Architecture [21.42513407755273]
We introduce Denoising with a Joint-Embedding Predictive Architecture (D-JEPA). By recognizing JEPA as a form of masked image modeling, we reinterpret it as a generalized next-token prediction strategy. We also incorporate diffusion loss to model the per-token probability distribution, enabling data generation in a continuous space.
arXiv Detail & Related papers (2024-10-02T05:57:10Z)
- Federated Conformal Predictors for Distributed Uncertainty Quantification [83.50609351513886]
Conformal prediction is emerging as a popular paradigm for providing rigorous uncertainty quantification in machine learning.
In this paper, we extend conformal prediction to the federated learning setting.
We propose a weaker notion of partial exchangeability, better suited to the FL setting, and use it to develop the Federated Conformal Prediction framework.
arXiv Detail & Related papers (2023-05-27T19:57:27Z)
- Heterogeneous-Agent Trajectory Forecasting Incorporating Class Uncertainty [54.88405167739227]
We present HAICU, a method for heterogeneous-agent trajectory forecasting that explicitly incorporates agents' class probabilities.
We additionally present PUP, a new challenging real-world autonomous driving dataset.
We demonstrate that incorporating class probabilities in trajectory forecasting significantly improves performance in the face of uncertainty.
arXiv Detail & Related papers (2021-04-26T10:28:34Z)