LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
- URL: http://arxiv.org/abs/2511.08544v3
- Date: Fri, 14 Nov 2025 08:38:32 GMT
- Title: LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
- Authors: Randall Balestriero, Yann LeCun
- Abstract summary: Joint-Embedding Predictive Architectures (JEPAs) offer a promising blueprint, but lack of practical guidance and theory has led to ad-hoc R&D. We present a comprehensive theory of JEPAs and instantiate it in LeJEPA, a lean, scalable, and theoretically grounded training objective.
- Score: 53.247652209132376
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Learning manipulable representations of the world and its dynamics is central to AI. Joint-Embedding Predictive Architectures (JEPAs) offer a promising blueprint, but lack of practical guidance and theory has led to ad-hoc R&D. We present a comprehensive theory of JEPAs and instantiate it in LeJEPA, a lean, scalable, and theoretically grounded training objective. First, we identify the isotropic Gaussian as the optimal distribution that JEPAs' embeddings should follow to minimize downstream prediction risk. Second, we introduce a novel objective, Sketched Isotropic Gaussian Regularization (SIGReg), to constrain embeddings to reach that ideal distribution. Combining the JEPA predictive loss with SIGReg yields LeJEPA with numerous theoretical and practical benefits: (i) a single trade-off hyperparameter, (ii) linear time and memory complexity, (iii) stability across hyper-parameters, architectures (ResNets, ViTs, ConvNets), and domains, (iv) heuristics-free, e.g., no stop-gradient, no teacher-student, no hyper-parameter schedulers, and (v) a distributed-training-friendly implementation requiring only about 50 lines of code. Our empirical validation covers 10+ datasets and 60+ architectures, all with varying scales and domains. As an example, using ImageNet-1k for pretraining and linear evaluation with a frozen backbone, LeJEPA reaches 79% with a ViT-H/14. We hope that the simplicity and theory-friendly ecosystem offered by LeJEPA will reestablish self-supervised pre-training as a core pillar of AI research (GitHub repo: https://github.com/rbalestr-lab/lejepa).
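The recipe described in the abstract reduces to two terms: the usual JEPA prediction loss plus an isotropy regularizer, balanced by a single hyperparameter. The sketch below illustrates that structure in PyTorch; the moment-matching penalty on random 1D projections is a simplified stand-in for the paper's actual sketched goodness-of-fit statistic, so the names and details here are illustrative assumptions rather than the released implementation.

```python
# Minimal sketch of a LeJEPA-style objective (PyTorch). The prediction term is a
# plain JEPA latent-space regression; sigreg_sketch is a simplified stand-in for
# SIGReg that projects embeddings onto random unit directions and pushes each 1D
# marginal toward N(0, 1) by matching its first two moments. The real SIGReg uses
# a sketched goodness-of-fit statistic, so treat this as illustrative only.
import torch
import torch.nn.functional as F


def sigreg_sketch(z: torch.Tensor, num_directions: int = 64) -> torch.Tensor:
    """Encourage random 1D projections of z (batch, dim) to look standard Gaussian."""
    directions = F.normalize(torch.randn(z.shape[-1], num_directions, device=z.device), dim=0)
    proj = z @ directions                                   # (batch, num_directions)
    mean_penalty = proj.mean(dim=0).pow(2).mean()            # marginal means -> 0
    var_penalty = (proj.var(dim=0) - 1.0).pow(2).mean()      # marginal variances -> 1
    return mean_penalty + var_penalty


def lejepa_style_loss(z_context, z_pred, z_target, lam: float = 0.05):
    """Single trade-off hyperparameter lam; no stop-gradient or teacher network."""
    prediction = F.mse_loss(z_pred, z_target)                # JEPA predictive loss in latent space
    isotropy = 0.5 * (sigreg_sketch(z_context) + sigreg_sketch(z_target))
    return prediction + lam * isotropy
```

Both terms scale linearly in batch size and embedding dimension, consistent with the linear time and memory complexity claimed in the abstract.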
Related papers
- Semantic Tube Prediction: Beating LLM Data Efficiency with JEPA [50.494504099850325]
We introduce the Geodesic Hypothesis, positing that token sequences trace geodesics on a smooth semantic manifold and are therefore locally linear. We show this constraint improves the signal-to-noise ratio and preserves diversity by preventing collisions along trajectories. We demonstrate that geometric priors can surpass brute-force scaling.
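As a rough illustration of the local-linearity constraint mentioned above, one could penalize the second difference of token embeddings along a trajectory; this is only a way to visualize the idea, not the paper's objective.

```python
# Illustrative penalty for the "locally linear" trajectory idea: the second
# difference of consecutive token embeddings vanishes exactly when the sequence
# is locally linear. Not the paper's actual loss.
import torch


def local_linearity_penalty(z_seq: torch.Tensor) -> torch.Tensor:
    """z_seq: (batch, seq_len, dim) embeddings of a token trajectory."""
    second_diff = z_seq[:, 2:] - 2.0 * z_seq[:, 1:-1] + z_seq[:, :-2]
    return second_diff.pow(2).mean()
```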
arXiv Detail & Related papers (2026-02-26T04:45:07Z) - Self-Supervised JEPA-based World Models for LiDAR Occupancy Completion and Forecasting [11.278785857643575]
We propose AD-LiST-JEPA, a self-supervised world model for autonomous driving that predicts future spatio-temporal evolution from LiDAR data. We evaluate the quality of the learned representations through downstream occupancy completion and forecasting tasks.
arXiv Detail & Related papers (2026-02-13T02:42:21Z) - A Lightweight Library for Energy-Based Joint-Embedding Predictive Architectures [58.26804959656713]
We present EB-JEPA, an open-source library for learning representations and world models using Joint-Embedding Predictive Architectures (JEPAs). JEPAs learn to predict in representation space rather than pixel space, avoiding the pitfalls of generative modeling. We show how these representations can drive action-conditioned world models, achieving a 97% planning success rate on the Two Rooms navigation task.
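A minimal sketch of the two points in this summary: the loss is computed in representation space (no pixel decoder), and the same predictor can be conditioned on an action to act as a latent world model. Module names and sizes below are placeholders, not EB-JEPA's actual API.

```python
# Placeholder latent world model illustrating prediction in representation space
# (no pixel decoder) and action conditioning. Names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LatentWorldModel(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.predictor = nn.Sequential(nn.Linear(dim + act_dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def loss(self, obs, action, next_obs):
        z, z_next = self.encoder(obs), self.encoder(next_obs)
        z_pred = self.predictor(torch.cat([z, action], dim=-1))  # action-conditioned latent prediction
        return F.mse_loss(z_pred, z_next)                          # compared in embedding space, not pixels
```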
arXiv Detail & Related papers (2026-02-03T14:56:24Z) - VJEPA: Variational Joint Embedding Predictive Architectures as Probabilistic World Models [0.0]
We introduce Variational JEPA (VJEPA), a probabilistic generalization that learns a predictive distribution over future latent states via a variational objective. VJEPA representations can serve as sufficient information states for optimal control without pixel reconstruction, while providing formal guarantees for collapse avoidance. We propose Bayesian JEPA (BJEPA), an extension that factorizes the predictive belief into a learned dynamics expert and a modular prior expert.
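One plausible form of the variational predictive objective described here: the predictor emits a Gaussian over the future latent state, trained with a negative log-likelihood term plus a KL term to a standard-normal prior. The Gaussian parameterization and the unit-Gaussian prior are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch of a variational predictive loss: the predictor outputs (mu, logvar)
# for the future latent state; we minimize the Gaussian negative log-likelihood of
# the observed future embedding plus a KL term to a unit-Gaussian prior.
import torch


def variational_prediction_loss(mu, logvar, z_future, beta: float = 1e-3):
    var = logvar.exp()
    nll = 0.5 * (logvar + (z_future - mu).pow(2) / var).sum(dim=-1).mean()   # up to an additive constant
    kl = -0.5 * (1.0 + logvar - mu.pow(2) - var).sum(dim=-1).mean()           # KL(N(mu, var) || N(0, I))
    return nll + beta * kl
```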
arXiv Detail & Related papers (2026-01-20T18:04:16Z) - Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem [90.17610617854247]
We introduce the Agentic Learning Ecosystem (ALE), a foundational infrastructure that optimizes the production pipeline for agentic models. ALE consists of three components: ROLL, a post-training framework for weight optimization; ROCK, a sandbox environment manager for trajectory generation; and iFlow CLI, an agent framework for efficient context engineering. We release ROME, an open-source agent grounded by ALE and trained on over one million trajectories.
arXiv Detail & Related papers (2025-12-31T14:03:39Z) - Gaussian Embeddings: How JEPAs Secretly Learn Your Data Density [51.15085346971361]
Joint Embedding Predictive Architectures (JEPAs) learn representations able to solve numerous downstream tasks out-of-the-box. JEPAs combine two objectives: (i) a latent-space prediction term, i.e., the representation of a slightly perturbed sample must be predictable from the original sample's representation, and (ii) an anti-collapse term, i.e., not all samples should have the same representation.
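The decomposition stated above can be made concrete with a simple two-term loss; the variance-hinge anti-collapse penalty below is one common instantiation (VICReg-style) chosen for illustration, not necessarily this paper's construction.

```python
# Two-term JEPA loss following the decomposition in the summary: (i) latent-space
# prediction plus (ii) an anti-collapse penalty. The variance hinge used for (ii)
# is one illustrative choice among several.
import torch
import torch.nn.functional as F


def jepa_two_term_loss(z_pred, z_target, gamma: float = 1.0):
    prediction = F.mse_loss(z_pred, z_target)        # (i) predict the perturbed sample's representation
    std = z_target.std(dim=0)                         # per-dimension spread across the batch
    anti_collapse = F.relu(1.0 - std).mean()          # (ii) penalize dimensions whose spread collapses
    return prediction + gamma * anti_collapse
```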
arXiv Detail & Related papers (2025-10-07T14:06:30Z) - TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning [63.73629127832652]
We introduce TD-JEPA, which leverages TD-based latent-predictive representations for unsupervised RL. TD-JEPA trains explicit state and task encoders, a policy-conditioned multi-step predictor, and a set of parameterized policies directly in latent space. Empirically, TD-JEPA matches or outperforms state-of-the-art baselines on locomotion, navigation, and manipulation tasks across 13 datasets.
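A rough sketch of the TD-style latent bootstrap this summary suggests: a predictor is regressed toward the next state's encoding plus a discounted bootstrap of its own prediction there, successor-feature style. Policy and task conditioning are simplified away, and the function names are assumptions rather than the paper's interfaces.

```python
# Successor-feature-flavoured latent TD bootstrap: psi(s, a) is regressed toward
# phi(s') + gamma * psi(s', a'), all in latent space. phi and psi are arbitrary
# callables here; conditioning on tasks and policies is omitted for brevity.
import torch
import torch.nn.functional as F


def td_latent_loss(psi, phi, s, a, s_next, a_next, gamma: float = 0.98):
    with torch.no_grad():
        target = phi(s_next) + gamma * psi(s_next, a_next)   # bootstrapped latent target
    return F.mse_loss(psi(s, a), target)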
arXiv Detail & Related papers (2025-10-01T10:21:18Z) - Label-Efficient Grasp Joint Prediction with Point-JEPA [0.0]
3D self-supervised pretraining with Point-JEPA enables label-efficient grasp joint-angle prediction. JEPA-style pretraining is a practical lever for data-efficient grasp learning.
arXiv Detail & Related papers (2025-09-13T21:00:03Z) - Denoising with a Joint-Embedding Predictive Architecture [21.42513407755273]
We introduce Denoising with a Joint-Embedding Predictive Architecture (D-JEPA). By recognizing JEPA as a form of masked image modeling, we reinterpret it as a generalized next-token prediction strategy. We also incorporate a diffusion loss to model the per-token probability distribution, enabling data generation in a continuous space.
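The per-token diffusion loss mentioned above could look roughly like the following: a small denoiser, conditioned on the JEPA predictor's output for a token position, learns to predict the noise added to that token's target embedding. The noise schedule and the `denoiser` signature are illustrative assumptions, not the paper's implementation.

```python
# Simplified per-token epsilon-prediction diffusion loss: noise the target token
# embedding at a random timestep and train a denoiser, conditioned on the JEPA
# context for that token, to recover the noise. Schedule and signature are placeholders.
import torch
import torch.nn.functional as F


def per_token_diffusion_loss(denoiser, z_cond, z_target, num_steps: int = 1000):
    t = torch.randint(0, num_steps, (z_target.shape[0],), device=z_target.device)
    alpha_bar = torch.cos(0.5 * torch.pi * t.float() / num_steps).pow(2).unsqueeze(-1)  # cosine schedule
    noise = torch.randn_like(z_target)
    z_noisy = alpha_bar.sqrt() * z_target + (1.0 - alpha_bar).sqrt() * noise
    eps_hat = denoiser(z_noisy, z_cond, t)            # denoiser conditioned on the JEPA prediction
    return F.mse_loss(eps_hat, noise)
```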
arXiv Detail & Related papers (2024-10-02T05:57:10Z) - How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks [14.338754598043968]
Two competing paradigms exist for self-supervised learning of data representations.
Joint Embedding Predictive Architecture (JEPA) is a class of architectures in which semantically similar inputs are encoded into representations that are predictive of each other.
arXiv Detail & Related papers (2024-07-03T19:43:12Z) - Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning? A Theoretical Perspective [55.36819597141271]
Inverse Reinforcement Learning (IRL), the problem of learning reward functions from demonstrations of an expert policy, plays a critical role in developing intelligent systems.
This paper provides the first line of efficient IRL results in vanilla offline and online settings with polynomial sample and runtime complexity.
As an application, we show that the learned rewards can transfer to another target MDP with suitable guarantees.
arXiv Detail & Related papers (2023-11-29T00:09:01Z) - Unifying Synergies between Self-supervised Learning and Dynamic Computation [53.66628188936682]
We present a novel perspective on the interplay between SSL and DC paradigms.
We show that it is feasible to simultaneously learn a dense and a gated sub-network from scratch in an SSL setting.
The co-evolution of both the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z) - Towards Scaling Difference Target Propagation by Learning Backprop Targets [64.90165892557776]
Difference Target Propagation (DTP) is a biologically plausible learning algorithm with a close relation to Gauss-Newton (GN) optimization.
We propose a novel feedback weight training scheme that ensures both that DTP approximates BP and that layer-wise feedback weight training can be restored.
We report the best performance ever achieved by DTP on CIFAR-10 and ImageNet.
arXiv Detail & Related papers (2022-01-31T18:20:43Z)