Gaussian Embeddings: How JEPAs Secretly Learn Your Data Density
- URL: http://arxiv.org/abs/2510.05949v1
- Date: Tue, 07 Oct 2025 14:06:30 GMT
- Title: Gaussian Embeddings: How JEPAs Secretly Learn Your Data Density
- Authors: Randall Balestriero, Nicolas Ballas, Mike Rabbat, Yann LeCun
- Abstract summary: Joint Embedding Predictive Architectures (JEPAs) learn representations able to solve numerous downstream tasks out-of-the-box. JEPAs combine two objectives: (i) a latent-space prediction term, i.e., the representation of a slightly perturbed sample must be predictable from the original sample's representation, and (ii) an anti-collapse term, i.e., not all samples should have the same representation.
- Score: 51.15085346971361
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Joint Embedding Predictive Architectures (JEPAs) learn representations able to solve numerous downstream tasks out-of-the-box. JEPAs combine two objectives: (i) a latent-space prediction term, i.e., the representation of a slightly perturbed sample must be predictable from the original sample's representation, and (ii) an anti-collapse term, i.e., not all samples should have the same representation. While (ii) is often considered an obvious remedy to representation collapse, we uncover that JEPAs' anti-collapse term does much more: it provably estimates the data density. In short, any successfully trained JEPA can be used to obtain sample probabilities, e.g., for data curation, outlier detection, or simply for density estimation. Our theoretical finding is agnostic to the dataset and architecture used; in any case, one can compute the learned probability of a sample $x$ efficiently and in closed form using the model's Jacobian matrix at $x$. Our findings are empirically validated across datasets (synthetic, controlled, and ImageNet), across different Self-Supervised Learning methods in the JEPA family (I-JEPA and DINOv2), and on multimodal models such as MetaCLIP. We denote the method extracting the JEPA-learned density as **JEPA-SCORE**.
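As a rough illustration of the closed-form computation the abstract describes, here is a minimal PyTorch sketch that scores a sample using the encoder's Jacobian at $x$. The exact JEPA-SCORE expression, its sign, and its normalization are defined in the paper; the function name and the specific combination of terms below are assumptions for illustration only.

```python
import torch

def jepa_score(encoder, x):
    # Hypothetical sketch only: the exact JEPA-SCORE formula is given in the
    # paper. This combines the embedding's standard-normal log-density with the
    # Jacobian log-volume term that change-of-variables style density estimates
    # involve; sign and normalization conventions follow the paper.
    x = x.detach()
    f = lambda inp: encoder(inp.unsqueeze(0)).squeeze(0)   # one sample -> embedding (d,)
    z = f(x)
    J = torch.autograd.functional.jacobian(f, x)           # shape (d, *x.shape)
    J = J.reshape(z.shape[0], -1)                          # flatten to (d, D)
    log_gauss = -0.5 * (z ** 2).sum()                      # Gaussian embedding prior
    _, logdet = torch.linalg.slogdet(J @ J.T)              # d x d Gram, cheap for d << D
    return log_gauss + 0.5 * logdet
```

Computing the full Jacobian of a large encoder is the dominant cost here; once $J$ is available, the determinant is taken over a small $d \times d$ Gram matrix, so the score itself is cheap.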
Related papers
- SFBD-OMNI: Bridge models for lossy measurement restoration with limited clean samples [22.912528721457473]
In many real-world scenarios, obtaining fully observed samples is expensive or even infeasible. In this work, we study distribution restoration with abundant noisy samples, assuming the corruption process is available as a black-box generator. We show that this task can be framed as a one-sided optimal transport problem and solved via an EM-like algorithm.
arXiv Detail & Related papers (2025-12-18T20:37:56Z)
- LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics [53.247652209132376]
Joint-Embedding Predictive Architectures (JEPAs) offer a promising blueprint, but lack of practical guidance and theory has led to ad-hoc R&D. We present a comprehensive theory of JEPAs and instantiate it in LeJEPA, a lean, scalable, and theoretically grounded training objective.
arXiv Detail & Related papers (2025-11-11T18:21:55Z) - T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data [0.0]
Self-supervised learning (SSL) generally involves generating different views of the same sample and thus requires data augmentations. In the present work, we propose a novel augmentation-free SSL method for structured data. Our approach, T-JEPA, relies on a Joint Embedding Predictive Architecture (JEPA) and is akin to mask reconstruction in the latent space.
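As a toy illustration of mask reconstruction in the latent space (all module names below are hypothetical; the actual T-JEPA architecture is specified in the paper):

```python
import torch
import torch.nn.functional as F

def latent_mask_loss(context_encoder, target_encoder, predictor, x, mask):
    # x: (batch, n_features) tabular batch; mask: boolean, True where a feature
    # is hidden from the context encoder. All module names are hypothetical.
    ctx = context_encoder(x.masked_fill(mask, 0.0))   # encode only visible features
    with torch.no_grad():
        tgt = target_encoder(x)                       # target latents of full sample
    pred = predictor(ctx)                             # predict the masked-out latents
    return F.mse_loss(pred, tgt)                      # reconstruction in latent space
```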
arXiv Detail & Related papers (2024-10-07T13:15:07Z)
- How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks [14.338754598043968]
Two competing paradigms exist for self-supervised learning of data representations.
Joint Embedding Predictive Architecture (JEPA) is a class of architectures in which semantically similar inputs are encoded into representations that are predictive of each other.
arXiv Detail & Related papers (2024-07-03T19:43:12Z)
- ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models [69.50316788263433]
We propose ProbVLM, a probabilistic adapter that estimates probability distributions for the embeddings of pre-trained vision-language models.
We quantify the calibration of embedding uncertainties in retrieval tasks and show that ProbVLM outperforms other methods.
We present a novel technique for visualizing the embedding distributions using a large-scale pre-trained latent diffusion model.
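A simplified sketch of the adapter idea: map a frozen model's point embedding to the parameters of a distribution over embeddings. ProbVLM itself uses a richer parameterization; the diagonal-Gaussian head below is only illustrative, and the class name is hypothetical.

```python
import torch.nn as nn

class ProbabilisticAdapter(nn.Module):
    # Illustrative head over a frozen encoder's embeddings: predicts a
    # per-dimension mean and variance instead of a single point estimate.
    def __init__(self, dim, hidden=512):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
        self.log_var = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, z):
        return self.mu(z), self.log_var(z).exp()      # mean and variance per dimension
```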
arXiv Detail & Related papers (2023-07-01T18:16:06Z)
- Few-Shot Non-Parametric Learning with Deep Latent Variable Model [50.746273235463754]
We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV).
NPC-LV is a learning framework for any dataset with abundant unlabeled data but very few labeled samples.
We show that NPC-LV outperforms supervised methods on image classification across all three datasets in the low-data regime.
arXiv Detail & Related papers (2022-06-23T09:35:03Z)
- Learn from Unpaired Data for Image Restoration: A Variational Bayes Approach [18.007258270845107]
We propose LUD-VAE, a deep generative method to learn the joint probability density function from data sampled from marginal distributions.
We apply our method to real-world image denoising and super-resolution tasks and train the models using the synthetic data generated by the LUD-VAE.
arXiv Detail & Related papers (2022-04-21T13:27:17Z)
- Meta-Learning for Relative Density-Ratio Estimation [59.75321498170363]
Existing methods for (relative) density-ratio estimation (DRE) require many instances from both densities.
We propose a meta-learning method for relative DRE, which estimates the relative density-ratio from a few instances by using knowledge in related datasets.
We empirically demonstrate the effectiveness of the proposed method on three problems: relative DRE, dataset comparison, and outlier detection.
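For reference, the relative density-ratio with mixing parameter $\alpha$, as standardly defined in this literature, is

```latex
r_\alpha(x) = \frac{p(x)}{\alpha\, p(x) + (1 - \alpha)\, q(x)}, \qquad \alpha \in [0, 1),
```

which recovers the plain ratio $p(x)/q(x)$ at $\alpha = 0$ and is bounded above by $1/\alpha$ for $\alpha > 0$, which is what makes its estimation better behaved than that of the unbounded ratio.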
arXiv Detail & Related papers (2021-07-02T02:13:45Z)
- DecAug: Augmenting HOI Detection via Decomposition [54.65572599920679]
Current algorithms suffer from insufficient training samples and category imbalance within datasets.
We propose an efficient and effective data augmentation method called DecAug for HOI detection.
Experiments show that our method brings improvements of up to 3.3 mAP and 1.6 mAP on the V-COCO and HICO-DET datasets, respectively.
arXiv Detail & Related papers (2020-10-02T13:59:05Z)
- A Provably Efficient Sample Collection Strategy for Reinforcement Learning [123.69175280309226]
One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior.
We propose to tackle the exploration-exploitation problem with a decoupled approach composed of: 1) an "objective-specific" algorithm that prescribes how many samples to collect at which states, as if it had access to a generative model (i.e., a sparse simulator of the environment); and 2) an "objective-agnostic" sample-collection strategy responsible for generating the prescribed samples as fast as possible.
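An interface-level sketch of this decoupled scheme; both callables below are hypothetical stand-ins for the paper's two components, not its actual algorithms.

```python
def decoupled_exploration(prescriber, collector, n_rounds):
    # prescriber: objective-specific; decides how many samples to collect at
    #             which states, as if a generative model were available.
    # collector:  objective-agnostic; gathers the prescribed samples from the
    #             real environment as fast as possible.
    dataset = []
    for _ in range(n_rounds):
        demand = prescriber(dataset)      # {state: number of samples wanted}
        dataset += collector(demand)      # fulfil the demand online
    return dataset
```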
arXiv Detail & Related papers (2020-07-13T15:17:35Z)
- Distance in Latent Space as Novelty Measure [0.0]
We propose to intelligently select samples when constructing data sets.
The selection methodology is based on the presumption that two dissimilar samples are worth more than two similar samples in a data set.
By using a self-supervised method to construct the latent space, it is ensured that the space fits the data well and that any upfront labeling effort can be avoided.
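A minimal sketch of this selection rule (function names hypothetical): score each candidate by its distance to the nearest latent code already in the data set, so that dissimilar samples are preferred as the set is grown.

```python
import torch

def novelty_scores(encoder, candidates, dataset_latents):
    # candidates: (n, ...) raw samples; dataset_latents: (m, d) latent codes of
    # the samples already selected. Higher score = more novel.
    with torch.no_grad():
        z = encoder(candidates)                    # (n, d) candidate latents
    dists = torch.cdist(z, dataset_latents)        # (n, m) pairwise L2 distances
    return dists.min(dim=1).values                 # nearest-neighbor distance
```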
arXiv Detail & Related papers (2020-03-31T09:14:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.