How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks
- URL: http://arxiv.org/abs/2407.03475v1
- Date: Wed, 3 Jul 2024 19:43:12 GMT
- Title: How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks
- Authors: Etai Littwin, Omid Saremi, Madhu Advani, Vimal Thilak, Preetum Nakkiran, Chen Huang, Joshua Susskind
- Abstract summary: Two competing paradigms exist for self-supervised learning of data representations.
Joint Embedding Predictive Architecture (JEPA) is a class of architectures in which semantically similar inputs are encoded into representations that are predictive of each other.
- Score: 14.338754598043968
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Two competing paradigms exist for self-supervised learning of data representations. Joint Embedding Predictive Architecture (JEPA) is a class of architectures in which semantically similar inputs are encoded into representations that are predictive of each other. A recent successful approach that falls under the JEPA framework is self-distillation, where an online encoder is trained to predict the output of the target encoder, sometimes using a lightweight predictor network. This is contrasted with the Masked AutoEncoder (MAE) paradigm, where an encoder and decoder are trained to reconstruct missing parts of the input in the data space rather than in its latent representation. A common motivation for using the JEPA approach over MAE is that the JEPA objective prioritizes abstract features over fine-grained pixel information (which can be unpredictable and uninformative). In this work, we seek to understand the mechanism behind this empirical observation by analyzing the training dynamics of deep linear models. We uncover a surprising mechanism: in a simplified linear setting where both approaches learn similar representations, JEPAs are biased to learn high-influence features, i.e., features characterized by having high regression coefficients. Our results point to a distinct implicit bias of predicting in latent space that may shed light on its success in practice.
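To make the contrast concrete, below is a minimal sketch (not the authors' code) of the two objectives described in the abstract, written for the simplified deep linear setting the paper analyzes: a JEPA-style self-distillation loss that predicts a target encoder's latent output through a lightweight predictor, versus an MAE-style loss that reconstructs the input in data space. The dimensions, depth, EMA decay, and all function names are illustrative assumptions.

```python
import torch
import torch.nn as nn

dim, depth, ema_decay = 32, 3, 0.99  # illustrative sizes, not taken from the paper

def deep_linear(d, num_layers):
    # A deep linear network: a composition of linear maps with no nonlinearity,
    # matching the simplified setting analyzed in the paper.
    return nn.Sequential(*[nn.Linear(d, d, bias=False) for _ in range(num_layers)])

online = deep_linear(dim, depth)              # updated by gradient descent
target = deep_linear(dim, depth)              # updated only via EMA, never by gradients
target.load_state_dict(online.state_dict())
predictor = nn.Linear(dim, dim, bias=False)   # lightweight predictor head
decoder = deep_linear(dim, depth)             # used only by the MAE-style variant

def jepa_loss(x_context, x_target):
    # Predict the target encoder's latent representation; the stop-gradient on the
    # target is what makes this self-distillation rather than a joint objective.
    with torch.no_grad():
        z_target = target(x_target)
    return ((predictor(online(x_context)) - z_target) ** 2).mean()

def mae_loss(x_masked, x_full):
    # Reconstruct the missing content in data (input) space instead of latent space.
    return ((decoder(online(x_masked)) - x_full) ** 2).mean()

@torch.no_grad()
def ema_update():
    # The target encoder slowly tracks the online encoder.
    for p_t, p_o in zip(target.parameters(), online.parameters()):
        p_t.mul_(ema_decay).add_(p_o, alpha=1.0 - ema_decay)
```

The sketch only sets up the two losses; the paper's claim is that training the latent-prediction objective on such deep linear networks biases the learned representation toward high-influence (high regression coefficient) features, a dynamic the snippet does not itself reproduce.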
Related papers
- Enhancing JEPAs with Spatial Conditioning: Robust and Efficient Representation Learning [7.083341587100975]
Image-based Joint-Embedding Predictive Architecture (IJEPA) offers an attractive alternative to Masked Autoencoder (MAE)
IJEPA drives representations to capture useful semantic information by predicting in latent rather than input space.
Our "conditional" encoders show performance gains on several image classification benchmark datasets.
arXiv Detail & Related papers (2024-10-14T17:46:24Z)
- T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data [0.0]
Self-supervised learning (SSL) generally involves generating different views of the same sample and thus requires data augmentations.
In the present work, we propose a novel augmentation-free SSL method for structured data.
Our approach, T-JEPA, relies on a Joint Embedding Predictive Architecture (JEPA) and is akin to mask reconstruction in the latent space.
arXiv Detail & Related papers (2024-10-07T13:15:07Z)
- Complexity Matters: Rethinking the Latent Space for Generative Modeling [65.64763873078114]
In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion.
In this study, we aim to shed light on this under-explored topic by rethinking the latent space from the perspective of model complexity.
arXiv Detail & Related papers (2023-07-17T07:12:29Z)
- Improving Deep Representation Learning via Auxiliary Learnable Target Coding [69.79343510578877]
This paper introduces a novel learnable target coding as an auxiliary regularization of deep representation learning.
Specifically, a margin-based triplet loss and a correlation consistency loss on the proposed target codes are designed to encourage more discriminative representations.
arXiv Detail & Related papers (2023-05-30T01:38:54Z)
- Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- Learning Invariant World State Representations with Predictive Coding [1.8963850600275547]
We develop a new predictive coding-based architecture and a hybrid fully-supervised/self-supervised learning method.
We evaluate the robustness of our model on a new synthetic dataset.
arXiv Detail & Related papers (2022-07-06T21:08:30Z)
- Toward a Geometrical Understanding of Self-supervised Contrastive Learning [55.83778629498769]
Self-supervised learning (SSL) is one of the premier techniques to create data representations that are actionable for transfer learning in the absence of human annotations.
Mainstream SSL techniques rely on a specific deep neural network architecture with two cascaded neural networks: the encoder and the projector.
In this paper, we investigate how the strength of the data augmentation policies affects the data embedding.
arXiv Detail & Related papers (2022-05-13T23:24:48Z)
- Neurosymbolic hybrid approach to driver collision warning [64.02492460600905]
There are two main algorithmic approaches to autonomous driving systems.
Deep learning alone has achieved state-of-the-art results in many areas.
However, deep learning models can be very difficult to debug when they fail.
arXiv Detail & Related papers (2022-03-28T20:29:50Z)
- Online Deep Learning based on Auto-Encoder [4.128388784932455]
We propose a two-phase Online Deep Learning method based on an Auto-Encoder (ODLAE).
Using the auto-encoder's reconstruction loss, we extract abstract hierarchical latent representations of instances.
We devise two fusion strategies: an output-level fusion strategy, which fuses the classification results of each hidden layer, and a feature-level fusion strategy, which leverages a self-attention mechanism to fuse the outputs of every hidden layer.
arXiv Detail & Related papers (2022-01-19T02:14:57Z)
- Merging Two Cultures: Deep and Statistical Learning [3.15863303008255]
Merging the two cultures of deep and statistical learning provides insights into structured high-dimensional data.
We show that prediction, optimisation and uncertainty quantification can be achieved using probabilistic methods at the output layer of the model.
arXiv Detail & Related papers (2021-10-22T02:57:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.