Related papers: Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning

URL: http://arxiv.org/abs/2410.19560v1
Date: Fri, 25 Oct 2024 13:48:12 GMT
Title: Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning
Authors: Shentong Mo, Shengbang Tong
Abstract summary: Contrastive-JEPA integrates the Image-based Joint-Embedding Predictive Architecture with the Variance-Invariance-Covariance Regularization (VICReg) strategy. C-JEPA significantly enhances the stability and quality of visual representation learning. When pre-trained on the ImageNet-1K dataset, C-JEPA exhibits rapid and improved convergence in both linear probing and fine-tuning performance metrics.
Score: 14.869908713261227
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In recent advancements in unsupervised visual representation learning, the Joint-Embedding Predictive Architecture (JEPA) has emerged as a significant method for extracting visual features from unlabeled imagery through an innovative masking strategy. Despite its success, two primary limitations have been identified: the inefficacy of Exponential Moving Average (EMA) from I-JEPA in preventing entire collapse and the inadequacy of I-JEPA prediction in accurately learning the mean of patch representations. Addressing these challenges, this study introduces a novel framework, namely C-JEPA (Contrastive-JEPA), which integrates the Image-based Joint-Embedding Predictive Architecture with the Variance-Invariance-Covariance Regularization (VICReg) strategy. This integration is designed to effectively learn the variance/covariance for preventing entire collapse and ensuring invariance in the mean of augmented views, thereby overcoming the identified limitations. Through empirical and theoretical evaluations, our work demonstrates that C-JEPA significantly enhances the stability and quality of visual representation learning. When pre-trained on the ImageNet-1K dataset, C-JEPA exhibits rapid and improved convergence in both linear probing and fine-tuning performance metrics.
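
The three VICReg terms named in the abstract can be sketched as follows. This is a minimal, dependency-free illustration of the variance, invariance, and covariance terms as they are commonly formulated, not the C-JEPA authors' implementation. The function names, the weights (25, 25, 1), `gamma`, and `eps` are assumptions chosen for illustration, and the abstract does not state how these terms are combined with the I-JEPA prediction loss.

```python
import math


def _column_means(z):
    """Per-dimension mean of a batch given as a list of equal-length rows."""
    n, d = len(z), len(z[0])
    return [sum(row[j] for row in z) / n for j in range(d)]


def variance_term(z, gamma=1.0, eps=1e-4):
    """Hinge on the per-dimension standard deviation.

    Penalises any embedding dimension whose std over the batch falls
    below gamma, which discourages collapse to a constant vector.
    """
    n, d = len(z), len(z[0])
    mu = _column_means(z)
    total = 0.0
    for j in range(d):
        var = sum((row[j] - mu[j]) ** 2 for row in z) / (n - 1)
        total += max(0.0, gamma - math.sqrt(var + eps))
    return total / d


def covariance_term(z):
    """Sum of squared off-diagonal covariance entries, divided by d."""
    n, d = len(z), len(z[0])
    mu = _column_means(z)
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                cov = sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in z) / (n - 1)
                total += cov ** 2
    return total / d


def invariance_term(za, zb):
    """Mean squared difference between two views (averaged over all entries)."""
    n, d = len(za), len(za[0])
    sq = sum((a - b) ** 2 for ra, rb in zip(za, zb) for a, b in zip(ra, rb))
    return sq / (n * d)


def vicreg_loss(za, zb, lam=25.0, mu=25.0, nu=1.0):
    """Weighted sum of the three terms; the weights are illustrative assumptions."""
    return (
        lam * invariance_term(za, zb)
        + mu * (variance_term(za) + variance_term(zb))
        + nu * (covariance_term(za) + covariance_term(zb))
    )


# A collapsed batch (every row identical) is penalised by the variance term,
# while a spread-out, decorrelated batch incurs no variance or covariance penalty.
collapsed = [[1.0, 2.0] for _ in range(4)]
spread = [[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
print(variance_term(collapsed), variance_term(spread), covariance_term(spread))
```

The collapsed batch shows why the variance term matters for the failure mode the abstract describes: identical embeddings give zero invariance loss, so only the variance hinge pushes the representation away from that solution.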

Related papers

Rectified LpJEPA: Joint-Embedding Predictive Architectures with Sparse and Maximum-Entropy Representations [53.61624356747686]
Joint-Embedding Predictive Architectures (JEPA) learn view-invariant representations and admit projection-based distribution matching for collapse prevention. Existing approaches regularize representations towards isotropic Gaussian distributions, but inherently favor dense representations and fail to capture the key property of sparsity observed in efficient representations. We introduce Rectified Distribution Matching Regularization (RDMReg), a sliced two-sample distribution-matching loss that aligns representations to a Rectified Generalized Gaussian (RGG) distribution.
arXiv Detail & Related papers (2026-02-01T21:49:30Z)
Implicit Neural Representation-Based Continuous Single Image Super Resolution: An Empirical Study [50.15623093332659]
Implicit neural representation (INR) has become the standard approach for arbitrary-scale image super-resolution (ASSR). We compare existing techniques across diverse settings and present aggregated performance results on multiple image quality metrics. We examine a new loss function that penalizes intensity variations while preserving edges, textures, and finer details during training.
arXiv Detail & Related papers (2026-01-25T07:09:20Z)
VJEPA: Variational Joint Embedding Predictive Architectures as Probabilistic World Models [0.0]
We introduce Variational JEPA (VJEPA), a probabilistic generalization that learns a predictive distribution over future latent states via a variational objective. VJEPA representations can serve as sufficient information states for optimal control without pixel reconstruction, while providing formal guarantees for collapse avoidance. We propose Bayesian JEPA (BJEPA), an extension that factorizes the predictive belief into a learned dynamics expert and a modular prior expert.
arXiv Detail & Related papers (2026-01-20T18:04:16Z)
DSeq-JEPA: Discriminative Sequential Joint-Embedding Predictive Architecture [34.31498256147088]
DSeq-JEPA bridges predictive and autoregressive self-supervised learning, integrating JEPA-style latent prediction with GPT-style sequential reasoning. DSeq-JEPA consistently focuses on more discriminative and generalizable representations than I-JEPA variants.
arXiv Detail & Related papers (2025-11-21T16:18:50Z)
TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design [5.404569468550549]
Generalizing deep reinforcement learning agents to unseen environments remains a significant challenge. We present Transition-aware Regret Approximation with Co-learnability for Environment Design (TRACED). TRACED yields curricula that improve zero-shot generalization across multiple benchmarks while requiring up to 2x fewer environment interactions than strong baselines.
arXiv Detail & Related papers (2025-06-24T20:29:24Z)
CRIA: A Cross-View Interaction and Instance-Adapted Pre-training Framework for Generalizable EEG Representations [52.251569042852815]
CRIA is an adaptive framework that utilizes variable-length and variable-channel coding to achieve a unified representation of EEG data across different datasets. The model employs a cross-attention mechanism to fuse temporal, spectral, and spatial features effectively. Experimental results on the Temple University EEG corpus and the CHB-MIT dataset show that CRIA outperforms existing methods with the same pre-training conditions.
arXiv Detail & Related papers (2025-06-19T06:31:08Z)
Interpretable Few-Shot Image Classification via Prototypical Concept-Guided Mixture of LoRA Experts [79.18608192761512]
Self-Explainable Models (SEMs) rely on Prototypical Concept Learning (PCL) to make their visual recognition processes more interpretable. We propose a Few-Shot Prototypical Concept Classification framework that mitigates two key challenges under low-data regimes: parametric imbalance and representation misalignment. Our approach consistently outperforms existing SEMs by a notable margin, with 4.2%-8.7% relative gains in 5-way 5-shot classification.
arXiv Detail & Related papers (2025-06-05T06:39:43Z)
SparseJEPA: Sparse Representation Learning of Joint Embedding Predictive Architectures [0.46040036610482665]
Joint Embedding Predictive Architectures (JEPA) have emerged as a powerful framework for learning general-purpose representations. We propose SparseJEPA, an extension that integrates sparse representation learning into the JEPA framework to enhance the quality of learned representations.
arXiv Detail & Related papers (2025-04-22T02:43:00Z)
Rethinking Link Prediction for Directed Graphs [73.36395969796804]
Link prediction for directed graphs is a crucial task with diverse real-world applications. Recent advances in embedding methods and Graph Neural Networks (GNNs) have shown promising improvements. We propose a unified framework to assess the expressiveness of existing methods, highlighting the impact of dual embeddings and decoder design on performance.
arXiv Detail & Related papers (2025-02-08T23:51:05Z)
ACT-JEPA: Novel Joint-Embedding Predictive Architecture for Efficient Policy Representation Learning [90.41852663775086]
ACT-JEPA is a novel architecture that integrates imitation learning and self-supervised learning. We train a policy to predict action sequences and abstract observation sequences. Our experiments show that ACT-JEPA improves the quality of representations by learning temporal environment dynamics.
arXiv Detail & Related papers (2025-01-24T16:41:41Z)
Enhanced Extractor-Selector Framework and Symmetrization Weighted Binary Cross-Entropy for Edge Detections [0.0]
Recent advancements have demonstrated the effectiveness of the extractor-selector (E-S) framework in edge detection (ED) tasks. We propose an enhanced E-S architecture, which utilizes richer, less-loss feature representations. We introduce a novel loss function, the Symmetrization Weight Binary Cross-Entropy (SWBCE), which simultaneously emphasizes both the recall of edge pixels and the suppression of erroneous edge predictions.
arXiv Detail & Related papers (2025-01-23T04:10:31Z)
Denoising with a Joint-Embedding Predictive Architecture [21.42513407755273]
We introduce Denoising with a Joint-Embedding Predictive Architecture (D-JEPA). By recognizing JEPA as a form of masked image modeling, we reinterpret it as a generalized next-token prediction strategy. We also incorporate diffusion loss to model the per-token probability distribution, enabling data generation in a continuous space.
arXiv Detail & Related papers (2024-10-02T05:57:10Z)
A Simple and Generalist Approach for Panoptic Segmentation [57.94892855772925]
Generalist vision models aim for one and the same architecture for a variety of vision tasks. While such shared architecture may seem attractive, generalist models tend to be outperformed by their bespoken counterparts. We address this problem by introducing two key contributions, without compromising the desirable properties of generalist models.
arXiv Detail & Related papers (2024-08-29T13:02:12Z)
Relaxed Contrastive Learning for Federated Learning [48.96253206661268]
We propose a novel contrastive learning framework to address the challenges of data heterogeneity in federated learning. Our framework outperforms all existing federated learning approaches by huge margins on the standard benchmarks.
arXiv Detail & Related papers (2024-01-10T04:55:24Z)
Data Assimilation in Chaotic Systems Using Deep Reinforcement Learning [0.5999777817331317]
Data assimilation plays a pivotal role in diverse applications, ranging from climate predictions and weather forecasts to trajectory planning for autonomous vehicles. Recent advancements have seen the emergence of deep learning approaches in this domain, primarily within a supervised learning framework. In this study, we introduce a novel DA strategy that utilizes reinforcement learning (RL) to apply state corrections using full or partial observations of the state variables.
arXiv Detail & Related papers (2024-01-01T06:53:36Z)
Semi-supervised Contrastive Regression for Estimation of Eye Gaze [0.609170287691728]
This paper develops a semi-supervised contrastive learning framework for estimation of gaze direction. With a small labeled gaze dataset, the framework is able to find a generalized solution even for unseen face images. Our contrastive regression framework shows good performance in comparison to several state of the art contrastive learning techniques used for gaze estimation.
arXiv Detail & Related papers (2023-08-05T04:11:38Z)
Joint Embedding Predictive Architectures Focus on Slow Features [56.393060086442006]
Joint Embedding Predictive Architectures (JEPA) offer a reconstruction-free alternative. We analyze performance of JEPA trained with VICReg and SimCLR objectives in the fully offline setting without access to rewards. We find that JEPA methods perform on par or better than reconstruction when distractor noise changes every time step, but fail when the noise is fixed.
arXiv Detail & Related papers (2022-11-20T00:50:11Z)
Consistency Regularization for Deep Face Anti-Spoofing [69.70647782777051]
Face anti-spoofing (FAS) plays a crucial role in securing face recognition systems. Motivated by this exciting observation, we conjecture that encouraging feature consistency of different views may be a promising way to boost FAS models. We enhance both Embedding-level and Prediction-level Consistency Regularization (EPCR) in FAS.
arXiv Detail & Related papers (2021-11-24T08:03:48Z)
Peeking into occluded joints: A novel framework for crowd pose estimation [88.56203133287865]
OPEC-Net is an Image-Guided Progressive GCN module that estimates invisible joints from an inference perspective. OCPose is the most complex Occluded Pose dataset with respect to average IoU between adjacent instances.
arXiv Detail & Related papers (2020-03-23T19:32:40Z)
Unsupervised Domain Adaptation in Person re-ID via k-Reciprocal Clustering and Large-Scale Heterogeneous Environment Synthesis [76.46004354572956]
We introduce an unsupervised domain adaptation approach for person re-identification. Experimental results show that the proposed ktCUDA and SHRED approach achieves an average improvement of +5.7 mAP in re-identification performance.
arXiv Detail & Related papers (2020-01-14T17:43:52Z)