JEPA for RL: Investigating Joint-Embedding Predictive Architectures for Reinforcement Learning
- URL: http://arxiv.org/abs/2504.16591v1
- Date: Wed, 23 Apr 2025 10:16:12 GMT
- Title: JEPA for RL: Investigating Joint-Embedding Predictive Architectures for Reinforcement Learning
- Authors: Tristan Kenneweg, Philip Kenneweg, Barbara Hammer
- Abstract summary: We show how to adapt the Joint-Embedding Predictive Architecture to reinforcement learning from images. We discuss model collapse, show how to prevent it, and provide exemplary data on the classical Cart Pole task.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Joint-Embedding Predictive Architectures (JEPA) have recently become popular as promising architectures for self-supervised learning. Vision transformers have been trained using JEPA to produce embeddings from images and videos, which have been shown to be highly suitable for downstream tasks like classification and segmentation. In this paper, we show how to adapt the JEPA architecture to reinforcement learning from images. We discuss model collapse, show how to prevent it, and provide exemplary data on the classical Cart Pole task.
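The recipe the abstract describes can be sketched as a single training step: an online encoder embeds the current observation, a predictor maps (embedding, action) to a predicted next embedding, and a slowly updated (EMA) target encoder supplies the regression target, one common way to prevent the model collapse the paper discusses. All names, shapes, and the EMA coefficient below are illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
OBS, EMB, ACT = 16, 8, 1  # toy observation, embedding, action sizes

W_enc = rng.normal(size=(OBS, EMB)) * 0.1         # online encoder (linear, for brevity)
W_tgt = W_enc.copy()                              # EMA target encoder
W_pred = rng.normal(size=(EMB + ACT, EMB)) * 0.1  # predictor

def encode(W, x):
    return x @ W

def predict(z, a):
    return np.concatenate([z, a], axis=-1) @ W_pred

obs      = rng.normal(size=(4, OBS))  # batch of (flattened) image observations
action   = rng.normal(size=(4, ACT))
next_obs = rng.normal(size=(4, OBS))

# Predict in embedding space, not pixel space: regress the predicted next
# embedding onto the target encoder's embedding of the true next observation.
z_pred   = predict(encode(W_enc, obs), action)
z_target = encode(W_tgt, next_obs)  # treated as a constant; no gradient flows here
loss = np.mean((z_pred - z_target) ** 2)

# Slow exponential-moving-average update of the target encoder; a tau close
# to 1 keeps the target stable, which helps avoid representation collapse.
tau = 0.99
W_tgt = tau * W_tgt + (1 - tau) * W_enc
```

In a real implementation the linear maps would be convolutional or transformer encoders and the loss would be minimized by gradient descent on the online encoder and predictor only.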
Related papers
- A Lightweight Library for Energy-Based Joint-Embedding Predictive Architectures [58.26804959656713]
We present EB-JEPA, an open-source library for learning representations and world models using Joint-Embedding Predictive Architectures (JEPAs). JEPAs learn to predict in representation space rather than pixel space, avoiding the pitfalls of generative modeling. We show how these representations can drive action-conditioned world models, achieving a 97% planning success rate on the Two Rooms navigation task.
arXiv Detail & Related papers (2026-02-03T14:56:24Z) - LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics [53.247652209132376]
Joint-Embedding Predictive Architectures (JEPAs) offer a promising blueprint, but a lack of practical guidance and theory has led to ad-hoc R&D. We present a comprehensive theory of JEPAs and instantiate it in LeJEPA, a lean, scalable, and theoretically grounded training objective.
arXiv Detail & Related papers (2025-11-11T18:21:55Z) - Joint Embeddings Go Temporal [5.2741154046624255]
Joint-Embedding Predictive Architectures (JEPA) have been introduced with the aim of performing self-supervised learning in the latent space. Time Series JEPA (TS-JEPA) is an architecture specifically adapted for time series representation learning. We show that TS-JEPA can match or surpass current state-of-the-art baselines on different standard datasets.
arXiv Detail & Related papers (2025-09-29T19:57:37Z) - SparseJEPA: Sparse Representation Learning of Joint Embedding Predictive Architectures [0.46040036610482665]
Joint Embedding Predictive Architectures (JEPA) have emerged as a powerful framework for learning general-purpose representations.
We propose SparseJEPA, an extension that integrates sparse representation learning into the JEPA framework to enhance the quality of learned representations.
arXiv Detail & Related papers (2025-04-22T02:43:00Z) - ACT-JEPA: Novel Joint-Embedding Predictive Architecture for Efficient Policy Representation Learning [90.41852663775086]
ACT-JEPA is a novel architecture that integrates imitation learning and self-supervised learning. We train a policy to predict action sequences and abstract observation sequences. Our experiments show that ACT-JEPA improves the quality of representations by learning temporal environment dynamics.
arXiv Detail & Related papers (2025-01-24T16:41:41Z) - Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning [14.869908713261227]
Contrastive-JEPA (C-JEPA) integrates the Image-based Joint-Embedding Predictive Architecture with the Variance-Invariance-Covariance Regularization (VICReg) strategy.
C-JEPA significantly enhances the stability and quality of visual representation learning.
When pre-trained on the ImageNet-1K dataset, C-JEPA exhibits rapid and improved convergence in both linear probing and fine-tuning performance metrics.
arXiv Detail & Related papers (2024-10-25T13:48:12Z) - Joint-Embedding Predictive Architecture for Self-Supervised Learning of Mask Classification Architecture [5.872289712903129]
We introduce Mask-JEPA, a self-supervised learning framework tailored for mask classification architectures (MCA).
Mask-JEPA combines a Joint Embedding Predictive Architecture with MCA to adeptly capture intricate semantics and precise object boundaries.
Our approach addresses two critical challenges in self-supervised learning: 1) extracting comprehensive representations for universal image segmentation from a pixel decoder, and 2) effectively training the transformer decoder.
arXiv Detail & Related papers (2024-07-15T14:01:03Z) - Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment [69.33930972652594]
We propose a novel structural pruning approach to jointly learn the weights and structurally prune architectures of CNN models.
The core element of our method is a Reinforcement Learning (RL) agent whose actions determine the pruning ratios of the CNN model's layers.
We conduct the joint training and pruning by iteratively training the model's weights and the agent's policy.
arXiv Detail & Related papers (2024-03-28T15:22:29Z) - Graph-level Representation Learning with Joint-Embedding Predictive Architectures [43.89120279424267]
Joint-Embedding Predictive Architectures (JEPAs) have emerged as a novel and powerful technique for self-supervised representation learning. We show that graph-level representations can be effectively modeled using this paradigm by proposing a Graph Joint-Embedding Predictive Architecture (Graph-JEPA). In particular, we employ masked modeling and focus on predicting the latent representations of masked subgraphs starting from the latent representation of a context subgraph.
arXiv Detail & Related papers (2023-09-27T20:42:02Z) - Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture [43.83887661156133]
This paper demonstrates an approach for learning highly semantic image representations without relying on hand-crafted data-augmentations.
We introduce the Image-based Joint-Embedding Predictive Architecture (I-JEPA), a non-generative approach for self-supervised learning from images.
arXiv Detail & Related papers (2023-01-19T18:59:01Z) - Joint Embedding Predictive Architectures Focus on Slow Features [56.393060086442006]
Joint Embedding Predictive Architectures (JEPA) offer a reconstruction-free alternative.
We analyze the performance of JEPA models trained with VICReg and SimCLR objectives in the fully offline setting, without access to rewards.
We find that JEPA methods perform on par or better than reconstruction when distractor noise changes every time step, but fail when the noise is fixed.
arXiv Detail & Related papers (2022-11-20T00:50:11Z) - Modeling Image Composition for Complex Scene Generation [77.10533862854706]
We present a method that achieves state-of-the-art results on layout-to-image generation tasks.
After compressing RGB images into patch tokens, we propose the Transformer with Focal Attention (TwFA) for exploring dependencies of object-to-object, object-to-patch and patch-to-patch.
arXiv Detail & Related papers (2022-06-02T08:34:25Z) - DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation [99.88539409432916]
We study the unsupervised domain adaptation (UDA) process.
We propose a novel UDA method, DAFormer, based on the benchmark results.
DAFormer significantly improves the state-of-the-art performance by 10.8 mIoU for GTA->Cityscapes and 5.4 mIoU for Synthia->Cityscapes.
arXiv Detail & Related papers (2021-11-29T19:00:46Z) - Retinex-inspired Unrolling with Cooperative Prior Architecture Search for Low-light Image Enhancement [58.72667941107544]
We propose Retinex-inspired Unrolling with Architecture Search (RUAS) to construct a lightweight yet effective enhancement network for low-light images.
RUAS obtains a top-performing image enhancement network that is fast and requires few computational resources.
arXiv Detail & Related papers (2020-12-10T11:51:23Z)
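Several of the papers above (C-JEPA, and the offline JEPA-with-VICReg analysis) invoke VICReg-style regularization as a collapse-prevention mechanism. A minimal sketch of its variance and covariance terms, with illustrative function names and hyperparameters:

```python
import numpy as np

def vicreg_terms(z, eps=1e-4):
    """Variance and covariance regularizers over a batch of embeddings z: (N, D)."""
    # Variance term: hinge the per-dimension standard deviation to stay above 1,
    # so the batch of embeddings cannot shrink to a single point (collapse).
    std = np.sqrt(z.var(axis=0) + eps)
    var_loss = np.mean(np.maximum(0.0, 1.0 - std))
    # Covariance term: decorrelate embedding dimensions by penalizing
    # off-diagonal entries of the batch covariance matrix.
    zc = z - z.mean(axis=0)
    cov = (zc.T @ zc) / (z.shape[0] - 1)
    off_diag = cov - np.diag(np.diag(cov))
    cov_loss = (off_diag ** 2).sum() / z.shape[1]
    return var_loss, cov_loss

rng = np.random.default_rng(0)
spread = rng.normal(size=(64, 8))    # healthy, spread-out embeddings
collapsed = np.full((64, 8), 0.5)    # fully collapsed embeddings

v_spread, _ = vicreg_terms(spread)
v_collapsed, _ = vicreg_terms(collapsed)
print(v_spread < v_collapsed)  # True: collapse is heavily penalized
```

Adding these terms to the embedding-prediction loss is one alternative to the EMA target encoder for keeping representations informative.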
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.