Joint Embedding Predictive Architectures Focus on Slow Features
- URL: http://arxiv.org/abs/2211.10831v1
- Date: Sun, 20 Nov 2022 00:50:11 GMT
- Title: Joint Embedding Predictive Architectures Focus on Slow Features
- Authors: Vlad Sobal, Jyothir S V, Siddhartha Jalagam, Nicolas Carion, Kyunghyun
Cho, Yann LeCun
- Abstract summary: Joint Embedding Predictive Architectures (JEPA) offer a reconstruction-free alternative.
We analyze performance of JEPA trained with VICReg and SimCLR objectives in the fully offline setting without access to rewards.
- We find that JEPA methods perform on par with or better than reconstruction when distractor noise changes every time step, but fail when the noise is fixed.
- Score: 56.393060086442006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many common methods for learning a world model for pixel-based environments
use generative architectures trained with pixel-level reconstruction
objectives. Recently proposed Joint Embedding Predictive Architectures (JEPA)
offer a reconstruction-free alternative. In this work, we analyze performance
of JEPA trained with VICReg and SimCLR objectives in the fully offline setting
without access to rewards, and compare the results to the performance of the
generative architecture. We test the methods in a simple environment with a
moving dot with various background distractors, and probe learned
representations for the dot's location. We find that JEPA methods perform on
par with or better than reconstruction when distractor noise changes every
time step, but fail when the noise is fixed. Furthermore, we provide a theoretical
explanation for the poor performance of JEPA-based methods with fixed noise,
highlighting an important limitation.
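The VICReg objective mentioned above trains the joint embedding by combining an invariance term with variance and covariance regularizers that guard against representation collapse. A minimal NumPy sketch of that loss (not the paper's implementation; the weights, shapes, and function name are illustrative) might look like:

```python
import numpy as np

def vicreg_loss(z_a, z_b, inv_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg objective on two batches of embeddings of shape (batch, dim),
    e.g. predicted vs. target embeddings in a JEPA-style world model."""
    n, d = z_a.shape
    # Invariance: mean squared error between the two embedding batches.
    inv = np.mean((z_a - z_b) ** 2)
    # Variance: hinge loss keeping each dimension's std above 1,
    # which discourages collapse of all embeddings to a constant.
    std_a = np.sqrt(z_a.var(axis=0) + eps)
    std_b = np.sqrt(z_b.var(axis=0) + eps)
    var = (np.mean(np.maximum(0.0, 1.0 - std_a))
           + np.mean(np.maximum(0.0, 1.0 - std_b)))
    # Covariance: penalize off-diagonal covariance entries so that
    # embedding dimensions decorrelate from one another.
    def cov_penalty(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return (off_diag ** 2).sum() / d
    cov = cov_penalty(z_a) + cov_penalty(z_b)
    return inv_w * inv + var_w * var + cov_w * cov
```

For a collapsed (constant) embedding batch, the variance hinge dominates the loss; that is exactly the degenerate solution these regularizers are meant to rule out in reconstruction-free training.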
Related papers
- LensNet: An End-to-End Learning Framework for Empirical Point Spread Function Modeling and Lensless Imaging Reconstruction [32.85180149439811]
  Lensless imaging stands out as a promising alternative to conventional lens-based systems. Traditional lensless techniques often require explicit calibrations and extensive pre-processing. We propose LensNet, an end-to-end deep learning framework that integrates spatial-domain and frequency-domain representations.
  arXiv Detail & Related papers (2025-05-03T09:11:52Z)
- JEPA for RL: Investigating Joint-Embedding Predictive Architectures for Reinforcement Learning [4.862490782515929]
  We show how to adapt the Joint-Embedding Predictive Architecture to reinforcement learning from images. We discuss model collapse, show how to prevent it, and provide exemplary data on the classical Cart Pole task.
  arXiv Detail & Related papers (2025-04-23T10:16:12Z)
- DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes [81.56206845824572]
  Novel-view synthesis (NVS) approaches play a critical role in vast scene reconstruction. Few-shot methods often struggle with poor reconstruction quality in vast environments. This paper presents DGTR, a novel distributed framework for efficient Gaussian reconstruction of sparse-view vast scenes.
  arXiv Detail & Related papers (2024-11-19T07:51:44Z)
- Outdoor Environment Reconstruction with Deep Learning on Radio Propagation Paths [5.030571576007511]
  This paper proposes a novel approach harnessing ambient wireless signals for outdoor environment reconstruction. By analyzing radio frequency (RF) data, the paper aims to deduce the environmental characteristics and digitally reconstruct the outdoor surroundings. Two DL-driven approaches are evaluated, with performance assessed using metrics such as intersection-over-union (IoU), Hausdorff distance, and Chamfer distance.
  arXiv Detail & Related papers (2024-02-27T09:11:10Z)
- DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image Enhancement [77.0360085530701]
  Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments. Previous methods often idealize the degradation process and neglect the impact of medium noise and object motion on the distribution of image features. Our approach utilizes predicted images to dynamically update pseudo-labels, adding a dynamic gradient to optimize the network's gradient space.
  arXiv Detail & Related papers (2023-12-12T06:07:21Z)
- Few-Shot Audio-Visual Learning of Environment Acoustics [89.16560042178523]
  Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener. We explore how to infer RIRs based on a sparse set of images and echoes observed in the space. In experiments using a state-of-the-art audio-visual simulator for 3D environments, we demonstrate that our method successfully generates arbitrary RIRs.
  arXiv Detail & Related papers (2022-06-08T16:38:24Z)
- Neighborhood-Aware Neural Architecture Search [43.87465987957761]
  We propose a novel neural architecture search (NAS) method to identify flat-minima architectures in the search space. Our formulation takes the "flatness" of an architecture into account by aggregating the performance over the neighborhood of this architecture. Based on our formulation, we propose neighborhood-aware random search (NA-RS) and neighborhood-aware differentiable architecture search (NA-DARTS).
  arXiv Detail & Related papers (2021-05-13T15:56:52Z)
- Retinex-inspired Unrolling with Cooperative Prior Architecture Search for Low-light Image Enhancement [58.72667941107544]
  We propose Retinex-inspired Unrolling with Architecture Search (RUAS) to construct a lightweight yet effective enhancement network for low-light images. RUAS is able to obtain a top-performing image enhancement network that is fast and requires few computational resources.
  arXiv Detail & Related papers (2020-12-10T11:51:23Z)
- Monocular Real-Time Volumetric Performance Capture [28.481131687883256]
  We present the first approach to volumetric performance capture and novel-view rendering at real-time speed from monocular video. Our system reconstructs a fully textured 3D human from each frame by leveraging Pixel-Aligned Implicit Functions (PIFu). We also introduce an Online Hard Example Mining (OHEM) technique that effectively suppresses failure modes due to the rare occurrence of challenging examples.
  arXiv Detail & Related papers (2020-07-28T04:45:13Z)
- Image Matching across Wide Baselines: From Paper to Practice [80.9424750998559]
  We introduce a comprehensive benchmark for local features and robust estimation algorithms. Our pipeline's modular structure allows easy integration, configuration, and combination of different methods. We show that with proper settings, classical solutions may still outperform the perceived state of the art.
  arXiv Detail & Related papers (2020-03-03T15:20:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences of its use.