Social-JEPA: Emergent Geometric Isomorphism
- URL: http://arxiv.org/abs/2603.02263v1
- Date: Sat, 28 Feb 2026 07:54:43 GMT
- Title: Social-JEPA: Emergent Geometric Isomorphism
- Authors: Haoran Zhang, Youjin Wang, Yi Duan, Rong Fu, Dianyu Zhao, Sicheng Fan, Shuaishuai Cao, Wentao Guo, Xiao Zhou,
- Abstract summary: World models compress rich sensory streams into compact latent codes that anticipate future observations.<n>We let separate agents acquire such models from distinct viewpoints of the same environment without any parameter sharing or coordination.<n>After training, their internal representations exhibit a striking emergent property: the two latent spaces are related by an approximate linear isometry.
- Score: 11.526381612918549
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: World models compress rich sensory streams into compact latent codes that anticipate future observations. We let separate agents acquire such models from distinct viewpoints of the same environment without any parameter sharing or coordination. After training, their internal representations exhibit a striking emergent property: the two latent spaces are related by an approximate linear isometry, enabling transparent translation between them. This geometric consensus survives large viewpoint shifts and scant overlap in raw pixels. Leveraging the learned alignment, a classifier trained on one agent can be ported to the other with no additional gradient steps, while distillation-like migration accelerates later learning and markedly reduces total compute. The findings reveal that predictive learning objectives impose strong regularities on representation geometry, suggesting a lightweight path to interoperability among decentralized vision systems. The code is available at https://anonymous.4open.science/r/Social-JEPA-5C57.
Related papers
- HeRO: Hierarchical 3D Semantic Representation for Pose-aware Object Manipulation [54.325346533275074]
HeRO is a diffusion-based policy that couples geometry and semantics via hierarchical semantic fields.<n>In various tests, HeRO establishes a new state-of-the-art, improving success on Place Dual Shoes by 12.3% and averaging 6.5% gains across six challenging pose-aware tasks.
arXiv Detail & Related papers (2026-02-21T12:29:10Z) - Two-Stream Interactive Joint Learning of Scene Parsing and Geometric Vision Tasks [24.19752468668527]
Two Interactive Streams (TwInS) is a novel bio-inspired joint learning framework capable of simultaneously performing scene parsing and geometric vision tasks.<n>To eliminate the dependence on costly human-annotated correspondence ground truth, TwInS is equipped with a tailored semi-supervised training strategy.
arXiv Detail & Related papers (2026-02-14T04:11:19Z) - Gauge-invariant representation holonomy [1.078600700827543]
Deep networks learn internal representations whose geometry--how features bend, rotate, and evolve--affects both generalization and robustness.<n>Existing similarity measures such as CKA or SVCCA capture pointwise overlap between activation sets, but miss how representations change along input paths.<n>We introduce representation holonomy, a gauge-invariant statistic that measures this path dependence.
arXiv Detail & Related papers (2026-01-29T12:51:17Z) - Joint Geometry-Appearance Human Reconstruction in a Unified Latent Space via Bridge Diffusion [57.09673862519791]
This paper introduces textbfJGA-LBD, a novel framework that unifies the modeling of geometry and appearance into a joint latent representation.<n> Experiments demonstrate that JGA-LBD outperforms current state-of-the-art approaches in terms of both geometry fidelity and appearance quality.
arXiv Detail & Related papers (2026-01-01T12:48:56Z) - Aligned Novel View Image and Geometry Synthesis via Cross-modal Attention Instillation [62.87088388345378]
We introduce a diffusion-based framework that performs aligned novel view image and geometry generation via a warping-and-inpainting methodology.<n>Method leverages off-the-shelf geometry predictors to predict partial geometries viewed from reference images.<n>Cross-modal attention distillation is proposed to ensure accurate alignment between generated images and geometry.
arXiv Detail & Related papers (2025-06-13T16:19:00Z) - Learning Abstract World Models with a Group-Structured Latent Space [12.685414866379366]
We show how geometric priors can be imposed on the representation manifold of a learned transition model.<n>We show experimentally that this leads to better predictions of the latent transition model than fully unstructured approaches.
arXiv Detail & Related papers (2025-06-02T10:43:18Z) - Geometry-Editable and Appearance-Preserving Object Compositon [67.98806888489385]
General object composition (GOC) aims to seamlessly integrate a target object into a background scene with desired geometric properties.<n>Recent approaches derive semantic embeddings and integrate them into advanced diffusion models to enable geometry-editable generation.<n>We introduce a Disentangled Geometry-editable and Appearance-preserving Diffusion model that first leverages semantic embeddings to implicitly capture desired geometric transformations.
arXiv Detail & Related papers (2025-05-27T09:05:28Z) - Neural Isometries: Taming Transformations for Equivariant ML [8.203292895010748]
We introduce Neural Isometries, an autoencoder framework which learns to map the observation space to a general-purpose latent space.
We show that a simple off-the-shelf equivariant network operating in the pre-trained latent space can achieve results on par with meticulously-engineered, handcrafted networks.
arXiv Detail & Related papers (2024-05-29T17:24:25Z) - Understanding and Mitigating Hyperbolic Dimensional Collapse in Graph Contrastive Learning [70.0681902472251]
We propose a novel contrastive learning framework to learn high-quality graph embeddings in hyperbolic space.<n>Specifically, we design the alignment metric that effectively captures the hierarchical data-invariant information.<n>We show that in the hyperbolic space one has to address the leaf- and height-level uniformity related to properties of trees.
arXiv Detail & Related papers (2023-10-27T15:31:42Z) - Semi-Supervised Manifold Learning with Complexity Decoupled Chart Autoencoders [45.29194877564103]
This work introduces a chart autoencoder with an asymmetric encoding-decoding process that can incorporate additional semi-supervised information such as class labels.
We discuss the approximation power of such networks and derive a bound that essentially depends on the intrinsic dimension of the data manifold rather than the dimension of ambient space.
arXiv Detail & Related papers (2022-08-22T19:58:03Z) - Geometry Contrastive Learning on Heterogeneous Graphs [50.58523799455101]
This paper proposes a novel self-supervised learning method, termed as Geometry Contrastive Learning (GCL)
GCL views a heterogeneous graph from Euclidean and hyperbolic perspective simultaneously, aiming to make a strong merger of the ability of modeling rich semantics and complex structures.
Extensive experiments on four benchmarks data sets show that the proposed approach outperforms the strong baselines.
arXiv Detail & Related papers (2022-06-25T03:54:53Z) - Stochastic Coherence Over Attention Trajectory For Continuous Learning
In Video Streams [64.82800502603138]
This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream.
The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations.
Our experiments leverage 3D virtual environments and they show that the proposed agents can learn to distinguish objects just by observing the video stream.
arXiv Detail & Related papers (2022-04-26T09:52:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.