EquiCaps: Predictor-Free Pose-Aware Pre-Trained Capsule Networks
- URL: http://arxiv.org/abs/2506.09895v1
- Date: Wed, 11 Jun 2025 16:07:58 GMT
- Title: EquiCaps: Predictor-Free Pose-Aware Pre-Trained Capsule Networks
- Authors: Athinoulla Konstantinou, Georgios Leontidis, Mamatha Thota, Aiden Durrant
- Abstract summary: We introduce EquiCaps, a capsule-based approach to pose-aware self-supervision. We leverage the intrinsic pose-awareness capabilities of capsules to improve performance in pose estimation tasks. We also introduce 3DIEBench-T, an extension of a 3D object-rendering benchmark dataset.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning self-supervised representations that are invariant and equivariant to transformations is crucial for advancing beyond traditional visual classification tasks. However, many methods rely on predictor architectures to encode equivariance, despite evidence that architectural choices, such as capsule networks, inherently excel at learning interpretable pose-aware representations. To explore this, we introduce EquiCaps (Equivariant Capsule Network), a capsule-based approach to pose-aware self-supervision that eliminates the need for a specialised predictor for enforcing equivariance. Instead, we leverage the intrinsic pose-awareness capabilities of capsules to improve performance in pose estimation tasks. To further challenge our assumptions, we increase task complexity via multi-geometric transformations to enable a more thorough evaluation of invariance and equivariance by introducing 3DIEBench-T, an extension of a 3D object-rendering benchmark dataset. Empirical results demonstrate that EquiCaps outperforms prior state-of-the-art equivariant methods on rotation prediction, achieving a supervised-level $R^2$ of 0.78 on the 3DIEBench rotation prediction benchmark and improving upon SIE and CapsIE by 0.05 and 0.04 $R^2$, respectively. Moreover, in contrast to non-capsule-based equivariant approaches, EquiCaps maintains robust equivariant performance under combined geometric transformations, underscoring its generalisation capabilities and the promise of predictor-free capsule architectures.
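The abstract reports equivariant performance as an $R^2$ score on rotation prediction (e.g. 0.78 on 3DIEBench). As a minimal sketch of how such a score is computed, the toy example below evaluates the coefficient of determination between predicted and ground-truth rotation targets (here represented as unit quaternions); the data and predictors are hypothetical, not from the paper.

```python
import numpy as np

def r2_score(y_true, y_pred):
    # Coefficient of determination: 1 - (residual sum of squares / total sum of squares).
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical targets: rotations encoded as unit quaternions, shape (N, 4).
rng = np.random.default_rng(0)
q = rng.normal(size=(100, 4))
q /= np.linalg.norm(q, axis=1, keepdims=True)

# A perfect predictor attains R^2 = 1; predicting the mean attains R^2 = 0.
print(r2_score(q, q))
print(r2_score(q, np.tile(q.mean(axis=0), (100, 1))))
```

Reported scores such as 0.78 thus sit between the mean-predictor baseline (0) and perfect recovery of the applied rotation (1).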
Related papers
- seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models [1.474723404975345]
We propose seq-JEPA, a world modeling framework that introduces architectural biases into joint-embedding predictive architectures. seq-JEPA simultaneously learns two architecturally segregated representations: one equivariant to specified transformations and another invariant to them. It excels at tasks that inherently require aggregating a sequence of observations, such as path integration across actions and predictive learning across eye movements.
arXiv Detail & Related papers (2025-05-06T04:39:11Z)
- $SE(3)$ Equivariant Ray Embeddings for Implicit Multi-View Depth Estimation [45.26823569257832]
In this paper, we explore the application of equivariant multi-view learning to depth estimation.
We employ Spherical Harmonics for positional encoding to ensure 3D rotation equivariance.
We develop a specialized equivariant encoder and decoder within the Perceiver IO architecture.
arXiv Detail & Related papers (2024-11-11T19:34:47Z)
- PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
PseudoNeg-MAE enhances global feature representation of point cloud masked autoencoders by making them both discriminative and sensitive to transformations. We propose a novel loss that explicitly penalizes invariant collapse, enabling the network to capture richer transformation cues while preserving discriminative representations.
arXiv Detail & Related papers (2024-09-24T07:57:21Z)
- Capsule Network Projectors are Equivariant and Invariant Learners [4.909818180516128]
In this work, we propose an invariant-equivariant self-supervised architecture that employs Capsule Networks (CapsNets).
We demonstrate that the use of CapsNets in equivariant self-supervised architectures achieves improved downstream performance.
This approach, which we name CapsIE (Capsule Invariant Equivariant Network), achieves state-of-the-art performance on equivariant rotation tasks.
arXiv Detail & Related papers (2024-05-23T10:04:23Z)
- Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection [37.142470149311904]
We propose a spatio-temporal equivariant learning framework that considers spatial and temporal augmentations jointly.
We show that our pre-training method for 3D object detection outperforms existing equivariant and invariant approaches in many settings.
arXiv Detail & Related papers (2024-04-17T20:41:49Z)
- Equivariant Adaptation of Large Pretrained Models [20.687626756753563]
We show that a canonicalization network can effectively be used to make a large pretrained network equivariant.
Using dataset-dependent priors to inform the canonicalization function, we are able to make large pretrained models equivariant while maintaining their performance.
arXiv Detail & Related papers (2023-10-02T21:21:28Z)
- Self-Supervised Learning for Group Equivariant Neural Networks [75.62232699377877]
Group equivariant neural networks are models whose structure is restricted to commute with transformations of the input.
We propose two concepts for self-supervised tasks: equivariant pretext labels and invariant contrastive loss.
Experiments on standard image recognition benchmarks demonstrate that equivariant neural networks benefit from the proposed self-supervised tasks.
arXiv Detail & Related papers (2023-03-08T08:11:26Z)
- Equivariance with Learned Canonicalization Functions [77.32483958400282]
We show that learning a small neural network to perform canonicalization is better than using predefined canonicalization functions.
Our experiments show that learning the canonicalization function is competitive with existing techniques for learning equivariant functions across many tasks.
arXiv Detail & Related papers (2022-11-11T21:58:15Z)
- Regularizing Variational Autoencoder with Diversity and Uncertainty Awareness [61.827054365139645]
Variational Autoencoder (VAE) approximates the posterior of latent variables based on amortized variational inference.
We propose an alternative model, DU-VAE, for learning a more Diverse and less Uncertain latent space.
arXiv Detail & Related papers (2021-10-24T07:58:13Z)
- Topographic VAEs learn Equivariant Capsules [84.33745072274942]
We introduce the Topographic VAE: a novel method for efficiently training deep generative models with topographically organized latent variables.
We show that such a model indeed learns to organize its activations according to salient characteristics such as digit class, width, and style on MNIST.
We demonstrate approximate equivariance to complex transformations, expanding upon the capabilities of existing group equivariant neural networks.
arXiv Detail & Related papers (2021-09-03T09:25:57Z)
- Self-Supervised 3D Hand Pose Estimation from monocular RGB via Contrastive Learning [50.007445752513625]
We propose a new self-supervised method for the structured regression task of 3D hand pose estimation.
We experimentally investigate the impact of invariant and equivariant contrastive objectives.
We show that a standard ResNet-152, trained on additional unlabeled data, attains an improvement of $7.6\%$ in PA-EPE on FreiHAND.
arXiv Detail & Related papers (2021-06-10T17:48:57Z)
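Several of the papers above contrast invariant and equivariant objectives. The distinction can be illustrated with a toy sketch: for a transformation $R$, an invariant map satisfies $f(Rx) = f(x)$, while an equivariant map satisfies $f(Rx) = R f(x)$. The two example encoders below are hypothetical stand-ins, not any paper's actual model.

```python
import numpy as np

def rot(theta):
    # 2D rotation matrix for angle theta (radians).
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Toy encoders on 2D points:
# f_inv depends only on the norm, so it is rotation-invariant;
# f_eq is the identity map, so it is trivially rotation-equivariant.
f_inv = lambda x: np.linalg.norm(x)
f_eq = lambda x: x

x = np.array([1.0, 2.0])
R = rot(np.pi / 3)

print(np.isclose(f_inv(R @ x), f_inv(x)))       # invariance:   f(Rx) = f(x)
print(np.allclose(f_eq(R @ x), R @ f_eq(x)))    # equivariance: f(Rx) = R f(x)
```

Pose-aware methods such as those surveyed here aim for the second property on the geometric factors of interest while keeping other factors invariant.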
This list is automatically generated from the titles and abstracts of the papers in this site.