Geometrically Constrained and Token-Based Probabilistic Spatial Transformers
- URL: http://arxiv.org/abs/2509.11218v1
- Date: Sun, 14 Sep 2025 11:30:53 GMT
- Title: Geometrically Constrained and Token-Based Probabilistic Spatial Transformers
- Authors: Johann Schmidt, Sebastian Stober
- Abstract summary: We revisit Spatial Transformer Networks (STNs) as a canonicalization tool for transformer-based vision pipelines. We propose a probabilistic, component-wise extension that improves robustness. Experiments on challenging moth classification benchmarks demonstrate that our method consistently improves robustness compared to other STNs.
- Score: 5.437226012505534
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Fine-grained visual classification (FGVC) remains highly sensitive to geometric variability, where objects appear under arbitrary orientations, scales, and perspective distortions. While equivariant architectures address this issue, they typically require substantial computational resources and restrict the hypothesis space. We revisit Spatial Transformer Networks (STNs) as a canonicalization tool for transformer-based vision pipelines, emphasizing their flexibility, backbone-agnostic nature, and lack of architectural constraints. We propose a probabilistic, component-wise extension that improves robustness. Specifically, we decompose affine transformations into rotation, scaling, and shearing, and regress each component under geometric constraints using a shared localization encoder. To capture uncertainty, we model each component with a Gaussian variational posterior and perform sampling-based canonicalization during inference. A novel component-wise alignment loss leverages augmentation parameters to guide spatial alignment. Experiments on challenging moth classification benchmarks demonstrate that our method consistently improves robustness compared to other STNs.
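As a concrete illustration of the pipeline the abstract describes, here is a minimal, hypothetical PyTorch sketch of a component-wise probabilistic STN: a shared localization encoder regresses a Gaussian posterior (mean and log-variance) per affine component, a transform is sampled via the reparameterization trick, and the image is warped with a differentiable grid. All module names, the encoder architecture, and the constraint ranges are assumptions, not the authors' released code; the KL and component-wise alignment losses from the paper are omitted.

```python
# Hypothetical sketch of a component-wise probabilistic STN, assuming a small
# CNN localization encoder and tanh-bounded shear; not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilisticSTN(nn.Module):
    def __init__(self, in_channels=3, max_shear=0.5):
        super().__init__()
        self.max_shear = max_shear
        # Shared localization encoder (architecture is a guess).
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One head per affine component; each predicts (mu, log_var).
        self.rot_head = nn.Linear(64, 2)    # rotation angle
        self.scale_head = nn.Linear(64, 4)  # log-scale in x and y
        self.shear_head = nn.Linear(64, 2)  # shear along x

    @staticmethod
    def _sample(params):
        # Reparameterization trick: sample from N(mu, sigma^2).
        mu, log_var = params.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * log_var).exp()

    def forward(self, x):
        h = self.encoder(x)
        theta = self._sample(self.rot_head(h)).squeeze(-1)
        log_s = self._sample(self.scale_head(h))
        shear = torch.tanh(self._sample(self.shear_head(h))).squeeze(-1) * self.max_shear

        B = x.size(0)
        cos, sin = torch.cos(theta), torch.sin(theta)
        # Compose A = R(theta) @ Shear(k) @ Scale(sx, sy), no translation.
        R = torch.stack([cos, -sin, sin, cos], dim=-1).view(B, 2, 2)
        K = torch.eye(2, device=x.device).expand(B, 2, 2).clone()
        K[:, 0, 1] = shear
        S = torch.diag_embed(log_s.exp())
        A = torch.cat([R @ K @ S, torch.zeros(B, 2, 1, device=x.device)], dim=-1)

        grid = F.affine_grid(A, x.shape, align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)
```

At inference, drawing several samples per image and averaging the downstream classifier's predictions over the warped copies would realize the sampling-based canonicalization the abstract mentions.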
Related papers
- Beyond Optimization: Intelligence as Metric-Topology Factorization under Geometric Incompleteness [6.0044467881527614]
We argue that intelligence is not navigation through a fixed maze, but the ability to reshape representational geometry so desired behaviors become stable attractors. We show that any fixed metric is geometrically incomplete: for any local metric representation, some topological transformations make it singular or incoherent. We introduce the Topological Urysohn Machine (TUM), implementing MTF (metric-topology factorization) through memory-amortized metric inference.
arXiv Detail & Related papers (2026-02-08T13:59:22Z)
- SONIC: Spectral Oriented Neural Invariant Convolutions [0.0]
Convolutional Neural Networks (CNNs) rely on fixed-size kernels scanning local patches. ViTs provide global connectivity but lack spatial inductive bias, depend on explicit positional encodings, and remain tied to the initial patch size. We introduce SONIC, a continuous spectral parameterisation that models convolutional operators using a small set of shared, orientation-selective components.
arXiv Detail & Related papers (2026-01-27T18:51:11Z)
- Equivariant-Aware Structured Pruning for Efficient Edge Deployment: A Comprehensive Framework with Adaptive Fine-Tuning [0.0]
We present a framework combining group equivariant convolutional neural networks (G-CNNs) with equivariant-aware structured pruning. Our approach preserves equivariant properties by analyzing e2cnn layer structure and applying neuron-level pruning to fully connected components (a structured-pruning sketch follows this list). We evaluate our method on satellite imagery (EuroSAT) and standard benchmarks (CIFAR-10, Rotated MNIST), demonstrating effectiveness across diverse domains.
arXiv Detail & Related papers (2025-11-21T13:41:47Z)
- Robust Canonicalization through Bootstrapped Data Re-Alignment [5.437226012505534]
Fine-grained visual classification tasks, such as insect and bird identification, demand sensitivity to subtle visual cues. We propose a bootstrapping algorithm that iteratively re-aligns training samples by reducing variance. We show that our method consistently outperforms equivariant and canonicalization baselines while performing on par with augmentation.
arXiv Detail & Related papers (2025-10-09T13:05:20Z)
- Generalized Linear Mode Connectivity for Transformers [87.32299363530996]
A striking phenomenon is linear mode connectivity (LMC), where independently trained models can be connected by low- or zero-loss paths. Prior work has predominantly focused on neuron re-ordering through permutations, but such approaches are limited in scope. We introduce a unified framework that captures four symmetry classes: permutations, semi-permutations, transformations, and general invertible maps. This generalization enables, for the first time, the discovery of low- and zero-barrier linear paths between independently trained Vision Transformers and GPT-2 models (a barrier-measurement sketch follows this list).
arXiv Detail & Related papers (2025-06-28T01:46:36Z)
- Low-Rank Tensor Recovery via Variational Schatten-p Quasi-Norm and Jacobian Regularization [49.85875869048434]
We propose a CP-based low-rank tensor function parameterized by neural networks for implicit neural representation. To achieve sparser CP decomposition, we introduce a variational Schatten-p quasi-norm to prune redundant rank-1 components. For smoothness, we propose a regularization term based on the spectral norm of the Jacobian and Hutchinson's trace estimator.
arXiv Detail & Related papers (2025-06-27T11:23:10Z)
- CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization [51.716834831684004]
We study the problem of conformal prediction (CP) under geometric data shifts. We propose integrating geometric information, such as geometric pose, into the conformal procedure to reinstate its guarantees (a split-conformal sketch follows this list).
arXiv Detail & Related papers (2025-06-19T10:12:02Z)
- Geometry-Informed Neural Operator Transformer [0.8906214436849201]
This work introduces the Geometry-Informed Neural Operator Transformer (GINOT), which integrates the transformer architecture with the neural operator framework to enable forward predictions on arbitrary geometries. The performance of GINOT is validated on multiple challenging datasets, showcasing its high accuracy and strong generalization capabilities for complex and arbitrary 2D and 3D geometries.
arXiv Detail & Related papers (2025-04-28T03:39:27Z)
- PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
PseudoNeg-MAE enhances the global feature representation of point cloud masked autoencoders by making them both discriminative and sensitive to transformations. We propose a novel loss that explicitly penalizes invariant collapse, enabling the network to capture richer transformation cues while preserving discriminative representations.
arXiv Detail & Related papers (2024-09-24T07:57:21Z)
- Relative Representations: Topological and Geometric Perspectives [50.85040046976025]
Relative representations are an established approach to zero-shot model stitching (a relative-representation sketch follows this list). First, we introduce a normalization procedure in the relative transformation, resulting in invariance to non-isotropic rescalings and permutations. Second, we propose to deploy topological densification when fine-tuning relative representations: a topological regularization loss encouraging clustering within classes.
arXiv Detail & Related papers (2024-09-17T08:09:22Z)
- EqNIO: Subequivariant Neural Inertial Odometry [33.96552018734359]
We show that IMU data transforms equivariantly when rotated around the gravity vector and reflected with respect to arbitrary planes parallel to gravity.
We then map the IMU data into this canonical frame, thereby achieving an invariant canonicalization that can be used directly with off-the-shelf inertial odometry networks.
arXiv Detail & Related papers (2024-08-12T17:42:46Z)
- Optimization Dynamics of Equivariant and Augmented Neural Networks [2.7918308693131135]
We investigate the optimization of neural networks on symmetric data.
We compare the strategy of constraining the architecture to be equivariant to that of using data augmentation.
Our analysis reveals that even in the latter situation, stationary points may be unstable for augmented training although they are stable for the manifestly equivariant models.
arXiv Detail & Related papers (2023-03-23T17:26:12Z)
- Rotation-Invariant Transformer for Point Cloud Matching [42.5714375149213]
We introduce RoITr, a Rotation-Invariant Transformer to cope with the pose variations in the point cloud matching task.
We propose a global transformer with rotation-invariant cross-frame spatial awareness learned by the self-attention mechanism.
RoITr surpasses existing methods by at least 13 and 5 percentage points in Inlier Ratio and Registration Recall, respectively.
arXiv Detail & Related papers (2023-03-14T20:55:27Z)
- Improving the Sample-Complexity of Deep Classification Networks with Invariant Integration [77.99182201815763]
Leveraging prior knowledge on intraclass variance due to transformations is a powerful method to improve the sample complexity of deep neural networks.
We propose a novel monomial selection algorithm based on pruning methods to enable application to more complex problems (a group-averaging sketch follows this list).
We demonstrate the improved sample complexity on the Rotated-MNIST, SVHN and CIFAR-10 datasets.
arXiv Detail & Related papers (2022-02-08T16:16:11Z)
- Revisiting Transformation Invariant Geometric Deep Learning: Are Initial Representations All You Need? [80.86819657126041]
We show that transformation-invariant and distance-preserving initial representations are sufficient to achieve transformation invariance.
Specifically, we realize transformation-invariant and distance-preserving initial point representations by modifying multi-dimensional scaling.
We prove that TinvNN can strictly guarantee transformation invariance, and it is general and flexible enough to be combined with existing neural networks (a classical-MDS sketch follows this list).
arXiv Detail & Related papers (2021-12-23T03:52:33Z)
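Minimal code sketches for several of the techniques referenced in the list above follow. Each is a hedged illustration under stated assumptions, not the respective authors' implementation.

For the equivariant-aware structured pruning paper: a sketch of neuron-level pruning restricted to fully connected layers, leaving the (equivariant) convolutional stack untouched. The L2 saliency criterion and the keep ratio are assumptions.

```python
# Hypothetical sketch of neuron-level structured pruning on FC layers only.
import torch
import torch.nn as nn

@torch.no_grad()
def prune_fc_neurons(fc: nn.Linear, next_fc: nn.Linear, keep_ratio=0.5):
    """Drop output neurons of `fc` with the smallest L2 row norms, and the
    matching input columns of `next_fc`."""
    norms = fc.weight.norm(dim=1)                      # saliency per neuron
    k = max(1, int(keep_ratio * fc.out_features))
    keep = torch.topk(norms, k).indices.sort().values  # indices to retain
    pruned = nn.Linear(fc.in_features, k, bias=fc.bias is not None)
    pruned.weight.copy_(fc.weight[keep])
    if fc.bias is not None:
        pruned.bias.copy_(fc.bias[keep])
    shrunk = nn.Linear(k, next_fc.out_features, bias=next_fc.bias is not None)
    shrunk.weight.copy_(next_fc.weight[:, keep])       # drop matching inputs
    if next_fc.bias is not None:
        shrunk.bias.copy_(next_fc.bias)
    return pruned, shrunk
```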
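For the generalized linear mode connectivity paper: a sketch of measuring the loss barrier along the linear path between two trained models. The symmetry alignment central to that paper (matching weights up to permutations or invertible maps) is assumed to have been applied to `model_b` beforehand; only the barrier measurement is shown.

```python
# Hypothetical sketch of a linear-mode-connectivity barrier check.
import copy
import torch

@torch.no_grad()
def loss_barrier(model_a, model_b, loss_fn, loader, steps=11):
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    probe = copy.deepcopy(model_a).eval()
    losses = []
    for t in torch.linspace(0.0, 1.0, steps).tolist():
        # Interpolate every parameter/buffer and evaluate the mixed model.
        probe.load_state_dict({k: (1 - t) * sd_a[k] + t * sd_b[k] for k in sd_a})
        total = sum(loss_fn(probe(x), y).item() * len(y) for x, y in loader)
        count = sum(len(y) for _, y in loader)
        losses.append(total / count)
    # Barrier: worst loss on the path minus the average of the endpoints.
    return max(losses) - 0.5 * (losses[0] + losses[-1])
```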
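For CP$^2$: a sketch of split conformal prediction with a canonicalization step, so that inputs are mapped to a canonical pose before scoring and geometric shifts at test time do not invalidate coverage. Here `model` is assumed to return class probabilities and `canonicalize` is a placeholder.

```python
# Hypothetical sketch of split conformal prediction after canonicalization.
import numpy as np

def conformal_sets(model, canonicalize, X_cal, y_cal, X_test, alpha=0.1):
    # Nonconformity score: 1 - probability assigned to the true class.
    p_cal = model(canonicalize(X_cal))                       # (n, C)
    scores = 1.0 - p_cal[np.arange(len(y_cal)), y_cal]
    # Finite-sample-corrected quantile of the calibration scores.
    n = len(scores)
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q = np.quantile(scores, level, method="higher")
    # Prediction set: every class whose score is at most q.
    p_test = model(canonicalize(X_test))                     # (m, C)
    return p_test >= 1.0 - q                                 # boolean set mask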
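For the relative representations paper: a sketch of the base technique it builds on, encoding each sample by cosine similarities to a fixed set of anchor embeddings, which is invariant to angle-preserving latent transformations. The paper's added normalization for non-isotropic rescalings is omitted.

```python
# Minimal sketch of relative representations via cosine similarity to anchors.
import torch
import torch.nn.functional as F

def relative_representation(z: torch.Tensor, anchors: torch.Tensor) -> torch.Tensor:
    """z: (B, D) sample embeddings; anchors: (K, D). Returns (B, K)."""
    z = F.normalize(z, dim=-1)
    anchors = F.normalize(anchors, dim=-1)
    return z @ anchors.T  # one coordinate per anchor
```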
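For the invariant integration paper: a sketch of the underlying group-averaging idea, where a monomial feature is averaged over sampled rotations of the input so the result is rotation-invariant by construction. The paper's pruning-based monomial selection is not reproduced; the example monomial is an arbitrary assumption.

```python
# Hypothetical sketch of invariant integration by rotation averaging.
import numpy as np
from scipy.ndimage import rotate

def rotation_invariant_feature(img, monomial, n_angles=8):
    """Group-average a scalar feature over sampled image rotations."""
    angles = np.linspace(0.0, 360.0, n_angles, endpoint=False)
    vals = [monomial(rotate(img, a, reshape=False, order=1)) for a in angles]
    return float(np.mean(vals))

# Example monomial: product of top- and bottom-half mean intensities
# (orientation-dependent, so the group average actually does something).
f = rotation_invariant_feature(np.random.rand(32, 32),
                               lambda x: x[:16].mean() * x[16:].mean())
```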
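For TinvNN: a sketch of the classical multidimensional scaling (MDS) idea behind its initial representations. Coordinates are recovered from pairwise distances alone, which are unchanged by rotations, translations, and reflections, so the result is transformation-invariant up to eigenvector sign ambiguities, which the paper's modification of MDS addresses.

```python
# Minimal sketch of classical MDS from a pairwise distance matrix.
import numpy as np

def classical_mds(D: np.ndarray, dim: int) -> np.ndarray:
    """D: (n, n) pairwise Euclidean distances; returns (n, dim) coordinates."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # Gram matrix of centered points
    w, V = np.linalg.eigh(B)                  # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:dim]           # keep the largest `dim`
    return V[:, top] * np.sqrt(np.maximum(w[top], 0.0))
```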
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.