Understanding when spatial transformer networks do not support
invariance, and what to do about it
- URL: http://arxiv.org/abs/2004.11678v5
- Date: Tue, 18 May 2021 09:14:59 GMT
- Title: Understanding when spatial transformer networks do not support
invariance, and what to do about it
- Authors: Lukas Finnveden, Ylva Jansson and Tony Lindeberg
- Abstract summary: Spatial transformer networks (STNs) were designed to enable convolutional neural networks (CNNs) to learn invariance to image transformations.
We show that STNs do not have the ability to align the feature maps of a transformed image with those of its original.
We investigate alternative STN architectures that make use of complex features.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatial transformer networks (STNs) were designed to enable convolutional
neural networks (CNNs) to learn invariance to image transformations. STNs were
originally proposed to transform CNN feature maps as well as input images. This
enables the use of more complex features when predicting transformation
parameters. However, since STNs perform a purely spatial transformation, they
do not, in the general case, have the ability to align the feature maps of a
transformed image with those of its original. STNs are therefore unable to
support invariance when transforming CNN feature maps. We present a simple
proof for this and study the practical implications, showing that this
inability is coupled with decreased classification accuracy. We therefore
investigate alternative STN architectures that make use of complex features. We
find that while deeper localization networks are difficult to train,
localization networks that share parameters with the classification network
remain stable as they grow deeper, which allows for higher classification
accuracy on difficult datasets. Finally, we explore the interaction between
localization network complexity and iterative image alignment.
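The core claim, that a purely spatial transformation cannot in general align the feature maps of a transformed image with those of the original, can be illustrated with a minimal NumPy sketch (an illustration of the general phenomenon, not code from the paper): an oriented filter responds to a horizontal bar but barely to its rotated, vertical counterpart, so spatially rotating the feature map back does not recover the original response.

```python
import numpy as np

def conv2d_valid(img, kern):
    """Plain valid-mode 2D cross-correlation."""
    kh, kw = kern.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kern)
    return out

# Image containing a horizontal bar.
img = np.zeros((8, 8))
img[3, 2:6] = 1.0

# The same image rotated by 90 degrees: the bar becomes vertical.
img_rot = np.rot90(img)

# An oriented filter that responds to horizontal edges.
kern = np.array([[ 1.0,  1.0,  1.0],
                 [ 0.0,  0.0,  0.0],
                 [-1.0, -1.0, -1.0]])

feat = conv2d_valid(img, kern)          # strong response to the horizontal bar
feat_of_rot = conv2d_valid(img_rot, kern)  # weak response: the bar is now vertical

# STN-style attempt: spatially rotate the rotated image's feature map back.
aligned = np.rot90(feat_of_rot, -1)

# The spatial warp cannot restore the lost filter response:
# the original's peak response (3.0) is much larger than the "aligned" one (1.0).
print(np.abs(feat).max(), np.abs(aligned).max())
```

The channel *content* has changed (horizontal-edge energy moved to where a vertical-edge channel would respond), and no spatial rearrangement of the existing values can undo that; aligning feature maps would also require permuting or mixing channels.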
Related papers
- Revisiting Data Augmentation for Rotational Invariance in Convolutional
Neural Networks [0.29127054707887967]
We investigate how best to include rotational invariance in a CNN for image classification.
Our experiments show that networks trained with data augmentation alone can classify rotated images nearly as well as in the normal unrotated case.
arXiv Detail & Related papers (2023-10-12T15:53:24Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Entropy Transformer Networks: A Learning Approach via Tangent Bundle Data Manifold [8.893886200299228]
This paper focuses on an accurate and fast approach for image transformation employed in the design of CNN architectures.
A novel Entropy STN (ESTN) is proposed that interpolates on the data manifold distributions.
Experiments on challenging benchmarks show that the proposed ESTN can improve predictive accuracy over a range of computer vision tasks.
arXiv Detail & Related papers (2023-07-24T04:21:51Z)
- B-cos Networks: Alignment is All We Need for Interpretability [136.27303006772294]
We present a new direction for increasing the interpretability of deep neural networks (DNNs) by promoting weight-input alignment during training.
A B-cos transform induces a single linear transform that faithfully summarises the full model computations.
We show that it can easily be integrated into common models such as VGGs, ResNets, InceptionNets, and DenseNets.
arXiv Detail & Related papers (2022-05-20T16:03:29Z)
- Revisiting Transformation Invariant Geometric Deep Learning: Are Initial Representations All You Need? [80.86819657126041]
We show that transformation-invariant and distance-preserving initial representations are sufficient to achieve transformation invariance.
Specifically, we realize transformation-invariant and distance-preserving initial point representations by modifying multi-dimensional scaling.
We prove that TinvNN can strictly guarantee transformation invariance, being general and flexible enough to be combined with the existing neural networks.
arXiv Detail & Related papers (2021-12-23T03:52:33Z)
- Implicit Equivariance in Convolutional Networks [1.911678487931003]
Implicitly Equivariant Networks (IEN) induce equivariance in the different layers of a standard CNN model.
We show IEN outperforms the state-of-the-art rotation equivariant tracking method while providing faster inference speed.
arXiv Detail & Related papers (2021-11-28T14:44:17Z)
- CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation.
We propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z)
- Rotation-Invariant Gait Identification with Quaternion Convolutional Neural Networks [7.638280076041963]
We introduce Quaternion CNN, a network architecture which is intrinsically layer-wise equivariant and globally invariant under 3D rotations.
We show empirically that this network indeed significantly outperforms a traditional CNN in a multi-user rotation-invariant gait classification setting.
arXiv Detail & Related papers (2020-08-04T23:22:12Z)
- Volumetric Transformer Networks [88.85542905676712]
We introduce a learnable module, the volumetric transformer network (VTN).
VTN predicts channel-wise warping fields to reconfigure intermediate CNN features both spatially and channel-wise.
Our experiments show that VTN consistently boosts the features' representation power and consequently the networks' accuracy on fine-grained image recognition and instance-level image retrieval.
arXiv Detail & Related papers (2020-07-18T14:00:12Z)
- Computational optimization of convolutional neural networks using separated filters architecture [69.73393478582027]
Use of convolutional neural networks (CNNs) is the standard approach to image recognition, despite the fact that they can be computationally demanding.
We consider a convolutional neural network transformation that reduces computational complexity and thus speeds up neural network processing.
arXiv Detail & Related papers (2020-02-18T17:42:13Z)
- The problems with using STNs to align CNN feature maps [0.0]
We argue that spatial transformer networks (STNs) do not have the ability to align the feature maps of a transformed image and its original.
We advocate taking advantage of more complex features in deeper layers by instead sharing parameters between the classification and the localisation network.
arXiv Detail & Related papers (2020-01-14T12:59:56Z)
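The last entry above (the companion paper to the main abstract) advocates sharing parameters between the classification and localisation networks, so the localisation branch benefits from deep features without training its own deep stack. A minimal structural sketch of that idea in plain NumPy, where all weights, sizes, and function names are hypothetical placeholders rather than the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared early stage: both heads read the same features. In the paper's
# setting these would be the classification network's early conv layers;
# here a single fully connected + ReLU stage stands in for them.
W_shared = rng.standard_normal((16, 64)) / 8.0

def shared_features(x):
    """Shared feature extractor used by both heads (x: flattened 8x8 image)."""
    return np.maximum(W_shared @ x, 0.0)

# Localisation head: predicts a transformation parameter (e.g. a rotation
# angle) from the shared features, instead of from its own deep network.
w_loc = rng.standard_normal(16) / 4.0
def predict_angle(x):
    return float(w_loc @ shared_features(x))

# Classification head: predicts class scores from the very same features.
W_cls = rng.standard_normal((10, 16)) / 4.0
def class_scores(x):
    return W_cls @ shared_features(x)

x = rng.standard_normal(64)  # a stand-in flattened 8x8 "image"
print(predict_angle(x), class_scores(x).shape)
```

The design point is that `shared_features` appears in both heads: gradients from the classification loss keep the shared stage well conditioned, which is what lets the localisation branch grow deeper without the training instability the abstract reports for standalone deep localisation networks.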
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.