Understanding when spatial transformer networks do not support
invariance, and what to do about it
- URL: http://arxiv.org/abs/2004.11678v5
- Date: Tue, 18 May 2021 09:14:59 GMT
- Title: Understanding when spatial transformer networks do not support
invariance, and what to do about it
- Authors: Lukas Finnveden, Ylva Jansson and Tony Lindeberg
- Abstract summary: Spatial transformer networks (STNs) were designed to enable convolutional neural networks (CNNs) to learn invariance to image transformations.
We show that STNs do not have the ability to align the feature maps of a transformed image with those of its original.
We investigate alternative STN architectures that make use of complex features.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatial transformer networks (STNs) were designed to enable convolutional
neural networks (CNNs) to learn invariance to image transformations. STNs were
originally proposed to transform CNN feature maps as well as input images. This
enables the use of more complex features when predicting transformation
parameters. However, since STNs perform a purely spatial transformation, they
do not, in the general case, have the ability to align the feature maps of a
transformed image with those of its original. STNs are therefore unable to
support invariance when transforming CNN feature maps. We present a simple
proof for this and study the practical implications, showing that this
inability is coupled with decreased classification accuracy. We therefore
investigate alternative STN architectures that make use of complex features. We
find that while deeper localization networks are difficult to train,
localization networks that share parameters with the classification network
remain stable as they grow deeper, which allows for higher classification
accuracy on difficult datasets. Finally, we explore the interaction between
localization network complexity and iterative image alignment.
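The core claim, that a purely spatial transformation cannot in general align the feature maps of a transformed image with those of the original, can be illustrated with a minimal NumPy sketch (an illustration of the general phenomenon, not code from the paper): an oriented filter responds to a horizontal bar but barely to its rotated, vertical counterpart, so spatially rotating the feature map back does not recover the original response.

```python
import numpy as np

def conv2d_valid(img, kern):
    """Plain valid-mode 2D cross-correlation."""
    kh, kw = kern.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kern)
    return out

# Image containing a horizontal bar.
img = np.zeros((8, 8))
img[3, 2:6] = 1.0

# The same image rotated by 90 degrees: the bar becomes vertical.
img_rot = np.rot90(img)

# An oriented filter that responds to horizontal edges.
kern = np.array([[ 1.0,  1.0,  1.0],
                 [ 0.0,  0.0,  0.0],
                 [-1.0, -1.0, -1.0]])

feat = conv2d_valid(img, kern)          # strong response to the horizontal bar
feat_of_rot = conv2d_valid(img_rot, kern)  # weak response: the bar is now vertical

# STN-style attempt: spatially rotate the rotated image's feature map back.
aligned = np.rot90(feat_of_rot, -1)

# The spatial warp cannot restore the lost filter response:
# the original's peak response (3.0) is much larger than the "aligned" one (1.0).
print(np.abs(feat).max(), np.abs(aligned).max())
```

The channel *content* has changed (horizontal-edge energy moved to where a vertical-edge channel would respond), and no spatial rearrangement of the existing values can undo that; aligning feature maps would also require permuting or mixing channels.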
Related papers
- Revisiting Data Augmentation for Rotational Invariance in Convolutional
Neural Networks [0.29127054707887967]
We investigate how best to include rotational invariance in a CNN for image classification.
Our experiments show that networks trained with data augmentation alone can classify rotated images nearly as well as in the normal unrotated case.
arXiv Detail & Related papers (2023-10-12T15:53:24Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- Entropy Transformer Networks: A Learning Approach via Tangent Bundle Data Manifold [8.893886200299228]
This paper focuses on an accurate and fast approach for image transformation employed in the design of CNN architectures.
A novel Entropy STN (ESTN) is proposed that interpolates on the data manifold distributions.
Experiments on challenging benchmarks show that the proposed ESTN can improve predictive accuracy over a range of computer vision tasks.
arXiv Detail & Related papers (2023-07-24T04:21:51Z)
- B-cos Networks: Alignment is All We Need for Interpretability [136.27303006772294]
We present a new direction for increasing the interpretability of deep neural networks (DNNs) by promoting weight-input alignment during training.
A B-cos transform induces a single linear transform that faithfully summarises the full model computations.
We show that it can easily be integrated into common models such as VGGs, ResNets, InceptionNets, and DenseNets.
arXiv Detail & Related papers (2022-05-20T16:03:29Z)
- Revisiting Transformation Invariant Geometric Deep Learning: Are Initial Representations All You Need? [80.86819657126041]
We show that transformation-invariant and distance-preserving initial representations are sufficient to achieve transformation invariance.
Specifically, we realize transformation-invariant and distance-preserving initial point representations by modifying multi-dimensional scaling.
We prove that TinvNN can strictly guarantee transformation invariance, being general and flexible enough to be combined with the existing neural networks.
arXiv Detail & Related papers (2021-12-23T03:52:33Z)
- Implicit Equivariance in Convolutional Networks [1.911678487931003]
Implicitly Equivariant Networks (IEN) induce equivariance in the different layers of a standard CNN model.
We show IEN outperforms the state-of-the-art rotation equivariant tracking method while providing faster inference speed.
arXiv Detail & Related papers (2021-11-28T14:44:17Z)
- CoTr: Efficiently Bridging CNN and Transformer for 3D Medical Image Segmentation [95.51455777713092]
Convolutional neural networks (CNNs) have been the de facto standard for 3D medical image segmentation.
We propose a novel framework that efficiently bridges a Convolutional neural network and a Transformer (CoTr) for accurate 3D medical image segmentation.
arXiv Detail & Related papers (2021-03-04T13:34:22Z)
- Rotation-Invariant Gait Identification with Quaternion Convolutional Neural Networks [7.638280076041963]
We introduce Quaternion CNN, a network architecture which is intrinsically layer-wise equivariant and globally invariant under 3D rotations.
We show empirically that this network indeed significantly outperforms a traditional CNN in a multi-user rotation-invariant gait classification setting.
arXiv Detail & Related papers (2020-08-04T23:22:12Z)
- Volumetric Transformer Networks [88.85542905676712]
We introduce a learnable module, the volumetric transformer network (VTN).
VTN predicts channel-wise warping fields to reconfigure intermediate CNN features both spatially and channel-wise.
Our experiments show that VTN consistently boosts the features' representation power and consequently the networks' accuracy on fine-grained image recognition and instance-level image retrieval.
arXiv Detail & Related papers (2020-07-18T14:00:12Z)
- Computational optimization of convolutional neural networks using separated filters architecture [69.73393478582027]
Use of convolutional neural networks (CNNs) is the standard approach to image recognition, despite the fact that they can be computationally demanding.
We consider a convolutional neural network transformation that reduces computational complexity and thus speeds up neural network processing.
arXiv Detail & Related papers (2020-02-18T17:42:13Z)
- The problems with using STNs to align CNN feature maps [0.0]
We argue that spatial transformer networks (STNs) do not have the ability to align the feature maps of a transformed image and its original.
We advocate taking advantage of more complex features in deeper layers by instead sharing parameters between the classification and the localisation network.
arXiv Detail & Related papers (2020-01-14T12:59:56Z)
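The last entry above (the companion paper to the main abstract) advocates sharing parameters between the classification and localisation networks, so the localisation branch benefits from deep features without training its own deep stack. A minimal structural sketch of that idea in plain NumPy, where all weights, sizes, and function names are hypothetical placeholders rather than the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared early stage: both heads read the same features. In the paper's
# setting these would be the classification network's early conv layers;
# here a single fully connected + ReLU stage stands in for them.
W_shared = rng.standard_normal((16, 64)) / 8.0

def shared_features(x):
    """Shared feature extractor used by both heads (x: flattened 8x8 image)."""
    return np.maximum(W_shared @ x, 0.0)

# Localisation head: predicts a transformation parameter (e.g. a rotation
# angle) from the shared features, instead of from its own deep network.
w_loc = rng.standard_normal(16) / 4.0
def predict_angle(x):
    return float(w_loc @ shared_features(x))

# Classification head: predicts class scores from the very same features.
W_cls = rng.standard_normal((10, 16)) / 4.0
def class_scores(x):
    return W_cls @ shared_features(x)

x = rng.standard_normal(64)  # a stand-in flattened 8x8 "image"
print(predict_angle(x), class_scores(x).shape)
```

The design point is that `shared_features` appears in both heads: gradients from the classification loss keep the shared stage well conditioned, which is what lets the localisation branch grow deeper without the training instability the abstract reports for standalone deep localisation networks.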
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.