Inability of spatial transformations of CNN feature maps to support
invariant recognition
- URL: http://arxiv.org/abs/2004.14716v1
- Date: Thu, 30 Apr 2020 12:12:58 GMT
- Title: Inability of spatial transformations of CNN feature maps to support
invariant recognition
- Authors: Ylva Jansson, Maksim Maydanskiy, Lukas Finnveden and Tony Lindeberg
- Abstract summary: We show that spatial transformations of CNN feature maps cannot align the feature maps of a transformed image to match those of its original.
For rotations and reflections, spatially transforming feature maps or filters can enable invariance but only for networks with learnt or hardcoded rotation- or reflection-invariant features.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A large number of deep learning architectures use spatial transformations of
CNN feature maps or filters to better deal with variability in object
appearance caused by natural image transformations. In this paper, we prove
that spatial transformations of CNN feature maps cannot align the feature maps
of a transformed image to match those of its original, for general affine
transformations, unless the extracted features are themselves invariant. Our
proof is based on elementary analysis for both the single- and multi-layer
network case. The results imply that methods based on spatial transformations
of CNN feature maps or filters cannot replace image alignment of the input and
cannot enable invariant recognition for general affine transformations,
specifically not for scaling transformations or shear transformations. For
rotations and reflections, spatially transforming feature maps or filters can
enable invariance but only for networks with learnt or hardcoded rotation- or
reflection-invariant features.
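As a rough numerical illustration of this claim (not code from the paper), the sketch below extracts feature maps with a generic fixed-size filter, scales the input image, warps the resulting feature map back to the original grid, and measures the mismatch against the feature map of the original image. The toy image, filter, and scale factor are arbitrary choices.

```python
# Minimal sketch, assuming only numpy and scipy: warping the feature map of a
# scaled image back to the original grid does not recover the feature map of
# the original image, because a fixed-size filter does not commute with scaling.
import numpy as np
from scipy.ndimage import correlate, zoom

rng = np.random.default_rng(0)
image = rng.random((64, 64))           # toy input image (arbitrary)
kernel = rng.random((5, 5)) - 0.5      # generic, non-invariant CNN-style filter

def features(img):
    # one convolutional layer followed by a ReLU nonlinearity
    return np.maximum(correlate(img, kernel, mode="nearest"), 0.0)

s = 2.0                                          # scaling factor of the image transform
f_orig = features(image)                         # feature map of the original image
f_scaled = features(zoom(image, s, order=1))     # feature map of the scaled image

# Spatially transform the feature map of the scaled image back to the original
# grid -- the operation that STN-like architectures rely on.
f_aligned = zoom(f_scaled, 1.0 / s, order=1)[:64, :64]

# The residual is far from zero for scaling (it would be ~0 for a circular
# translation), so warping the feature map cannot replace aligning the input
# image unless the features themselves are scale invariant.
print("relative mismatch:",
      np.linalg.norm(f_aligned - f_orig) / np.linalg.norm(f_orig))
```

The mismatch reflects the non-commutativity the paper proves: convolving at one scale and then resampling is not equivalent to convolving at the other scale.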
Related papers
- Learning Transformations To Reduce the Geometric Shift in Object
Detection [60.20931827772482]
We tackle geometric shifts emerging from variations in the image capture process.
We introduce a self-training approach that learns a set of geometric transformations to minimize these shifts.
We evaluate our method on two different shifts, i.e., a camera's field of view (FoV) change and a viewpoint change.
arXiv Detail & Related papers (2023-01-13T11:55:30Z)
- ParGAN: Learning Real Parametrizable Transformations [50.51405390150066]
We propose ParGAN, a generalization of the cycle-consistent GAN framework to learn image transformations.
The proposed generator takes as input both an image and a parametrization of the transformation.
We show how, with disjoint image domains and no annotated parametrization, our framework can create smooth interpolations as well as learn multiple transformations simultaneously.
arXiv Detail & Related papers (2022-11-09T16:16:06Z)
- Learning Invariant Representations for Equivariant Neural Networks Using
Orthogonal Moments [9.680414207552722]
The convolutional layers of standard convolutional neural networks (CNNs) are equivariant to translation.
Recently, a new class of CNNs has been proposed in which the conventional layers are replaced with equivariant convolution, pooling, and batch-normalization layers.
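The translation-equivariance property stated above can be checked directly; the following is a generic numpy/scipy sketch, not code from the paper, and it uses circular boundary handling so the identity holds exactly.

```python
# Minimal sketch: shifting then convolving equals convolving then shifting,
# i.e. convolution is equivariant to (circular) translation.
import numpy as np
from scipy.ndimage import correlate

rng = np.random.default_rng(0)
image = rng.random((32, 32))
kernel = rng.random((3, 3))

def conv(img):
    # circular boundary so the equivariance identity is exact
    return correlate(img, kernel, mode="wrap")

shift = (5, 3)
lhs = conv(np.roll(image, shift, axis=(0, 1)))    # transform, then convolve
rhs = np.roll(conv(image), shift, axis=(0, 1))    # convolve, then transform

print(np.allclose(lhs, rhs))  # True
```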
arXiv Detail & Related papers (2022-09-22T11:48:39Z)
- Improving the Sample-Complexity of Deep Classification Networks with
Invariant Integration [77.99182201815763]
Leveraging prior knowledge on intraclass variance due to transformations is a powerful method to improve the sample complexity of deep neural networks.
We propose a novel monomial selection algorithm based on pruning methods to allow its application to more complex problems.
We demonstrate the improved sample complexity on the Rotated-MNIST, SVHN and CIFAR-10 datasets.
arXiv Detail & Related papers (2022-02-08T16:16:11Z)
- Revisiting Transformation Invariant Geometric Deep Learning: Are Initial
Representations All You Need? [80.86819657126041]
We show that transformation-invariant and distance-preserving initial representations are sufficient to achieve transformation invariance.
Specifically, we realize transformation-invariant and distance-preserving initial point representations by modifying multi-dimensional scaling.
We prove that TinvNN can strictly guarantee transformation invariance and is general and flexible enough to be combined with existing neural networks.
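As a minimal sketch of why distance-based initial representations are transformation invariant (this is not the authors' TinvNN code): classical multi-dimensional scaling is computed from the pairwise-distance matrix alone, so its spectrum is unchanged by any rotation and translation of the points. The rotation, translation, and point cloud below are arbitrary.

```python
# Minimal sketch: an MDS-style representation built from pairwise distances is
# unchanged by rigid transformations of the input point cloud.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(0)
points = rng.random((10, 3))

theta = 0.7                                        # arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
transformed = points @ R.T + np.array([1.0, -2.0, 0.5])   # rotate and translate

def mds_spectrum(X):
    # classical MDS: double-centre the squared-distance matrix and take its
    # eigenvalues, which depend only on the pairwise distances
    D2 = cdist(X, X) ** 2
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n
    return np.sort(np.linalg.eigvalsh(-0.5 * J @ D2 @ J))

print(np.allclose(mds_spectrum(points), mds_spectrum(transformed)))  # True
```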
arXiv Detail & Related papers (2021-12-23T03:52:33Z)
- Quantised Transforming Auto-Encoders: Achieving Equivariance to
Arbitrary Transformations in Deep Networks [23.673155102696338]
Convolutional Neural Networks (CNNs) are equivariant to image translation.
We propose an auto-encoder architecture whose embedding obeys an arbitrary set of equivariance relations simultaneously.
We demonstrate results of successful re-rendering of transformed versions of input images on several datasets.
arXiv Detail & Related papers (2021-11-25T02:26:38Z)
- FILTRA: Rethinking Steerable CNN by Filter Transform [59.412570807426135]
The problem of steerable CNNs has been studied from the perspective of group representation theory.
We show that kernels constructed by filter transform can also be interpreted within group representation theory.
This interpretation helps complete the puzzle of steerable CNN theory and provides a novel and simple approach to implementing steerable convolution operators.
arXiv Detail & Related papers (2021-05-25T03:32:34Z)
- A generalised feature for low level vision [0.0]
The Sinclair-Town transform subsumes the roles of edge detector, MSER-style region detector and corner detector.
The difference from the local mean is quantised to 3 values (dark-neutral-light).
arXiv Detail & Related papers (2021-02-03T11:02:03Z)
- Invariant Deep Compressible Covariance Pooling for Aerial Scene
Categorization [80.55951673479237]
We propose a novel invariant deep compressible covariance pooling (IDCCP) method to address nuisance variations in aerial scene categorization.
We conduct extensive experiments on the publicly released aerial scene image data sets and demonstrate the superiority of this method compared with state-of-the-art methods.
arXiv Detail & Related papers (2020-11-11T11:13:07Z)
- Understanding when spatial transformer networks do not support
invariance, and what to do about it [0.0]
Spatial transformer networks (STNs) were designed to enable convolutional neural networks (CNNs) to learn invariance to image transformations.
We show that STNs do not have the ability to align the feature maps of a transformed image with those of its original.
We investigate alternative STN architectures that make use of complex features.
arXiv Detail & Related papers (2020-04-24T12:20:35Z)
- The problems with using STNs to align CNN feature maps [0.0]
We argue that spatial transformer networks (STNs) do not have the ability to align the feature maps of a transformed image and its original.
We advocate taking advantage of more complex features in deeper layers by instead sharing parameters between the classification and the localisation network.
arXiv Detail & Related papers (2020-01-14T12:59:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.