On Translation Invariance in CNNs: Convolutional Layers can Exploit
Absolute Spatial Location
- URL: http://arxiv.org/abs/2003.07064v2
- Date: Sat, 30 May 2020 14:59:07 GMT
- Title: On Translation Invariance in CNNs: Convolutional Layers can Exploit
Absolute Spatial Location
- Authors: Osman Semih Kayhan and Jan C. van Gemert
- Abstract summary: We show that CNNs can and will exploit absolute spatial location by learning filters that respond exclusively to particular absolute locations.
Because modern CNN filters have a huge receptive field, these boundary effects operate even far from the image boundary.
- Score: 18.932504899552494
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we challenge the common assumption that convolutional layers in
modern CNNs are translation invariant. We show that CNNs can and will exploit
the absolute spatial location by learning filters that respond exclusively to
particular absolute locations by exploiting image boundary effects. Because
modern CNN filters have a huge receptive field, these boundary effects operate
even far from the image boundary, allowing the network to exploit absolute
spatial location all over the image. We give a simple solution to remove
spatial location encoding which improves translation invariance and thus gives
a stronger visual inductive bias which particularly benefits small data sets.
We broadly demonstrate these benefits on several architectures and various
applications such as image classification, patch matching, and two video
classification datasets.
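The boundary effect described in the abstract can be seen with a minimal sketch (in NumPy, as an illustrative assumption; the paper works with trained CNNs): a single zero-padded convolution applied to a perfectly uniform image already produces location-dependent responses, so a filter reading those responses can tell where it is in the image.

```python
import numpy as np

def conv2d_same(image, kernel):
    """2D convolution with zero ("same") padding, stride 1."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="constant")
    h, w = image.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

# A constant image carries no content that distinguishes one location
# from another...
image = np.ones((8, 8))
kernel = np.ones((3, 3))

response = conv2d_same(image, kernel)

# ...yet after one zero-padded convolution the border responses differ
# from the interior: the padding zeros leak absolute position.
print(response[0, 0])  # corner: 4.0 (only 4 of 9 taps see the image)
print(response[0, 4])  # edge:   6.0
print(response[4, 4])  # center: 9.0
```

Stacking layers propagates this signal inward with each layer's receptive-field growth, which is why, as the abstract notes, the effect reaches far from the image boundary.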
Related papers
- TransGeo: Transformer Is All You Need for Cross-view Image
Geo-localization [81.70547404891099]
CNN-based methods for cross-view image geo-localization fail to model global correlation.
We propose a pure transformer-based approach (TransGeo) to address these limitations.
TransGeo achieves state-of-the-art results on both urban and rural datasets.
arXiv Detail & Related papers (2022-03-31T21:19:41Z) - Interpretable Compositional Convolutional Neural Networks [20.726080433723922]
We propose a method to modify a traditional convolutional neural network (CNN) into an interpretable compositional CNN.
In a compositional CNN, each filter is supposed to consistently represent a specific compositional object part or image region with a clear meaning.
Our method can be broadly applied to different types of CNNs.
arXiv Detail & Related papers (2021-07-09T15:01:24Z) - The Spatially-Correlative Loss for Various Image Translation Tasks [69.62228639870114]
We propose a novel spatially-correlative loss that is simple, efficient and yet effective for preserving scene structure consistency.
Previous methods attempt this by using pixel-level cycle-consistency or feature-level matching losses.
We show distinct improvement over baseline models in all three modes of unpaired I2I translation: single-modal, multi-modal, and even single-image translation.
arXiv Detail & Related papers (2021-04-02T02:13:30Z) - The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network nor modifications to the original model.
arXiv Detail & Related papers (2021-01-29T07:46:39Z) - Position, Padding and Predictions: A Deeper Look at Position Information
in CNNs [30.583407443282365]
We show that a surprising degree of absolute position information is encoded in commonly used CNNs.
We show that zero padding drives CNNs to encode position information in their internal representations, while a lack of padding precludes position encoding.
This gives rise to deeper questions about the role of position information in CNNs.
arXiv Detail & Related papers (2021-01-28T23:40:32Z) - An Empirical Method to Quantify the Peripheral Performance Degradation
in Deep Networks [18.808132632482103]
The receptive fields of convolutional neural network (CNN) kernels compound with each convolutional layer.
Deeper and deeper networks combined with stride-based down-sampling mean that the propagation of this region can end up covering a non-negligible portion of the image.
Our dataset is constructed by inserting objects into high resolution backgrounds, thereby allowing us to crop sub-images which place target objects at specific locations relative to the image border.
By probing the behaviour of Mask R-CNN across a selection of target locations, we see clear patterns of performance degradation near the image boundary, and in particular in the image corners.
arXiv Detail & Related papers (2020-12-04T18:00:47Z) - What Does CNN Shift Invariance Look Like? A Visualization Study [87.79405274610681]
Feature extraction with convolutional neural networks (CNNs) is a popular method to represent images for machine learning tasks.
We focus on measuring and visualizing the shift invariance of extracted features from popular off-the-shelf CNN models.
We conclude that features extracted from popular networks are not globally invariant, and that biases and artifacts exist within this variance.
arXiv Detail & Related papers (2020-11-09T01:16:30Z) - RetinotopicNet: An Iterative Attention Mechanism Using Local Descriptors
with Global Context [0.0]
Convolutional Neural Networks (CNNs) were the driving force behind many advancements in Computer Vision research in recent years.
CNNs lack scale and rotation invariance, two of the most frequently encountered transformations in natural images.
We develop an efficient solution by reproducing how nature has solved the problem in the human brain.
arXiv Detail & Related papers (2020-05-12T11:54:56Z) - Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently-used VGG feature-matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
arXiv Detail & Related papers (2020-02-07T03:45:25Z) - R-FCN: Object Detection via Region-based Fully Convolutional Networks [87.62557357527861]
We present region-based, fully convolutional networks for accurate and efficient object detection.
Our result is achieved at a test-time speed of 170ms per image, 2.5-20x faster than the Faster R-CNN counterpart.
arXiv Detail & Related papers (2016-05-20T15:50:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.