Position, Padding and Predictions: A Deeper Look at Position Information
in CNNs
- URL: http://arxiv.org/abs/2101.12322v1
- Date: Thu, 28 Jan 2021 23:40:32 GMT
- Title: Position, Padding and Predictions: A Deeper Look at Position Information
in CNNs
- Authors: Md Amirul Islam, Matthew Kowal, Sen Jia, Konstantinos G. Derpanis, and
Neil D. B. Bruce
- Abstract summary: We show that a surprising degree of absolute position information is encoded in commonly used CNNs.
We show that zero padding drives CNNs to encode position information in their internal representations, while a lack of padding precludes position encoding.
This gives rise to deeper questions about the role of position information in CNNs.
- Score: 30.583407443282365
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: In contrast to fully connected networks, Convolutional Neural Networks (CNNs)
achieve efficiency by learning weights associated with local filters with a
finite spatial extent. An implication of this is that a filter may know what it
is looking at, but not where it is positioned in the image. In this paper, we
first test this hypothesis and reveal that a surprising degree of absolute
position information is encoded in commonly used CNNs. We show that zero
padding drives CNNs to encode position information in their internal
representations, while a lack of padding precludes position encoding. This
gives rise to deeper questions about the role of position information in CNNs:
(i) What boundary heuristics enable optimal position encoding for downstream
tasks?; (ii) Does position encoding affect the learning of semantic
representations?; (iii) Does position encoding always improve performance? To
provide answers, we perform the largest case study to date on the role that
padding and border heuristics play in CNNs. We design novel tasks which allow
us to quantify boundary effects as a function of the distance to the border.
Numerous semantic objectives reveal the effect of the border on semantic
representations. Finally, we demonstrate the implications of these findings on
multiple real-world tasks to show that position information can either help or
hurt performance.
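The boundary effect described in the abstract is easy to observe directly. The following is a minimal sketch (not the authors' experimental setup) using a hand-rolled NumPy convolution: a constant image carries no content cues at all, yet a zero-padded filter produces weaker responses near the border, so the activations themselves encode distance to the image edge.

```python
import numpy as np

def conv2d_same(x, k):
    """2-D convolution with zero ('same') padding, stride 1."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

# Constant input: no content-based positional cues whatsoever.
x = np.ones((8, 8))
k = np.ones((3, 3)) / 9.0  # simple averaging filter

y = conv2d_same(x, k)
# The corner window overlaps the zero padding (only 4 of 9 taps see ones),
# while the interior window sees all ones.
print(y[0, 0])  # -> 0.444... (4/9)
print(y[4, 4])  # -> 1.0
```

Because the corner activation is systematically smaller than the interior one, any subsequent layer can use this signature to recover absolute position, which is the mechanism the paper attributes to zero padding.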
Related papers
- Random Padding Data Augmentation [23.70951896315126]
A convolutional neural network (CNN) must learn to recognize the same object at different positions in an image.
The usefulness of the features' spatial information in CNNs has not been well investigated.
We introduce Random Padding, a new type of padding method for training CNNs.
arXiv Detail & Related papers (2023-02-17T04:15:33Z) - Word Order Matters when you Increase Masking [70.29624135819884]
We study the effect of removing position encodings on the pre-training objective itself, to test whether models can reconstruct position information from co-occurrences alone.
We find that the necessity of position information increases with the amount of masking, and that masked language models without position encodings cannot reconstruct this information from the task alone.
arXiv Detail & Related papers (2022-11-08T18:14:04Z) - What Can Be Learnt With Wide Convolutional Neural Networks? [69.55323565255631]
We study infinitely-wide deep CNNs in the kernel regime.
We prove that deep CNNs adapt to the spatial scale of the target function.
We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN.
arXiv Detail & Related papers (2022-08-01T17:19:32Z) - Global Pooling, More than Meets the Eye: Position Information is Encoded
Channel-Wise in CNNs [32.81128493853064]
We demonstrate that positional information is encoded based on the ordering of the channel dimensions, while semantic information is largely not.
We show the real world impact of these findings by applying them to two applications.
arXiv Detail & Related papers (2021-08-17T21:27:30Z) - Dense Interaction Learning for Video-based Person Re-identification [75.03200492219003]
We propose a hybrid framework, Dense Interaction Learning (DenseIL), to tackle video-based person re-ID difficulties.
DenseIL contains a CNN encoder and a Dense Interaction (DI) decoder.
Our method consistently and significantly outperforms all state-of-the-art methods on multiple standard video-based re-ID datasets.
arXiv Detail & Related papers (2021-03-16T12:22:08Z) - The Mind's Eye: Visualizing Class-Agnostic Features of CNNs [92.39082696657874]
We propose an approach to visually interpret CNN features given a set of images by creating corresponding images that depict the most informative features of a specific layer.
Our method uses a dual-objective activation and distance loss, without requiring a generator network or modifications to the original model.
arXiv Detail & Related papers (2021-01-29T07:46:39Z) - Self-supervised Video Representation Learning by Uncovering
Spatio-temporal Statistics [74.6968179473212]
This paper proposes a novel pretext task to address the self-supervised learning problem.
We compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion.
A neural network is built and trained to yield the statistical summaries given the video frames as inputs.
arXiv Detail & Related papers (2020-08-31T08:31:56Z) - How Can CNNs Use Image Position for Segmentation? [23.98839374194848]
A recent study shows that the zero-padding employed in convolutional layers of CNNs provides position information to the CNNs.
However, there is a technical issue with that study's experimental design, so the correctness of the claim has yet to be verified.
arXiv Detail & Related papers (2020-05-07T13:38:13Z) - On Translation Invariance in CNNs: Convolutional Layers can Exploit
Absolute Spatial Location [18.932504899552494]
We show that CNNs can and will exploit the absolute spatial location by learning filters that respond exclusively to particular absolute locations.
Because filters in modern CNNs have a huge receptive field, these boundary effects operate even far from the image boundary.
arXiv Detail & Related papers (2020-03-16T08:00:06Z) - Depth Based Semantic Scene Completion with Position Importance Aware
Loss [52.06051681324545]
PALNet is a novel hybrid network for semantic scene completion.
It extracts both 2D and 3D features at multiple stages using fine-grained depth information.
It is beneficial for recovering key details like the boundaries of objects and the corners of the scene.
arXiv Detail & Related papers (2020-01-29T07:05:52Z) - How Much Position Information Do Convolutional Neural Networks Encode? [27.604154992915863]
In contrast to fully connected networks, Convolutional Neural Networks (CNNs) achieve efficiency by learning weights associated with local filters with a finite spatial extent.
In this paper, we test this hypothesis, revealing the surprising degree of absolute position information that is encoded in commonly used neural networks.
arXiv Detail & Related papers (2020-01-22T19:44:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.