How Can CNNs Use Image Position for Segmentation?
- URL: http://arxiv.org/abs/2005.03463v1
- Date: Thu, 7 May 2020 13:38:13 GMT
- Title: How Can CNNs Use Image Position for Segmentation?
- Authors: Rito Murase, Masanori Suganuma and Takayuki Okatani
- Abstract summary: A recent study shows that the zero-padding employed in convolutional layers of CNNs provides position information to the CNNs.
However, there is a technical issue with the design of the experiments of the study, and thus the correctness of the claim is yet to be verified.
- Score: 23.98839374194848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolution is a translation-equivariant operation, and image position does not affect
its result. A recent study shows that the zero-padding employed in
convolutional layers of CNNs provides position information to the CNNs. The
study further claims that the position information enables accurate inference
for several tasks, such as object recognition, segmentation, etc. However,
there is a technical issue with the design of the experiments of the study, and
thus the correctness of the claim is yet to be verified. Moreover, the absolute
image position may not be essential for the segmentation of natural images, in
which target objects will appear at any image position. In this study, we
investigate how positional information is and can be utilized for segmentation
tasks. Toward this end, we consider {\em positional encoding} (PE) that adds
channels embedding image position to the input images and compare PE with
several padding methods. Considering the above nature of natural images, we
choose medical image segmentation tasks, in which the absolute position appears
to be relatively important, as the same organs (of different patients) are
captured in similar sizes and positions. We draw a mixed conclusion from the
experimental results: positional encoding certainly helps in some cases, but
absolute image position may not be as important for segmentation tasks as one
might think.
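The positional encoding (PE) described in the abstract, i.e. adding channels that embed image position to the input, can be illustrated with a minimal sketch. This is a CoordConv-style construction; the function name and the [0, 1] normalization are assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np

def add_position_channels(image: np.ndarray) -> np.ndarray:
    """Append normalized (x, y) coordinate channels to an (H, W, C) image."""
    h, w = image.shape[:2]
    ys, xs = np.meshgrid(
        np.linspace(0.0, 1.0, h),   # row coordinate: 0 at top, 1 at bottom
        np.linspace(0.0, 1.0, w),   # column coordinate: 0 at left, 1 at right
        indexing="ij",
    )
    coords = np.stack([xs, ys], axis=-1)            # (H, W, 2) position channels
    return np.concatenate([image, coords], axis=-1)

img = np.random.rand(64, 64, 3)
pe_img = add_position_channels(img)
print(pe_img.shape)  # (64, 64, 5): three color channels plus two position channels
```

A CNN fed `pe_img` instead of `img` receives absolute position explicitly at every pixel, so its segmentation accuracy no longer depends on padding-induced position cues.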
Related papers
- Effect of Rotation Angle in Self-Supervised Pre-training is Dataset-Dependent [3.434553688053531]
Self-supervised learning for pre-training can help the network learn better low-level features.
In contrastive pre-training, the network is pre-trained to distinguish between different versions of the input.
We show that, when training with contrastive pre-training in this way, the rotation angle $\theta$ and the dataset interact in interesting ways.
arXiv Detail & Related papers (2024-06-21T12:25:07Z)
- Unsupervised Domain Adaptation for Medical Image Segmentation via Feature-space Density Matching [0.0]
This paper presents an unsupervised domain adaptation approach for semantic segmentation.
We match the target data distribution to the source in the feature space, particularly when the number of target samples is limited.
We demonstrate the efficacy of our proposed approach on 2 datasets, multisite prostate MRI and histopathology images.
arXiv Detail & Related papers (2023-05-09T22:24:46Z)
- CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images.
We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images.
CSP significantly boosts the model performance with 10-34% relative improvement with various labeled training data sampling ratios.
arXiv Detail & Related papers (2023-05-01T23:11:18Z)
- Self-Supervised Correction Learning for Semi-Supervised Biomedical Image Segmentation [84.58210297703714]
We propose a self-supervised correction learning paradigm for semi-supervised biomedical image segmentation.
We design a dual-task network, including a shared encoder and two independent decoders for segmentation and lesion region inpainting.
Experiments on three medical image segmentation datasets for different tasks demonstrate the outstanding performance of our method.
arXiv Detail & Related papers (2023-01-12T08:19:46Z)
- Location-Aware Self-Supervised Transformers [74.76585889813207]
We propose to pretrain networks for semantic segmentation by predicting the relative location of image parts.
We control the difficulty of the task by masking a subset of the reference patch features visible to those of the query.
Our experiments show that this location-aware pretraining leads to representations that transfer competitively to several challenging semantic segmentation benchmarks.
arXiv Detail & Related papers (2022-12-05T16:24:29Z)
- Residual Moment Loss for Medical Image Segmentation [56.72261489147506]
Location information has been shown to help deep learning models capture the manifold structure of target objects.
Most existing methods encode location information only implicitly, leaving the network to learn it.
We propose a novel loss function, namely residual moment (RM) loss, to explicitly embed the location information of segmentation targets.
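The general idea of a moment-based loss can be sketched as follows. This is a hypothetical illustration using only the first spatial moment (centroid); the function names and the squared-centroid-distance form are assumptions, not the authors' exact RM loss:

```python
import numpy as np

def centroid(mask: np.ndarray) -> np.ndarray:
    """First spatial moment (centroid) of a soft segmentation mask."""
    h, w = mask.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    total = mask.sum() + 1e-8  # avoid division by zero for empty masks
    return np.array([(ys * mask).sum() / total, (xs * mask).sum() / total])

def moment_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Penalize the squared distance between predicted and target centroids."""
    return float(np.sum((centroid(pred) - centroid(target)) ** 2))

target = np.zeros((8, 8)); target[2:4, 2:4] = 1.0   # ground-truth blob
shifted = np.zeros((8, 8)); shifted[5:7, 5:7] = 1.0  # same shape, wrong location
print(moment_loss(target, target))   # 0.0
print(moment_loss(shifted, target))  # > 0: a misplaced prediction is penalized
```

Unlike a pixel-wise loss, this term directly supervises *where* the predicted region lies, which is the sense in which a moment loss embeds location information explicitly.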
arXiv Detail & Related papers (2021-06-27T09:31:49Z)
- Compositional Sketch Search [91.84489055347585]
We present an algorithm for searching image collections using free-hand sketches.
We exploit drawings as a concise and intuitive representation for specifying entire scene compositions.
arXiv Detail & Related papers (2021-06-15T09:38:09Z)
- Duplex Contextual Relation Network for Polyp Segmentation [19.509290186267396]
We propose Duplex Contextual Relation Network (DCRNet) to capture both within-image and cross-image contextual relations.
We evaluate the proposed method on the EndoScene, Kvasir-SEG and the recently released large-scale PICCOLO dataset.
arXiv Detail & Related papers (2021-03-11T15:19:54Z)
- Scale Normalized Image Pyramids with AutoFocus for Object Detection [75.71320993452372]
A scale normalized image pyramid (SNIP) is generated that, like human vision, only attends to objects within a fixed size range at different scales.
We propose an efficient spatial sub-sampling scheme which only operates on fixed-size sub-regions likely to contain objects.
The resulting algorithm is referred to as AutoFocus and results in a 2.5-5 times speed-up during inference when used with SNIP.
arXiv Detail & Related papers (2021-02-10T18:57:53Z)
- Position, Padding and Predictions: A Deeper Look at Position Information in CNNs [30.583407443282365]
We show that a surprising degree of absolute position information is encoded in commonly used CNNs.
We show that zero padding drives CNNs to encode position information in their internal representations, while a lack of padding precludes position encoding.
This gives rise to deeper questions about the role of position information in CNNs.
arXiv Detail & Related papers (2021-01-28T23:40:32Z)
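The claim that zero padding leaks absolute position into a CNN can be illustrated with a toy experiment: convolving a constant image with a uniform kernel yields border-dependent responses under zero padding, but a perfectly uniform response without it. This is a minimal sketch, not the paper's experimental protocol:

```python
import numpy as np

def conv2d(image: np.ndarray, kernel: np.ndarray, pad: bool) -> np.ndarray:
    """Naive 2D convolution (cross-correlation) with optional zero padding."""
    if pad:
        image = np.pad(image, kernel.shape[0] // 2)  # "same" zero padding
    kh, kw = kernel.shape
    h, w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.ones((8, 8))            # constant image: no content cue to position
kernel = np.ones((3, 3)) / 9.0   # uniform averaging kernel

padded = conv2d(img, kernel, pad=True)
valid = conv2d(img, kernel, pad=False)

print(np.ptp(padded) > 0)   # True: border responses differ, a position signal
print(np.ptp(valid) == 0)   # True: without padding, the output is uniform
```

Because the zeros injected at the border are distinguishable from image content, every feature map carries an implicit "distance to border" cue that later layers can exploit.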
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.