Split-Merge Pooling
- URL: http://arxiv.org/abs/2006.07742v1
- Date: Sat, 13 Jun 2020 23:20:30 GMT
- Title: Split-Merge Pooling
- Authors: Omid Hosseini Jafari, Carsten Rother
- Abstract summary: Split-Merge pooling is introduced to preserve spatial information without subsampling.
We evaluate our approach for dense semantic segmentation of large image sizes taken from the Cityscapes and GTA-5 datasets.
- Score: 36.2980225204665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There are a variety of approaches to obtain a vast receptive field with
convolutional neural networks (CNNs), such as pooling or striding convolutions.
Most of these approaches were initially designed for image classification and
later adapted to dense prediction tasks, such as semantic segmentation.
However, the major drawback of this adaptation is the loss of spatial
information. Even the popular dilated convolution approach, which in theory is
able to operate with full spatial resolution, needs to subsample features for
large image sizes in order to make the training and inference tractable. In
this work, we introduce Split-Merge pooling to fully preserve the spatial
information without any subsampling. By applying Split-Merge pooling to deep
networks, we achieve, at the same time, a very large receptive field. We
evaluate our approach for dense semantic segmentation of large image sizes
taken from the Cityscapes and GTA-5 datasets. We demonstrate that by replacing
max-pooling and striding convolutions with our split-merge pooling, we are able
to improve the accuracy of different variations of ResNet significantly.
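The abstract describes splitting a feature map into sub-images so that no value is discarded (unlike max-pooling) and later merging them back. A minimal NumPy sketch of one hedged reading of the Split/Merge pair, where Split extracts the four 2x2 phase offsets of the map (akin to space-to-batch; the paper's exact split pattern and network integration are not reproduced here):

```python
import numpy as np

def split_pool(x):
    """Split an (H, W, C) feature map into four half-resolution sub-maps.

    Unlike max-pooling, every pixel is kept: each sub-map holds one of
    the four 2x2 phase offsets (a hedged sketch of the Split operation).
    """
    return np.stack([x[i::2, j::2] for i in (0, 1) for j in (0, 1)])

def merge_unpool(subs):
    """Inverse of split_pool: interleave the four sub-maps back."""
    n, h, w, c = subs.shape
    out = np.empty((h * 2, w * 2, c), dtype=subs.dtype)
    for k, (i, j) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
        out[i::2, j::2] = subs[k]
    return out

x = np.arange(16).reshape(4, 4, 1)
assert np.array_equal(merge_unpool(split_pool(x)), x)  # lossless round trip
```

The round trip is exactly lossless, which is the property the abstract claims over subsampling approaches: resolution per branch halves, but no spatial information is thrown away.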
Related papers
- FuseNet: Self-Supervised Dual-Path Network for Medical Image
Segmentation [3.485615723221064]
FuseNet is a dual-stream framework for self-supervised semantic segmentation.
Its cross-modal fusion technique extends the principles of CLIP by replacing textual data with augmented images.
Experiments on skin lesion and lung segmentation datasets demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-11-22T00:03:16Z) - DeepMerge: Deep-Learning-Based Region-Merging for Image Segmentation [7.063322114865965]
We propose a deep-learning-based region merging method dubbed DeepMerge to handle the segmentation of complete objects in large very-high-resolution (VHR) images.
This is the first method to use deep learning to learn similarity and merge similar adjacent superpixels in a region adjacency graph (RAG).
DeepMerge achieves the highest F value (0.9550) and the lowest total error TE (0.0895), correctly segmenting objects of different sizes and outperforming all competing segmentation methods.
arXiv Detail & Related papers (2023-05-31T12:27:58Z) - Pooling Revisited: Your Receptive Field is Suboptimal [35.11562214480459]
The size and shape of the receptive field determine how the network aggregates local information.
We propose a simple yet effective Dynamically Optimized Pooling operation, referred to as DynOPool.
Our experiments show that the models equipped with the proposed learnable resizing module outperform the baseline networks on multiple datasets in image classification and semantic segmentation.
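DynOPool replaces a fixed pooling stride with a learnable resizing of the feature map. A hedged sketch of only the resampling step, using nearest-neighbour sampling to a continuous `scale` (the gradient machinery that makes `scale` learnable is the paper's contribution and is omitted):

```python
import numpy as np

def resize_pool(x, scale):
    """Resample an (H, W) map to a size set by a continuous `scale`.

    A hedged sketch of learnable resizing: the output resolution is
    round(H * scale) x round(W * scale), sampled nearest-neighbour.
    """
    h, w = x.shape
    oh, ow = max(1, round(h * scale)), max(1, round(w * scale))
    rows = np.clip((np.arange(oh) / scale).astype(int), 0, h - 1)
    cols = np.clip((np.arange(ow) / scale).astype(int), 0, w - 1)
    return x[np.ix_(rows, cols)]
```

With `scale=0.5` this behaves like a stride-2 subsampling; with `scale=1.0` it is the identity, so a learned scale can interpolate between the two regimes.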
arXiv Detail & Related papers (2022-05-30T17:03:40Z) - AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical one among which lies in foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while matching the speed of mainstream methods.
arXiv Detail & Related papers (2022-02-18T10:14:45Z) - Multi-Representation Adaptation Network for Cross-domain Image
Classification [20.615155915233693]
In image classification, it is often expensive and time-consuming to acquire sufficient labels.
Existing approaches mainly align the distributions of representations extracted by a single structure.
We propose Multi-Representation Adaptation, which can dramatically improve the classification accuracy for cross-domain image classification.
arXiv Detail & Related papers (2022-01-04T06:34:48Z) - Augmenting Convolutional networks with attention-based aggregation [55.97184767391253]
We show how to augment any convolutional network with an attention-based global map to achieve non-local reasoning.
We plug this learned aggregation layer into a simple patch-based convolutional network parametrized by two parameters (width and depth).
It yields surprisingly competitive trade-offs between accuracy and complexity, in particular in terms of memory consumption.
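The attention-based global map aggregates patch features into one vector with learned weights instead of uniform average pooling. A hedged sketch of such an aggregation, where the query vector stands in for a (hypothetical) learned parameter:

```python
import numpy as np

def attention_pool(feats, query):
    """Aggregate N patch features (N, D) into one global vector.

    A hedged sketch of attention-based aggregation: scores are dot
    products with a learned query (hypothetical here), softmax turns
    them into weights, and the output is the weighted mean of patches.
    """
    scores = feats @ query               # (N,) one score per patch
    w = np.exp(scores - scores.max())
    w /= w.sum()                         # softmax weights
    return w @ feats                     # (D,) weighted mean
```

With a zero query the weights are uniform and the layer reduces to average pooling; a trained query lets the network emphasize informative patches, which is the non-local reasoning the summary refers to.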
arXiv Detail & Related papers (2021-12-27T14:05:41Z) - Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z) - Semantic Segmentation With Multi Scale Spatial Attention For Self
Driving Cars [2.7317088388886384]
We present a novel neural network using multi-scale feature fusion for accurate and efficient semantic image segmentation.
We use a ResNet-based feature extractor, dilated convolutional layers in the downsampling part, atrous convolutional layers in the upsampling part, and a concat operation to merge them.
A new attention module is proposed to encode more contextual information and enhance the receptive field of the network.
arXiv Detail & Related papers (2020-06-30T20:19:09Z) - CRNet: Cross-Reference Networks for Few-Shot Segmentation [59.85183776573642]
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.
With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images.
Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-03-24T04:55:43Z) - Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, except for frequently-used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
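Dilated convolutions enlarge the receptive field without subsampling: a kernel of size k with dilation d spans (k - 1) * d + 1 input positions. A generic 1-D sketch of this (not the paper's exact generator):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Valid 1-D convolution with gaps of `dilation` between taps.

    A generic sketch: with kernel size k, the effective receptive
    field is (k - 1) * dilation + 1, which is how stacked dilated
    convolutions reach a large context without losing resolution.
    """
    k = len(kernel)
    span = (k - 1) * dilation + 1
    out_len = len(x) - span + 1
    return np.array([
        sum(kernel[j] * x[i + j * dilation] for j in range(k))
        for i in range(out_len)
    ])
```

Combining several dilation rates densely, as the summary describes, covers both nearby and distant context at once rather than leaving gridding gaps.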
arXiv Detail & Related papers (2020-02-07T03:45:25Z) - Gated Path Selection Network for Semantic Segmentation [72.44994579325822]
We develop a novel network named Gated Path Selection Network (GPSNet), which aims to learn adaptive receptive fields.
In GPSNet, we first design a two-dimensional multi-scale network - SuperNet, which densely incorporates features from growing receptive fields.
To dynamically select desirable semantic context, a gate prediction module is further introduced.
arXiv Detail & Related papers (2020-01-19T12:32:17Z)
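The gate prediction module in GPSNet selects which receptive-field branches contribute to the output. A hedged sketch of gate-based fusion over multi-scale branches, with sigmoid gates standing in for a (hypothetical) small prediction module:

```python
import numpy as np

def gated_select(branches, gate_logits):
    """Fuse multi-scale branch features (B, D) with per-branch gates.

    A hedged sketch of gated path selection: sigmoid gates scale each
    branch before summation, so the network can switch branches on or
    off instead of always averaging them.
    """
    gates = 1.0 / (1.0 + np.exp(-np.asarray(gate_logits)))  # sigmoid
    return (gates[:, None] * branches).sum(axis=0)
```

Driving one gate logit high and the others low effectively picks a single receptive field, while intermediate logits blend scales, which is the adaptive behavior the summary describes.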
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.