Simple and Efficient Architectures for Semantic Segmentation
- URL: http://arxiv.org/abs/2206.08236v1
- Date: Thu, 16 Jun 2022 15:08:34 GMT
- Title: Simple and Efficient Architectures for Semantic Segmentation
- Authors: Dushyant Mehta, Andrii Skliar, Haitam Ben Yahia, Shubhankar Borse,
Fatih Porikli, Amirhossein Habibian, Tijmen Blankevoort
- Abstract summary: We show that a simple encoder-decoder architecture with a ResNet-like backbone and a small multi-scale head performs on par with, or better than, complex semantic segmentation architectures such as HRNet, FANet and DDRNet.
We present a family of such simple architectures for desktop as well as mobile targets, which match or exceed the performance of complex models on the Cityscapes dataset.
- Score: 50.1563637917129
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Though the state-of-the-art architectures for semantic segmentation, such as
HRNet, demonstrate impressive accuracy, the complexity arising from their
salient design choices hinders a range of model acceleration tools, and they
further rely on operations that are inefficient on current hardware. This
paper demonstrates that a simple encoder-decoder architecture with a
ResNet-like backbone and a small multi-scale head performs on par with or
better than complex semantic segmentation architectures such as HRNet, FANet
and DDRNet. Naively applying deep backbones designed for Image Classification to
the task of Semantic Segmentation leads to sub-par results, owing to a much
smaller effective receptive field of these backbones. Implicit among the
various design choices put forth in works like HRNet, DDRNet, and FANet are
networks with a large effective receptive field. It is natural to ask if a
simple encoder-decoder architecture would compare favorably if comprised of
backbones that have a larger effective receptive field, though without the use
of inefficient operations like dilated convolutions. We show that with minor
and inexpensive modifications to ResNets, enlarging the receptive field, very
simple and competitive baselines can be created for Semantic Segmentation. We
present a family of such simple architectures for desktop as well as mobile
targets, which match or exceed the performance of complex models on the
Cityscapes dataset. We hope that our work provides simple yet effective
baselines for practitioners to develop efficient semantic segmentation models.
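The receptive-field argument in the abstract can be made concrete with a small calculation. The sketch below is a hypothetical helper (not from the paper) that computes the theoretical receptive field of a stack of convolution or pooling layers, illustrating how striding and dilation each enlarge the field without adding depth:

```python
def receptive_field(layers):
    """Theoretical receptive field of a stack of conv/pool layers.

    Each layer is (kernel_size, stride, dilation). The effective kernel
    of a dilated convolution is dilation * (kernel - 1) + 1.
    """
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        eff_kernel = dilation * (kernel - 1) + 1
        rf += (eff_kernel - 1) * jump   # each layer widens the field
        jump *= stride                  # stride compounds for later layers
    return rf

# A ResNet-like stem: 7x7 stride-2 conv followed by a 3x3 stride-2 pool.
print(receptive_field([(7, 2, 1), (3, 2, 1)]))  # 11

# Two plain 3x3 convs vs. one dilation-2 3x3 conv: same field, fewer layers.
print(receptive_field([(3, 1, 1), (3, 1, 1)]))  # 5
print(receptive_field([(3, 1, 2)]))             # 5
```

Dilated convolutions grow the receptive field quickly but, as the abstract notes, map poorly onto current hardware; the paper's approach is instead to reach a comparable field through minor, inexpensive modifications to the ResNet backbone itself.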
Related papers
- AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation [48.82264764771652]
We introduce AsCAN -- a hybrid architecture, combining both convolutional and transformer blocks.
AsCAN supports a variety of tasks: recognition, segmentation, class-conditional image generation.
We then scale the same architecture to solve a large-scale text-to-image task and show state-of-the-art performance.
arXiv Detail & Related papers (2024-11-07T18:43:17Z) - The revenge of BiSeNet: Efficient Multi-Task Image Segmentation [6.172605433695617]
BiSeNetFormer is a novel architecture for efficient multi-task image segmentation.
By seamlessly supporting multiple tasks, BiSeNetFormer offers a versatile solution for multi-task segmentation.
Our results indicate that BiSeNetFormer represents a significant advancement towards fast, efficient, and multi-task segmentation networks.
arXiv Detail & Related papers (2024-04-15T08:32:18Z) - PEM: Prototype-based Efficient MaskFormer for Image Segmentation [10.795762739721294]
Recent transformer-based architectures have shown impressive results in the field of image segmentation.
We propose Prototype-based Efficient MaskFormer (PEM), an efficient transformer-based architecture that can operate in multiple segmentation tasks.
arXiv Detail & Related papers (2024-02-29T18:21:54Z) - A Simple Single-Scale Vision Transformer for Object Localization and
Instance Segmentation [79.265315267391]
We propose a simple and compact ViT architecture called Universal Vision Transformer (UViT)
UViT achieves strong performance on object detection and instance segmentation tasks.
arXiv Detail & Related papers (2021-12-17T20:11:56Z) - Operation Embeddings for Neural Architecture Search [15.033712726016255]
We propose the replacement of fixed operator encoding with learnable representations in the optimization process.
Our method produces top-performing architectures that share similar operation and graph patterns.
arXiv Detail & Related papers (2021-05-11T09:17:10Z) - Boundary-Aware Segmentation Network for Mobile and Web Applications [60.815545591314915]
Boundary-Aware Network (BASNet) is integrated with a predict-refine architecture and a hybrid loss for highly accurate image segmentation.
BASNet runs at over 70 fps on a single GPU which benefits many potential real applications.
Based on BASNet, we further developed two (close to) commercial applications: AR COPY & PASTE, in which BASNet enables augmented-reality "copying" and "pasting" of real-world objects, and OBJECT CUT, a web-based tool for automatic object background removal.
arXiv Detail & Related papers (2021-01-12T19:20:26Z) - Towards Efficient Scene Understanding via Squeeze Reasoning [71.1139549949694]
We propose a novel framework called Squeeze Reasoning.
Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector.
We show that our approach can be modularized as an end-to-end trained block and can be easily plugged into existing networks.
arXiv Detail & Related papers (2020-11-06T12:17:01Z) - Structured Convolutions for Efficient Neural Network Design [65.36569572213027]
We tackle model efficiency by exploiting redundancy in the implicit structure of the building blocks of convolutional neural networks.
We show how this decomposition can be applied to 2D and 3D kernels as well as the fully-connected layers.
arXiv Detail & Related papers (2020-08-06T04:38:38Z) - Lets keep it simple, Using simple architectures to outperform deeper and
more complex architectures [12.76864681474486]
Convolutional Neural Networks (CNNs) include tens to hundreds of millions of parameters, which impose considerable computation and memory overhead.
We propose a simple architecture called SimpleNet, based on a set of designing principles, with which we empirically show, a well-crafted yet simple and reasonably deep architecture can perform on par with deeper and more complex architectures.
Our simple 13-layer architecture outperforms most of the deeper and complex architectures to date such as VGGNet, ResNet, and GoogleNet on several well-known benchmarks.
arXiv Detail & Related papers (2016-08-22T02:50:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences arising from its use.