Semantic Segmentation for Urban-Scene Images
- URL: http://arxiv.org/abs/2110.13813v1
- Date: Wed, 20 Oct 2021 08:31:26 GMT
- Title: Semantic Segmentation for Urban-Scene Images
- Authors: Shorya Sharma
- Abstract summary: We re-implement the cutting-edge model DeepLabv3+ with a ResNet-101 backbone as our strong baseline model.
We incorporate HANet to account for the vertical spatial priors in urban-scene image tasks.
We find that our two-step integrated model progressively improves the mean Intersection-Over-Union (mIoU) score over the baseline model.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Urban-scene image segmentation is an important and trending topic
in computer vision, with wide-ranging use cases such as autonomous driving
[1]. Starting with the breakthrough work of Long et al. [2], which introduced
Fully Convolutional Networks (FCNs), the development of novel architectures
and practical uses of neural networks in semantic segmentation has
accelerated over the past five years. Beyond general model designs that
counter the information loss caused by pooling, urban-scene images themselves
carry intrinsic features such as positional patterns [3]. Our project seeks
an advanced, integrated solution that specifically targets urban-scene
semantic segmentation, drawing on the most novel approaches in the current
field. We re-implement the cutting-edge model DeepLabv3+ [4] with a
ResNet-101 [5] backbone as our strong baseline model. Building on DeepLabv3+,
we incorporate HANet [3] to account for the vertical spatial priors in
urban-scene image tasks. To boost model efficiency and performance, we
further explore the Atrous Spatial Pyramid Pooling (ASPP) layer in DeepLabv3+
and infuse a computationally efficient variant called the "Waterfall" Atrous
Spatial Pooling (WASP) [6] architecture into our model. We find that our
two-step integrated model progressively improves the mean
Intersection-Over-Union (mIoU) score over the baseline model. In particular,
HANet successfully identifies height-driven patterns and improves the
per-class IoU of common class labels in urban scenarios, such as fence and
bus. We also demonstrate the improvement in model efficiency that WASP
brings, in terms of computation time during training and parameter reduction
relative to the original ASPP module.
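To make the height-driven attention idea above concrete, here is a minimal PyTorch sketch of a HANet-style module. It is a simplified illustration under our own assumptions (the class name, reduction ratio, and single Conv1d bottleneck are ours), not the authors' implementation: features are pooled along the image width into per-row descriptors, which then drive per-row, per-channel attention.

```python
import torch
import torch.nn as nn

class HeightAttention(nn.Module):
    """Scale channels per image row using width-pooled context (HANet-style)."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) -> average over width to get per-row descriptors (N, C, H)
        row_ctx = x.mean(dim=3)
        # per-row, per-channel attention weights in [0, 1]
        attn = self.mlp(row_ctx)          # (N, C, H)
        return x * attn.unsqueeze(3)      # broadcast the row weights over width

# usage
feat = torch.randn(2, 256, 64, 128)
out = HeightAttention(256)(feat)
assert out.shape == feat.shape
```

The width pooling is what encodes the vertical prior: rows near the top of an urban image (sky, buildings) get reweighted differently from rows near the bottom (road, vehicles).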
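Similarly, the sketch below contrasts WASP's cascaded ("waterfall") flow with ASPP's parallel branches. The dilation rates and channel sizes are illustrative assumptions, and the real WASP module [6] includes further details (e.g. a pooling branch) omitted here.

```python
import torch
import torch.nn as nn

class WASPHead(nn.Module):
    """Cascaded atrous branches; each branch reads the previous branch's output."""
    def __init__(self, in_ch: int, out_ch: int, rates=(6, 12, 18, 24)):
        super().__init__()
        branches, ch = [], in_ch
        for r in rates:
            branches.append(nn.Sequential(
                nn.Conv2d(ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            ))
            # waterfall: later branches see out_ch instead of in_ch,
            # which is where the parameter savings over ASPP come from
            ch = out_ch
        self.branches = nn.ModuleList(branches)
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outs = []
        for branch in self.branches:  # ASPP would feed the same x to every branch
            x = branch(x)
            outs.append(x)
        return self.project(torch.cat(outs, dim=1))
```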
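Finally, since the reported gains are phrased in terms of mIoU and per-class IoU, a small self-contained reference implementation of the metric may be useful. This is the standard confusion-matrix definition, not project-specific code; labels are assumed to be integers in [0, num_classes), with 255 as the usual ignore index.

```python
import numpy as np

def per_class_iou(pred: np.ndarray, gt: np.ndarray,
                  num_classes: int, ignore_index: int = 255):
    """Per-class IoU and mIoU from flat integer label arrays."""
    mask = gt != ignore_index            # drop ignored pixels (255 in Cityscapes)
    pred, gt = pred[mask], gt[mask]
    # confusion[i, j] = number of pixels with ground truth i predicted as j
    confusion = np.bincount(
        gt.astype(np.int64) * num_classes + pred.astype(np.int64),
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)
    tp = np.diag(confusion)
    denom = confusion.sum(axis=0) + confusion.sum(axis=1) - tp
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return iou, float(np.nanmean(iou))   # inspect iou[c] for classes like fence or bus
```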
Related papers
- SIGMA: Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modeling (SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
arXiv Detail & Related papers (2024-07-22T08:04:09Z) - SODAWideNet -- Salient Object Detection with an Attention augmented Wide
Encoder Decoder network without ImageNet pre-training [3.66237529322911]
We explore developing a neural network from scratch directly trained on Salient Object Detection without ImageNet pre-training.
We propose SODAWideNet, an encoder-decoder-style network for Salient Object Detection.
Two variants, SODAWideNet-S (3.03M parameters) and SODAWideNet (9.03M parameters), achieve competitive performance against state-of-the-art models on five datasets.
arXiv Detail & Related papers (2023-11-08T16:53:44Z) - GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion.
In this work, we leverage convolutional neural networks as well as graph neural networks in a complementary way for geometric representation learning.
Our method achieves state-of-the-art performance, especially when only a few propagation steps are used.
arXiv Detail & Related papers (2022-10-19T17:56:03Z) - Unsupervised Deep Learning Meets Chan-Vese Model [77.24463525356566]
We propose an unsupervised image segmentation approach that integrates the Chan-Vese (CV) model with deep neural networks.
Our basic idea is to apply a deep neural network that maps the image into a latent space to alleviate the violation of the piecewise constant assumption in image space.
arXiv Detail & Related papers (2022-04-14T13:23:57Z) - Learning to Associate Every Segment for Video Panoptic Segmentation [123.03617367709303]
We learn coarse segment-level matching and fine pixel-level matching together.
We show that our per-frame computation model can achieve new state-of-the-art results on Cityscapes-VPS and VIPER datasets.
arXiv Detail & Related papers (2021-06-17T13:06:24Z) - Third ArchEdge Workshop: Exploring the Design Space of Efficient Deep
Neural Networks [14.195694804273801]
This paper gives an overview of our ongoing work on the design space exploration of efficient deep neural networks (DNNs).
We cover two aspects: (1) static architecture design efficiency and (2) dynamic model execution efficiency.
We highlight several open questions that are poised to draw research attention in the next few years.
arXiv Detail & Related papers (2020-11-22T01:56:46Z) - Deep Active Surface Models [60.027353171412216]
Active Surface Models have a long history of being useful for modeling complex 3D surfaces, but only Active Contours have been used in conjunction with deep networks.
We introduce layers implementing them that can be integrated seamlessly into Graph Convolutional Networks to enforce sophisticated smoothness priors.
arXiv Detail & Related papers (2020-11-17T18:48:28Z) - Real-Time High-Performance Semantic Image Segmentation of Urban Street
Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves accuracies of 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z) - Cars Can't Fly up in the Sky: Improving Urban-Scene Segmentation via
Height-driven Attention Networks [32.01932474622993]
This paper exploits the intrinsic features of urban-scene images and proposes a general add-on module, called height-driven attention networks (HANet).
It emphasizes informative features or classes selectively according to the vertical position of a pixel.
Our method achieves new state-of-the-art performance on the Cityscapes benchmark by a large margin among ResNet-101-based segmentation models.
arXiv Detail & Related papers (2020-03-11T06:22:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.