Cars Can't Fly up in the Sky: Improving Urban-Scene Segmentation via
Height-driven Attention Networks
- URL: http://arxiv.org/abs/2003.05128v3
- Date: Tue, 7 Apr 2020 02:34:31 GMT
- Title: Cars Can't Fly up in the Sky: Improving Urban-Scene Segmentation via
Height-driven Attention Networks
- Authors: Sungha Choi, Joanne T. Kim, Jaegul Choo
- Abstract summary: This paper exploits the intrinsic features of urban-scene images and proposes a general add-on module, called height-driven attention networks (HANet).
It emphasizes informative features or classes selectively according to the vertical position of a pixel.
Our method achieves new state-of-the-art performance on the Cityscapes benchmark, by a large margin among ResNet-101 based segmentation models.
- Score: 32.01932474622993
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper exploits the intrinsic features of urban-scene images and proposes
a general add-on module, called height-driven attention networks (HANet), for
improving semantic segmentation for urban-scene images. It emphasizes
informative features or classes selectively according to the vertical position
of a pixel. The pixel-wise class distributions differ significantly among
horizontally segmented sections of urban-scene images. Likewise, urban-scene
images have their own distinct characteristics, but most semantic segmentation
networks do not reflect such unique attributes in their architecture. The
proposed architecture incorporates the capability of exploiting these
attributes to handle urban-scene datasets effectively. We
validate the consistent performance (mIoU) increase of various semantic
segmentation models on two datasets when HANet is adopted. This extensive
quantitative analysis demonstrates that adding our module to existing models is
easy and cost-effective. Our method achieves new state-of-the-art performance
on the Cityscapes benchmark, by a large margin among ResNet-101 based
segmentation models. Also, by visualizing and interpreting the attention map,
we show that the proposed model is consistent with facts observed in urban
scenes. Our code and trained models are publicly available at
https://github.com/shachoi/HANet.
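To make the core mechanism concrete, below is a minimal PyTorch sketch of a height-driven attention module in the spirit of the abstract: features are pooled along the width so that only the vertical axis remains, a small 1-D network maps each coarsened row to a channel-attention vector, and that vector is interpolated back to the feature-map height and multiplied in row by row. The layer sizes, the number of pooled rows, and the omission of the vertical positional encoding are illustrative assumptions, not the authors' exact design (see the repository above for that).

import torch
import torch.nn as nn
import torch.nn.functional as F

class HeightDrivenAttention(nn.Module):
    """Per-row channel attention conditioned on vertical position (sketch)."""

    def __init__(self, in_channels: int, out_channels: int, pooled_rows: int = 16):
        super().__init__()
        self.pooled_rows = pooled_rows  # coarse vertical resolution (assumed value)
        mid = max(in_channels // 4, 8)
        # 1-D convolutions run along the vertical axis and emit one
        # channel-attention vector per coarsened row.
        self.attn = nn.Sequential(
            nn.Conv1d(in_channels, mid, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv1d(mid, out_channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, low: torch.Tensor, high: torch.Tensor) -> torch.Tensor:
        # low:  (N, C_in,  H_l, W_l) features from which attention is computed
        # high: (N, C_out, H_h, W_h) features to be modulated row by row
        # Width-wise pooling keeps only the vertical axis; this is where the
        # row-wise class statistics ("cars can't fly up in the sky") enter.
        z = F.adaptive_avg_pool2d(low, (self.pooled_rows, 1)).squeeze(-1)  # (N, C_in, rows)
        a = self.attn(z)                                                   # (N, C_out, rows)
        # Stretch the per-row attention back to the target height and apply it.
        a = F.interpolate(a, size=high.shape[2], mode="linear", align_corners=False)
        return high * a.unsqueeze(-1)  # broadcasts across the width

# Example: re-weight decoder features using attention computed from encoder features.
hanet = HeightDrivenAttention(in_channels=1024, out_channels=256)
low = torch.randn(2, 1024, 48, 96)   # encoder features
high = torch.randn(2, 256, 96, 192)  # decoder features to modulate
out = hanet(low, high)               # same shape as `high`

In use, such a module sits between two stages of a segmentation network, as above; the published model additionally injects an encoding of the vertical position into the pooled features, omitted here for brevity.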
Related papers
- Multi-scale Semantic Prior Features Guided Deep Neural Network for Urban Street-view Image [1.4473649585131072]
This paper presents a novel Deep Neural Network (DNN) for inpainting street-view images.
A semantic prior prompter is introduced to learn rich semantic priors from a large pre-trained model.
Experiments on Apolloscapes and Cityscapes datasets demonstrate better performance than state-of-the-art methods.
arXiv Detail & Related papers (2024-05-17T03:02:18Z)
- Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts [68.86537322287474]
Low-latency and high-quality interactive segmentation with diverse prompts are challenging for specialist and generalist models.
We propose SegNext, a next-generation interactive segmentation approach offering low latency, high quality, and diverse prompt support.
Our method outperforms current state-of-the-art methods on HQSeg-44K and DAVIS, both quantitatively and qualitatively.
arXiv Detail & Related papers (2024-03-31T17:02:24Z)
- Self-attention on Multi-Shifted Windows for Scene Segmentation [14.47974086177051]
We explore the effective use of self-attention within multi-scale image windows to learn descriptive visual features.
We propose three different strategies to aggregate these feature maps to decode the feature representation for dense prediction.
Our models achieve very promising performance on four public scene segmentation datasets.
arXiv Detail & Related papers (2022-07-10T07:36:36Z)
- Distilling Ensemble of Explanations for Weakly-Supervised Pre-Training of Image Segmentation Models [54.49581189337848]
We propose a method to enable the end-to-end pre-training for image segmentation models based on classification datasets.
The proposed method leverages a weighted segmentation learning procedure to pre-train the segmentation network en masse.
Experiment results show that, with ImageNet accompanied by PSSL as the source dataset, the proposed end-to-end pre-training strategy successfully boosts the performance of various segmentation models.
arXiv Detail & Related papers (2022-07-04T13:02:32Z)
- Semantic Segmentation for Urban-Scene Images [0.0]
We re-implement the cutting edge model DeepLabv3+ with ResNet-101 as our strong baseline model.
We incorporate HANet to account for the vertical spatial priors in urban-scene image tasks.
We find that our two-step integrated model steadily improves the mean Intersection-Over-Union (mIoU) score over the baseline model.
arXiv Detail & Related papers (2021-10-20T08:31:26Z)
- Learning to Associate Every Segment for Video Panoptic Segmentation [123.03617367709303]
We learn coarse segment-level matching and fine pixel-level matching together.
We show that our per-frame computation model can achieve new state-of-the-art results on Cityscapes-VPS and VIPER datasets.
arXiv Detail & Related papers (2021-06-17T13:06:24Z)
- Segmenter: Transformer for Semantic Segmentation [79.9887988699159]
We introduce Segmenter, a transformer model for semantic segmentation.
We build on the recent Vision Transformer (ViT) and extend it to semantic segmentation.
It outperforms the state of the art on the challenging ADE20K dataset and performs on-par on Pascal Context and Cityscapes.
arXiv Detail & Related papers (2021-05-12T13:01:44Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
- Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation [57.68890534164427]
In this work, we ask if we may leverage semi-supervised learning in unlabeled video sequences and extra images to improve the performance on urban scene segmentation.
We simply predict pseudo-labels for the unlabeled data and train subsequent models with both human-annotated and pseudo-labeled data.
Our Naive-Student model, trained with such simple yet effective iterative semi-supervised learning, attains state-of-the-art results at all three Cityscapes benchmarks.
arXiv Detail & Related papers (2020-05-20T18:00:05Z)
- Evolution of Image Segmentation using Deep Convolutional Neural Network: A Survey [0.0]
We review the evolution of both semantic and instance segmentation work based on CNNs.
We also give a glimpse of some state-of-the-art panoptic segmentation models.
arXiv Detail & Related papers (2020-01-13T06:07:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.