Semantic Segmentation With Multi Scale Spatial Attention For Self
Driving Cars
- URL: http://arxiv.org/abs/2007.12685v3
- Date: Wed, 30 Sep 2020 22:56:11 GMT
- Title: Semantic Segmentation With Multi Scale Spatial Attention For Self
Driving Cars
- Authors: Abhinav Sagar, RajKumar Soundrapandiyan
- Abstract summary: We present a novel neural network that fuses features at multiple scales for accurate and efficient semantic image segmentation.
We used a ResNet-based feature extractor, dilated convolutional layers in the downsampling part, and atrous convolutional layers in the upsampling part, and used a concat operation to merge them.
A new attention module is proposed to encode more contextual information and enlarge the receptive field of the network.
- Score: 2.7317088388886384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a novel neural network that fuses
features at multiple scales for accurate and efficient semantic image
segmentation. We used a ResNet-based feature extractor, dilated convolutional
layers in the downsampling part, and atrous convolutional layers in the
upsampling part, and used a concat operation to merge them. A new attention
module is proposed to encode more contextual information and enlarge the
receptive field of the network. We present an in-depth theoretical analysis of
our network with training and optimization details. Our network was trained and
tested on the CamVid and Cityscapes datasets, using mean accuracy per class and
Intersection over Union (IoU) as the evaluation metrics. Our model outperforms
previous state-of-the-art methods on semantic segmentation, achieving a mean
IoU of 74.12 while running at >100 FPS.
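The two evaluation metrics named in the abstract, mean accuracy per class and mean IoU, can both be derived from a pixel-level confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the authors' evaluation code: the function names, the toy labels, and the choice to skip classes that never occur are assumptions made for illustration, and the paper's exact protocol (ignored classes, averaging order) is not stated here.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_iou(cm):
    """IoU_c = TP / (TP + FP + FN); None when the class never occurs."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class differs
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def per_class_accuracy(cm):
    """Accuracy_c = TP / (pixels whose true class is c); None if no such pixels."""
    accs = []
    for c in range(len(cm)):
        total = sum(cm[c])
        accs.append(cm[c][c] / total if total else None)
    return accs


def mean_ignoring_missing(values):
    """Average over classes that occur (an assumption, not the paper's rule)."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else 0.0


# Toy example: six flattened pixels, three hypothetical classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
mean_iou = mean_ignoring_missing(per_class_iou(cm))
mean_acc = mean_ignoring_missing(per_class_accuracy(cm))
print(f"mean IoU = {mean_iou:.4f}, mean per-class accuracy = {mean_acc:.4f}")
```

On real data the confusion matrix would typically be accumulated over every image in the test split before the per-class values are averaged.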
Related papers
- Early Fusion of Features for Semantic Segmentation [10.362589129094975]
This paper introduces a novel segmentation framework that integrates a classifier network with a reverse HRNet architecture for efficient image segmentation.
Our methodology is rigorously tested across several benchmark datasets including Mapillary Vistas, Cityscapes, CamVid, COCO, and PASCAL-VOC2012.
The results demonstrate the effectiveness of our proposed model in achieving high segmentation accuracy, indicating its potential for various applications in image analysis.
arXiv Detail & Related papers (2024-02-08T22:58:06Z)
- MCFNet: Multi-scale Covariance Feature Fusion Network for Real-time Semantic Segmentation [6.0118706234809975]
We propose a new architecture based on Bilateral Network (BiseNet) called Multi-scale Covariance Feature Fusion Network (MCFNet).
Specifically, this network introduces a new feature refinement module and a new feature fusion module.
We evaluate our proposed model on Cityscapes, CamVid datasets and compare it with the state-of-the-art methods.
arXiv Detail & Related papers (2023-12-12T12:20:27Z)
- De-coupling and De-positioning Dense Self-supervised Learning [65.56679416475943]
Dense Self-Supervised Learning (SSL) methods address the limitations of using image-level feature representations when handling images with multiple objects.
We show that they suffer from coupling and positional bias, which arise from the receptive field increasing with layer depth and zero-padding.
We demonstrate the benefits of our method on COCO and on a new challenging benchmark, OpenImage-MINI, for object classification, semantic segmentation, and object detection.
arXiv Detail & Related papers (2023-03-29T18:07:25Z)
- Semantic Labeling of High Resolution Images Using EfficientUNets and Transformers [5.177947445379688]
We propose a new segmentation model that combines convolutional neural networks with deep transformers.
Our results demonstrate that the proposed methodology improves segmentation accuracy compared to state-of-the-art techniques.
arXiv Detail & Related papers (2022-06-20T12:03:54Z)
- Self-supervised Audiovisual Representation Learning for Remote Sensing Data [96.23611272637943]
We propose a self-supervised approach for pre-training deep neural networks in remote sensing.
By exploiting the correspondence between geo-tagged audio recordings and remote sensing, this is done in a completely label-free manner.
We show that our approach outperforms existing pre-training strategies for remote sensing imagery.
arXiv Detail & Related papers (2021-08-02T07:50:50Z)
- AASeg: Attention Aware Network for Real Time Semantic Segmentation [0.0]
We present a new network named Attention Aware Network (AASeg) for real time semantic image segmentation.
Our network incorporates spatial and channel information using Spatial Attention (SA) and Channel Attention (CA) modules.
We demonstrate the effectiveness of our method using a comprehensive analysis, quantitative experimental results and ablation study using Cityscapes, ADE20K and Camvid datasets.
arXiv Detail & Related papers (2021-07-27T20:01:55Z)
- Learning to Segment Human Body Parts with Synthetically Trained Deep Convolutional Networks [58.0240970093372]
This paper presents a new framework for human body part segmentation based on Deep Convolutional Neural Networks trained using only synthetic data.
The proposed approach achieves cutting-edge results without the need of training the models with real annotated data of human body parts.
arXiv Detail & Related papers (2021-02-02T12:26:50Z)
- Densely Connected Recurrent Residual (Dense R2UNet) Convolutional Neural Network for Segmentation of Lung CT Images [0.342658286826597]
We present a synthesis of Recurrent CNN, Residual Network and Dense Convolutional Network based on the U-Net model architecture.
The proposed model tested on the benchmark Lung Lesion dataset showed better performance on segmentation tasks than its equivalent models.
arXiv Detail & Related papers (2021-02-01T06:34:10Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
- CRNet: Cross-Reference Networks for Few-Shot Segmentation [59.85183776573642]
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.
With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images.
Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-03-24T04:55:43Z)
- Real-Time High-Performance Semantic Image Segmentation of Urban Street Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)