Multi Receptive Field Network for Semantic Segmentation
- URL: http://arxiv.org/abs/2011.08577v2
- Date: Wed, 7 Sep 2022 14:01:42 GMT
- Title: Multi Receptive Field Network for Semantic Segmentation
- Authors: Jianlong Yuan, Zelu Deng, Shu Wang, Zhenbo Luo
- Abstract summary: We propose a new Multi-Receptive Field Module (MRFM) for semantic segmentation.
We also design an edge-aware loss which is effective in distinguishing the boundaries of object/stuff.
Specifically, we achieve a mean IoU of 83.0 on the Cityscapes dataset and 88.4 mean IoU on the Pascal VOC2012 dataset.
- Score: 8.06045579589765
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic segmentation is one of the key tasks in computer vision, which is to
assign a category label to each pixel in an image. Despite significant progress
achieved recently, most existing methods still suffer from two challenging
issues: 1) the size of objects and stuff in an image can be very diverse,
which demands incorporating multi-scale features into fully convolutional
networks (FCNs); 2) the pixels close to or at the boundaries of object/stuff
are hard to classify due to the intrinsic weakness of convolutional networks.
To address the first issue, we propose a new Multi-Receptive Field Module
(MRFM), explicitly taking multi-scale features into account. For the second
issue, we design an edge-aware loss which is effective in distinguishing the
boundaries of object/stuff. With these two designs, our Multi Receptive Field
Network achieves new state-of-the-art results on two widely-used semantic
segmentation benchmark datasets. Specifically, we achieve a mean IoU of 83.0 on
the Cityscapes dataset and 88.4 mean IoU on the Pascal VOC2012 dataset.
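The mean IoU figures quoted above can be read through the standard definition of the metric: per-class intersection over union, averaged over the classes that occur. The following is a minimal sketch of that definition only, not the authors' evaluation code. The function name `mean_iou`, the flattened-label input format, and the `ignore_label=255` convention are assumptions made for illustration. Benchmark scores such as 83.0 are presumably this quantity expressed as a percentage.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_label: int = 255) -> float:
    """Mean intersection-over-union over flattened per-pixel labels.

    `ignore_label` marks pixels excluded from scoring (255 is an assumed
    convention, not something stated in the abstract).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue  # unlabeled pixel: contributes to no class
        if p == t:
            # Correct pixel: counts once toward both intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of the true class and,
            # if valid, of the predicted class.
            union[t] += 1
            if 0 <= p < num_classes:
                union[p] += 1
    # Average only over classes that appear in the prediction or the target.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Four pixels, two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
    score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
    print(f"mean IoU: {100 * score:.1f}")
```

Averaging per class rather than per pixel keeps small or rare classes from being swamped by large ones, which matters for the first issue the abstract raises (objects and stuff of very different sizes).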
Related papers
- MacFormer: Semantic Segmentation with Fine Object Boundaries [38.430631361558426]
We introduce a new semantic segmentation architecture, "MacFormer", which features two key components.
Firstly, using learnable agent tokens, a Mutual Agent Cross-Attention (MACA) mechanism effectively facilitates the bidirectional integration of features across encoder and decoder layers.
Secondly, a Frequency Enhancement Module (FEM) in the decoder leverages high-frequency and low-frequency components to boost features in the frequency domain.
MacFormer is demonstrated to be compatible with various network architectures and outperforms existing methods in both accuracy and efficiency on the benchmark datasets ADE20K and Cityscapes.
arXiv Detail & Related papers (2024-08-11T05:36:10Z)
- Hi-ResNet: Edge Detail Enhancement for High-Resolution Remote Sensing Segmentation [10.919956120261539]
High-resolution remote sensing (HRS) semantic segmentation extracts key objects from high-resolution coverage areas.
Objects of the same category within HRS images show significant differences in scale and shape across diverse geographical environments.
We propose a High-resolution remote sensing network (Hi-ResNet) with efficient network structure designs.
arXiv Detail & Related papers (2023-05-22T03:58:25Z)
- EAA-Net: Rethinking the Autoencoder Architecture with Intra-class Features for Medical Image Segmentation [4.777011444412729]
We propose a light-weight end-to-end segmentation framework based on multi-task learning, termed Edge Attention autoencoder Network (EAA-Net).
Our approach not only utilizes the segmentation network to obtain inter-class features, but also applies the reconstruction network to extract intra-class features among the foregrounds.
Experimental results show that our method performs well in medical image segmentation tasks.
arXiv Detail & Related papers (2022-08-19T07:42:55Z)
- AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical one among which lies in foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while running as fast as mainstream methods.
arXiv Detail & Related papers (2022-02-18T10:14:45Z)
- Open-World Entity Segmentation [70.41548013910402]
We introduce a new image segmentation task, termed Entity Segmentation (ES), which aims to segment all visual entities in an image without considering semantic category labels.
All semantically-meaningful segments are equally treated as categoryless entities and there is no thing-stuff distinction.
ES enables the following: (1) merging multiple datasets to form a large training set without the need to resolve label conflicts; (2) any model trained on one dataset can generalize exceptionally well to other datasets with unseen domains.
arXiv Detail & Related papers (2021-07-29T17:59:05Z)
- Boundary-Aware Segmentation Network for Mobile and Web Applications [60.815545591314915]
Boundary-Aware Network (BASNet) is integrated with a predict-refine architecture and a hybrid loss for highly accurate image segmentation.
BASNet runs at over 70 fps on a single GPU, which benefits many potential real applications.
Based on BASNet, we further developed two (close to) commercial applications: AR COPY & PASTE, in which BASNet is integrated with augmented reality for "COPYING" and "PASTING" real-world objects, and OBJECT CUT, which is a web-based tool for automatic object background removal.
arXiv Detail & Related papers (2021-01-12T19:20:26Z)
- CARAFE++: Unified Content-Aware ReAssembly of FEatures [132.49582482421246]
We propose unified Content-Aware ReAssembly of FEatures (CARAFE++), a universal, lightweight and highly effective operator to fulfill this goal.
CARAFE++ generates adaptive kernels on-the-fly to enable instance-specific content-aware handling.
It shows consistent and substantial gains across all the tasks with negligible computational overhead.
arXiv Detail & Related papers (2020-12-07T07:34:57Z)
- Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either aim to improve the object's inner consistency by modeling the global context, or refine object detail along boundaries by multi-scale feature fusion.
In this paper, a new paradigm for semantic segmentation is proposed.
Our insight is that appealing performance of semantic segmentation requires explicitly modeling the object body and edge, which correspond to the high and low frequency of the image.
We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries.
arXiv Detail & Related papers (2020-07-20T12:11:22Z)
- CRNet: Cross-Reference Networks for Few-Shot Segmentation [59.85183776573642]
Few-shot segmentation aims to learn a segmentation model that can be generalized to novel classes with only a few training images.
With a cross-reference mechanism, our network can better find the co-occurrent objects in the two images.
Experiments on the PASCAL VOC 2012 dataset show that our network achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-03-24T04:55:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.