A Holistically-Guided Decoder for Deep Representation Learning with
Applications to Semantic Segmentation and Object Detection
- URL: http://arxiv.org/abs/2012.10162v1
- Date: Fri, 18 Dec 2020 10:51:49 GMT
- Authors: Jianbo Liu, Sijie Ren, Yuanjie Zheng, Xiaogang Wang, Hongsheng Li
- Abstract summary: One common strategy is to adopt dilated convolutions in the backbone networks to extract high-resolution feature maps.
We propose a novel holistically-guided decoder that produces high-resolution, semantic-rich feature maps.
- Score: 74.88284082187462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Both high-level and high-resolution feature representations are of great
importance in various visual understanding tasks. To acquire high-resolution
feature maps with high-level semantic information, one common strategy is to
adopt dilated convolutions in the backbone networks to extract high-resolution
feature maps, such as the dilatedFCN-based methods for semantic segmentation.
However, because many convolution operations are conducted on the
high-resolution feature maps, such methods have large computational complexity
and memory consumption. In this paper, we propose a novel holistically-guided
decoder that obtains high-resolution, semantic-rich feature maps from the
multi-scale features of the encoder. The decoding is achieved
via novel holistic codeword generation and codeword assembly operations, which
take advantage of both the high-level and low-level features from the
encoder. With the proposed holistically-guided decoder, we implement the
EfficientFCN architecture for semantic segmentation and HGD-FPN for object
detection and instance segmentation. The EfficientFCN achieves comparable or
even better performance than state-of-the-art methods with only 1/3 of their
computational costs for semantic segmentation on the PASCAL Context, PASCAL VOC,
and ADE20K datasets. Meanwhile, the proposed HGD-FPN achieves $>2\%$ higher mean
Average Precision (mAP) when integrated into several object detection
frameworks with ResNet-50 encoding backbones.
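The abstract names two decoding steps, holistic codeword generation and codeword assembly, without giving their formulas. The following is only a minimal sketch of one plausible reading, not the authors' implementation: a small set of codewords is pooled from low-resolution, high-level features, and each high-resolution position then rebuilds its own feature vector as a weighted sum of those codewords. The function names, the softmax pooling, and the projection vectors `proj` and `coef_proj` are all assumptions made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def generate_codewords(low_res, proj):
    """Pool n codewords from all low-resolution positions (assumed scheme).

    low_res: list of P feature vectors (length C), one per position.
    proj:    n hypothetical projection vectors (length C), one per codeword.
    Returns n codewords, each a length-C vector.
    """
    channels = len(low_res[0])
    codewords = []
    for w in proj:
        # Attention over every position, so each codeword is "holistic".
        attn = softmax([dot(w, f) for f in low_res])
        codewords.append(
            [sum(a * f[c] for a, f in zip(attn, low_res)) for c in range(channels)]
        )
    return codewords

def assemble(codewords, guidance, coef_proj):
    """Rebuild a high-resolution map as per-position mixes of the codewords.

    guidance:  list of Q guidance vectors (length G), one per high-res position.
    coef_proj: n hypothetical projection vectors (length G), one per codeword.
    Returns Q output vectors, each of length C.
    """
    channels = len(codewords[0])
    out = []
    for g in guidance:
        coefs = [dot(w, g) for w in coef_proj]
        out.append(
            [sum(k * cw[c] for k, cw in zip(coefs, codewords)) for c in range(channels)]
        )
    return out

# Toy usage: a constant 2x2 low-res map (P=4, C=2) and a 4x4 high-res grid (Q=16).
low_res = [[1.0, 2.0] for _ in range(4)]
proj = [[0.5, -0.5], [1.0, 0.0], [0.0, 1.0]]            # n = 3 codewords
guidance = [[1.0, 0.0, 0.0] for _ in range(16)]
coef_proj = [[0.2, 0.0, 0.0], [0.3, 0.0, 0.0], [0.5, 0.0, 0.0]]

codewords = generate_codewords(low_res, proj)
high_res_out = assemble(codewords, guidance, coef_proj)
```

In this sketch the convolution-heavy work happens only on the small low-resolution map, and the high-resolution output costs one n-term weighted sum per position, which is the kind of saving the abstract attributes to avoiding dilated convolutions on high-resolution feature maps.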
Related papers
- Binary Code Similarity Detection via Graph Contrastive Learning on Intermediate Representations [52.34030226129628]
Binary Code Similarity Detection (BCSD) plays a crucial role in numerous fields, including vulnerability detection, malware analysis, and code reuse identification.
In this paper, we propose IRBinDiff, which mitigates compilation differences by leveraging LLVM-IR with higher-level semantic abstraction.
Our extensive experiments, conducted under varied compilation settings, demonstrate that IRBinDiff outperforms other leading BCSD methods in both One-to-one comparison and One-to-many search scenarios.
arXiv Detail & Related papers (2024-10-24T09:09:20Z) - MacFormer: Semantic Segmentation with Fine Object Boundaries [38.430631361558426]
We introduce a new semantic segmentation architecture, "MacFormer", which features two key components.
Firstly, using learnable agent tokens, a Mutual Agent Cross-Attention (MACA) mechanism effectively facilitates the bidirectional integration of features across encoder and decoder layers.
Secondly, a Frequency Enhancement Module (FEM) in the decoder leverages high-frequency and low-frequency components to boost features in the frequency domain.
MacFormer is demonstrated to be compatible with various network architectures and outperforms existing methods in both accuracy and efficiency on the benchmark datasets ADE20K and Cityscapes.
arXiv Detail & Related papers (2024-08-11T05:36:10Z) - Generalizable Entity Grounding via Assistance of Large Language Model [77.07759442298666]
We propose a novel approach to densely ground visual entities from a long caption.
We leverage a large multimodal model to extract semantic nouns, a class-agnostic segmentation model to generate entity-level segmentation, and a multi-modal feature fusion module to associate each semantic noun with its corresponding segmentation mask.
arXiv Detail & Related papers (2024-02-04T16:06:05Z) - U-Net v2: Rethinking the Skip Connections of U-Net for Medical Image Segmentation [14.450329809640422]
We introduce U-Net v2, a new robust and efficient U-Net variant for medical image segmentation.
It aims to augment the infusion of semantic information into low-level features while simultaneously refining high-level features with finer details.
arXiv Detail & Related papers (2023-11-29T16:35:24Z) - PointHR: Exploring High-Resolution Architectures for 3D Point Cloud
Segmentation [77.44144260601182]
We explore high-resolution architectures for 3D point cloud segmentation.
We propose a unified pipeline named PointHR, which includes a knn-based sequence operator for feature extraction and a differential resampling operator.
To evaluate these architectures for dense point cloud analysis, we conduct thorough experiments using S3DIS and ScanNetV2 datasets.
arXiv Detail & Related papers (2023-10-11T09:29:17Z) - LENet: Lightweight And Efficient LiDAR Semantic Segmentation Using
Multi-Scale Convolution Attention [0.0]
We propose a projection-based semantic segmentation network called LENet with an encoder-decoder structure for LiDAR-based semantic segmentation.
The encoder is composed of a novel multi-scale convolutional attention (MSCA) module with varying receptive field sizes to capture features.
We show that our proposed method is lighter, more efficient, and robust compared to state-of-the-art semantic segmentation methods.
arXiv Detail & Related papers (2023-01-11T02:51:38Z) - Attention guided global enhancement and local refinement network for
semantic segmentation [5.881350024099048]
A lightweight semantic segmentation network is developed using the encoder-decoder architecture.
A Global Enhancement Method is proposed to aggregate global information from high-level feature maps.
A Local Refinement Module is developed by utilizing the decoder features as the semantic guidance.
The two methods are integrated into a Context Fusion Block, and based on that, a novel Attention guided Global enhancement and Local refinement Network (AGLN) is elaborately designed.
arXiv Detail & Related papers (2022-04-09T02:32:24Z) - EfficientFCN: Holistically-guided Decoding for Semantic Segmentation [49.27021844132522]
State-of-the-art semantic segmentation algorithms are mostly based on dilated Fully Convolutional Networks (dilatedFCN).
We propose the EfficientFCN, whose backbone is a common ImageNet pre-trained network without any dilated convolution.
Such a framework achieves comparable or even better performance than state-of-the-art methods with only 1/3 of the computational cost.
arXiv Detail & Related papers (2020-08-24T14:48:23Z) - Real-Time High-Performance Semantic Image Segmentation of Urban Street
Scenes [98.65457534223539]
We propose a real-time high-performance DCNN-based method for robust semantic segmentation of urban street scenes.
The proposed method achieves 73.6% and 68.0% mean Intersection over Union (mIoU) at inference speeds of 51.0 fps and 39.3 fps, respectively.
arXiv Detail & Related papers (2020-03-11T08:45:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.