Attention-guided Chained Context Aggregation for Semantic Segmentation
- URL: http://arxiv.org/abs/2002.12041v4
- Date: Fri, 21 May 2021 03:25:20 GMT
- Title: Attention-guided Chained Context Aggregation for Semantic Segmentation
- Authors: Quan Tang, Fagui Liu, Tong Zhang, Jun Jiang and Yu Zhang
- Abstract summary: This paper proposes a novel series-parallel hybrid paradigm called the Chained Context Aggregation Module (CAM) to diversify feature propagation.
CAM gains features of various spatial scales through chain-connected ladder-style information flows and fuses them in a two-stage process, namely pre-fusion and re-fusion.
We construct the Chained Context Aggregation Network (CANet), which employs an asymmetric decoder to recover precise spatial details of prediction maps.
- Score: 13.555282589559885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The way features propagate in Fully Convolutional Networks is of momentous
importance to capture multi-scale contexts for obtaining precise segmentation
masks. This paper proposes a novel series-parallel hybrid paradigm called the
Chained Context Aggregation Module (CAM) to diversify feature propagation. CAM
gains features of various spatial scales through chain-connected ladder-style
information flows and fuses them in a two-stage process, namely pre-fusion and
re-fusion. The serial flow continuously increases receptive fields of output
neurons and those in parallel encode different region-based contexts. Each
information flow is a shallow encoder-decoder with appropriate down-sampling
scales to sufficiently capture contextual information. We further adopt an
attention model in CAM to guide feature re-fusion. Based on these developments,
we construct the Chained Context Aggregation Network (CANet), which employs an
asymmetric decoder to recover precise spatial details of prediction maps. We
conduct extensive experiments on six challenging datasets, including Pascal VOC
2012, Pascal Context, Cityscapes, CamVid, SUN-RGBD and GATECH. Results evidence
that CANet achieves state-of-the-art performance.
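From this abstract alone, the series-parallel idea can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration only: the module names, the number of flows, and the down-sampling rates are assumptions, not the authors' released CANet code.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextFlow(nn.Module):
    """One shallow encoder-decoder information flow: down-sample to grow
    the receptive field, transform, then up-sample back to input size."""
    def __init__(self, channels, rate):
        super().__init__()
        self.rate = rate  # assumed down-sampling rate of this flow
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))

    def forward(self, x):
        size = x.shape[-2:]
        x = F.avg_pool2d(x, self.rate)
        x = self.conv(x)
        return F.interpolate(x, size=size, mode='bilinear', align_corners=False)

class ChainedContextAggregation(nn.Module):
    """Chain-connected flows form the serial path; their outputs are
    pre-fused with the input (parallel paths) and re-fused under attention."""
    def __init__(self, channels, rates=(2, 4, 8)):
        super().__init__()
        self.flows = nn.ModuleList(ContextFlow(channels, r) for r in rates)
        self.pre_fuse = nn.ModuleList(nn.Conv2d(channels, channels, 1) for _ in rates)
        self.attn = nn.Sequential(                   # branch weights for re-fusion
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels * len(rates), len(rates), 1),
            nn.Softmax(dim=1))

    def forward(self, x):
        outs, serial = [], x
        for flow, fuse in zip(self.flows, self.pre_fuse):
            serial = flow(serial)          # serial flow keeps enlarging the RF
            outs.append(fuse(serial + x))  # pre-fusion with the shared input
        w = self.attn(torch.cat(outs, dim=1))          # (B, n_flows, 1, 1)
        stacked = torch.stack(outs, dim=1)             # (B, n_flows, C, H, W)
        return (w.unsqueeze(2) * stacked).sum(dim=1)   # attention-guided re-fusion

feats = torch.randn(2, 64, 32, 32)                  # backbone features
print(ChainedContextAggregation(64)(feats).shape)   # torch.Size([2, 64, 32, 32])
```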
Related papers
- Remote Sensing Image Segmentation Using Vision Mamba and Multi-Scale Multi-Frequency Feature Fusion [9.098711843118629]
This paper introduces the state space model (SSM) and proposes a novel hybrid semantic segmentation network based on vision Mamba (CVMH-UNet).
This method designs a cross-scanning visual state space block (CVSSBlock) that uses cross 2D scanning (CS2D) to fully capture global information from multiple directions.
By incorporating convolutional neural network branches to overcome the constraints of Vision Mamba (VMamba) in acquiring local information, this approach facilitates a comprehensive analysis of both global and local features.
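The selective-scan (SSM) core is out of scope here, but the cross 2D scanning step that CS2D names is easy to sketch: unfold the feature map into four directional 1-D sequences and merge them back after sequence processing. This is a generic VMamba-style cross scan, assumed for illustration rather than taken from the CVMH-UNet code.
```python
import torch

def cross_scan_2d(x):
    """Unfold a feature map into four 1-D scan orders (row-major,
    column-major, and their reverses). Returns (B, 4, C, H*W)."""
    row = x.flatten(2)                  # left-to-right, top-to-bottom
    col = x.transpose(2, 3).flatten(2)  # top-to-bottom, left-to-right
    return torch.stack([row, col, row.flip(-1), col.flip(-1)], dim=1)

def cross_merge_2d(seqs, H, W):
    """Invert cross_scan_2d and average the four directional results."""
    B, _, C, _ = seqs.shape
    row, col, row_r, col_r = seqs.unbind(dim=1)
    row = row + row_r.flip(-1)
    col = (col + col_r.flip(-1)).view(B, C, W, H).transpose(2, 3).flatten(2)
    return ((row + col) / 4).view(B, C, H, W)

x = torch.randn(2, 8, 4, 4)
seqs = cross_scan_2d(x)   # (2, 4, 8, 16): one sequence per scan direction
# ... a selective-scan / SSM layer would process each sequence here ...
print(torch.allclose(cross_merge_2d(seqs, 4, 4), x))  # True: merge inverts scan
```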
arXiv Detail & Related papers (2024-10-08T02:17:38Z)
- ParaTransCNN: Parallelized TransCNN Encoder for Medical Image Segmentation [7.955518153976858]
We propose an advanced 2D feature extraction method by combining the convolutional neural network and Transformer architectures.
Our method is shown with better segmentation accuracy, especially on small organs.
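A minimal sketch of the parallelized two-branch idea, assuming a convolutional branch for local detail and a standard Transformer encoder layer for global context, fused by a 1x1 convolution; the real ParaTransCNN stage layout is not reproduced here.
```python
import torch
import torch.nn as nn

class ParallelEncoderStage(nn.Module):
    """One encoder stage with a CNN branch and a Transformer branch run
    in parallel on the same input, then fused by concatenation."""
    def __init__(self, channels, heads=4):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.transformer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=heads, dim_feedforward=2 * channels,
            batch_first=True)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        B, C, H, W = x.shape
        local_feat = self.cnn(x)                  # local details (CNN)
        tokens = x.flatten(2).transpose(1, 2)     # (B, H*W, C) tokens
        global_feat = self.transformer(tokens)    # global context (attention)
        global_feat = global_feat.transpose(1, 2).view(B, C, H, W)
        return self.fuse(torch.cat([local_feat, global_feat], dim=1))

x = torch.randn(2, 32, 16, 16)
print(ParallelEncoderStage(32)(x).shape)  # torch.Size([2, 32, 16, 16])
```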
arXiv Detail & Related papers (2024-01-27T05:58:36Z)
- Co-attention Propagation Network for Zero-Shot Video Object Segmentation [91.71692262860323]
Zero-shot video object segmentation (ZS-VOS) aims to segment objects in a video sequence without prior knowledge of these objects.
Existing ZS-VOS methods often struggle to distinguish between foreground and background or to keep track of the foreground in complex scenarios.
We propose an encoder-decoder-based hierarchical co-attention propagation network (HCPN) capable of tracking and segmenting objects.
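The hierarchical, learned parts of HCPN aside, the underlying co-attention propagation primitive can be sketched as a normalized feature affinity that carries reference-frame features to a target frame; everything here is a generic sketch, not the HCPN module itself.
```python
import torch
import torch.nn.functional as F

def co_attention(f_ref, f_tgt):
    """Propagate reference-frame features to a target frame through a
    softmax-normalized feature affinity (basic co-attention)."""
    B, C, H, W = f_ref.shape
    ref = F.normalize(f_ref.flatten(2), dim=1)      # (B, C, HW)
    tgt = F.normalize(f_tgt.flatten(2), dim=1)
    affinity = torch.bmm(tgt.transpose(1, 2), ref)  # (B, HW_tgt, HW_ref)
    attn = F.softmax(affinity, dim=-1)              # each target pixel attends to ref
    propagated = torch.bmm(f_ref.flatten(2), attn.transpose(1, 2))
    return propagated.view(B, C, H, W)

ref, tgt = torch.randn(1, 64, 16, 16), torch.randn(1, 64, 16, 16)
print(co_attention(ref, tgt).shape)  # torch.Size([1, 64, 16, 16])
```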
arXiv Detail & Related papers (2023-04-08T04:45:48Z)
- PSNet: Parallel Symmetric Network for Video Salient Object Detection [85.94443548452729]
We propose a VSOD network with up and down parallel symmetry, named PSNet.
Two parallel branches with different dominant modalities are set to achieve complete video saliency decoding.
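A hypothetical reading of the parallel-symmetric design, assuming RGB and motion features as the two modalities (standard in VSOD) and a simple gate per branch; the actual PSNet decoding path is more elaborate.
```python
import torch
import torch.nn as nn

class DominantBranch(nn.Module):
    """Decode saliency with one dominant modality, gated by the other."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.head = nn.Conv2d(channels, 1, 3, padding=1)

    def forward(self, dominant, auxiliary):
        return self.head(dominant * self.gate(auxiliary))  # aux gates dominant

class ParallelSymmetricDecoder(nn.Module):
    """Two symmetric branches with swapped dominant modalities, averaged."""
    def __init__(self, channels=64):
        super().__init__()
        self.rgb_dom = DominantBranch(channels)
        self.mot_dom = DominantBranch(channels)

    def forward(self, rgb_feat, motion_feat):
        up = self.rgb_dom(rgb_feat, motion_feat)    # appearance-dominant branch
        down = self.mot_dom(motion_feat, rgb_feat)  # motion-dominant branch
        return torch.sigmoid((up + down) / 2)       # fused saliency map

rgb, mot = torch.randn(2, 64, 28, 28), torch.randn(2, 64, 28, 28)
print(ParallelSymmetricDecoder()(rgb, mot).shape)  # torch.Size([2, 1, 28, 28])
```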
arXiv Detail & Related papers (2022-10-12T04:11:48Z)
- Multi-scale and Cross-scale Contrastive Learning for Semantic Segmentation [5.281694565226513]
We apply contrastive learning to enhance the discriminative power of the multi-scale features extracted by semantic segmentation networks.
By first mapping the encoder's multi-scale representations to a common feature space, we instantiate a novel form of supervised local-global constraint.
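A minimal sketch of the cross-scale constraint, assuming 1x1 projections into a shared embedding space and an InfoNCE loss between spatially matched pixels of two scales; the projector names and the sampling scheme are illustrative.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossScaleProjector(nn.Module):
    """Map encoder features from every scale into one shared embedding
    space so contrastive terms can be computed across scales."""
    def __init__(self, in_channels, dim=128):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, dim, 1) for c in in_channels)

    def forward(self, feats):
        return [F.normalize(p(f), dim=1) for p, f in zip(self.proj, feats)]

def cross_scale_info_nce(anchors, positives, temperature=0.1):
    """InfoNCE between matched embeddings: row i of `anchors`
    should match row i of `positives`."""
    logits = anchors @ positives.t() / temperature   # (N, N) similarities
    labels = torch.arange(anchors.size(0))
    return F.cross_entropy(logits, labels)

feats = [torch.randn(2, c, s, s) for c, s in [(256, 32), (512, 16)]]
z_hi, z_lo = CrossScaleProjector([256, 512])(feats)
a = F.avg_pool2d(z_hi, 2)[0].flatten(1).t()  # pool scale 1 onto scale 2's grid
p = z_lo[0].flatten(1).t()                   # matched positions across scales
print(cross_scale_info_nce(a, p))
```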
arXiv Detail & Related papers (2022-03-25T01:24:24Z)
- Group Contextualization for Video Recognition [80.3842253625557]
Group contextualization (GC) embeds features with four different kinds of contexts in parallel.
GC boosts the performance of 2D-CNNs (e.g., TSN) and TSM to a level comparable to state-of-the-art video networks.
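The paper's four calibrators are not reproduced here; the sketch below only illustrates the grouped-parallel pattern, with four assumed context operators (global channel gate, spatial gate, local and dilated convolutions) each refining one channel group.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupContextualization(nn.Module):
    """Split channels into four groups and calibrate each with a
    different kind of context in parallel, then re-concatenate."""
    def __init__(self, channels):
        super().__init__()
        assert channels % 4 == 0
        g = channels // 4
        self.channel_gate = nn.Conv2d(g, g, 1)            # from pooled stats
        self.spatial_gate = nn.Conv2d(g, 1, 7, padding=3)
        self.local_ctx = nn.Conv2d(g, g, 3, padding=1)
        self.wide_ctx = nn.Conv2d(g, g, 3, padding=2, dilation=2)

    def forward(self, x):
        a, b, c, d = x.chunk(4, dim=1)
        a = a * torch.sigmoid(self.channel_gate(F.adaptive_avg_pool2d(a, 1)))  # global channel context
        b = b * torch.sigmoid(self.spatial_gate(b))   # spatial context
        c = c + self.local_ctx(c)                     # local context
        d = d + self.wide_ctx(d)                      # long-range (dilated) context
        return torch.cat([a, b, c, d], dim=1)

x = torch.randn(2, 64, 14, 14)
print(GroupContextualization(64)(x).shape)  # torch.Size([2, 64, 14, 14])
```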
arXiv Detail & Related papers (2022-03-18T01:49:40Z)
- LC3Net: Ladder context correlation complementary network for salient object detection [0.32116198597240836]
We propose a novel ladder context correlation complementary network (LC3Net).
FCB is a filterable convolution block to assist the automatic collection of information on the diversity of initial features.
DCM is a dense cross module to facilitate the intimate aggregation of different levels of features.
BCD is a bidirectional compression decoder to help the progressive shrinkage of multi-scale features.
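Of the three modules, the bidirectional compression decoder is the easiest to sketch: a top-down pass injects deep semantics into high-resolution maps, a bottom-up pass pushes refined detail back down, and each merge compresses the concatenated channels. The channel sizes and pooling choices below are assumptions for illustration.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BidirectionalCompressionDecoder(nn.Module):
    """Top-down then bottom-up fusion over a feature pyramid, with a
    compressing convolution at every merge."""
    def __init__(self, channels=(64, 128, 256)):
        super().__init__()
        pairs = [channels[i] + channels[i + 1] for i in range(len(channels) - 1)]
        self.td = nn.ModuleList(nn.Conv2d(p, channels[i], 3, padding=1)
                                for i, p in enumerate(pairs))
        self.bu = nn.ModuleList(nn.Conv2d(p, channels[i + 1], 3, padding=1)
                                for i, p in enumerate(pairs))
        self.head = nn.Conv2d(channels[0], 1, 1)  # saliency logits

    def forward(self, feats):          # feats: shallow (hi-res) -> deep (lo-res)
        feats = list(feats)
        for i in range(len(feats) - 2, -1, -1):   # top-down pass
            up = F.interpolate(feats[i + 1], size=feats[i].shape[-2:],
                               mode='bilinear', align_corners=False)
            feats[i] = self.td[i](torch.cat([feats[i], up], dim=1))
        for i in range(len(feats) - 1):           # bottom-up pass
            down = F.adaptive_max_pool2d(feats[i], feats[i + 1].shape[-2:])
            feats[i + 1] = self.bu[i](torch.cat([down, feats[i + 1]], dim=1))
        return self.head(feats[0])

feats = [torch.randn(1, c, s, s) for c, s in [(64, 64), (128, 32), (256, 16)]]
print(BidirectionalCompressionDecoder()(feats).shape)  # torch.Size([1, 1, 64, 64])
```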
arXiv Detail & Related papers (2021-10-21T03:12:32Z)
- MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks; the visible and thermal filters are then used to perform dynamic convolution on their corresponding input feature maps.
To address issues caused by heavy occlusion, fast motion, and out-of-view, we propose to conduct a joint local and global search by exploiting a new direction-aware target-driven attention mechanism.
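Dynamic filter generation can be sketched with the grouped-convolution trick for per-sample kernels. The per-channel kernel head and the simple additive fusion at the end are assumptions, not the MFGNet design.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityAwareFilterGen(nn.Module):
    """Generate per-sample depth-wise conv filters from a feature map's
    global statistics and apply them back (dynamic convolution)."""
    def __init__(self, channels, k=3):
        super().__init__()
        self.k = k
        self.gen = nn.Linear(channels, channels * k * k)  # one k x k kernel per channel

    def forward(self, x):
        B, C, H, W = x.shape
        ctx = F.adaptive_avg_pool2d(x, 1).flatten(1)            # (B, C) global stats
        kernels = self.gen(ctx).view(B * C, 1, self.k, self.k)  # per-sample kernels
        x = x.reshape(1, B * C, H, W)     # grouped-conv trick for per-sample filtering
        out = F.conv2d(x, kernels, padding=self.k // 2, groups=B * C)
        return out.view(B, C, H, W)

vis_gen, th_gen = ModalityAwareFilterGen(32), ModalityAwareFilterGen(32)
vis, th = torch.randn(2, 32, 24, 24), torch.randn(2, 32, 24, 24)
fused = vis_gen(vis) + th_gen(th)  # each stream dynamically filtered, then fused (assumed)
print(fused.shape)                 # torch.Size([2, 32, 24, 24])
```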
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos [78.45050529204701]
We propose a novel framework to pursue discriminative and robust representation by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-points estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and physical connections of the human body.
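A sketch of one context-reinforced propagation step over key-point features, blending a fixed physical-connection adjacency with a feature-affinity adjacency; the skeleton, layer shapes, and blending weight here are hypothetical.
```python
import torch
import torch.nn as nn

class TopologyPropagation(nn.Module):
    """One graph-propagation layer over key-point features, mixing a fixed
    physical-connection adjacency with a context-dependent one."""
    def __init__(self, dim, adj_physical):
        super().__init__()
        self.register_buffer('adj_phys', adj_physical)  # (K, K) body links
        self.theta, self.phi = nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (B, K, dim) key-point features
        # context-reinforced adjacency from pairwise feature affinity
        adj_ctx = torch.softmax(self.theta(x) @ self.phi(x).transpose(1, 2), dim=-1)
        adj = 0.5 * (self.adj_phys + adj_ctx)  # blend physical and contextual links
        return torch.relu(self.out(adj @ x)) + x

# hypothetical 5-joint skeleton: head-torso-hips, hips-knees (illustrative)
links = [(0, 1), (1, 2), (2, 3), (2, 4)]
A = torch.eye(5)
for i, j in links:
    A[i, j] = A[j, i] = 1.0
A = A / A.sum(-1, keepdim=True)            # row-normalized adjacency

x = torch.randn(2, 5, 64)                  # features for 5 body key-points
print(TopologyPropagation(64, A)(x).shape) # torch.Size([2, 5, 64])
```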
arXiv Detail & Related papers (2021-04-15T14:32:12Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
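The symmetric gated fusion is the most self-contained piece and can be sketched directly: each modality is re-weighted by a gate conditioned on both inputs. Channel sizes and the gate design are assumptions, not the ACMNet implementation.
```python
import torch
import torch.nn as nn

class SymmetricGatedFusion(nn.Module):
    """Fuse two modal feature maps with symmetric gates: each modality is
    weighted by a gate computed from the concatenation of both inputs."""
    def __init__(self, channels):
        super().__init__()
        self.gate_a = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())
        self.gate_b = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, a, b):                 # e.g., image and depth branches
        joint = torch.cat([a, b], dim=1)
        return self.gate_a(joint) * a + self.gate_b(joint) * b

img_f = torch.randn(2, 32, 48, 48)  # image-branch features
dep_f = torch.randn(2, 32, 48, 48)  # sparse-depth-branch features
print(SymmetricGatedFusion(32)(img_f, dep_f).shape)  # torch.Size([2, 32, 48, 48])
```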
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
- Semantic Segmentation With Multi Scale Spatial Attention For Self Driving Cars [2.7317088388886384]
We present a novel neural network that uses multi-scale feature fusion for accurate and efficient semantic image segmentation.
We use a ResNet-based feature extractor, dilated convolutional layers in the downsampling part, and atrous convolutional layers in the upsampling part, merging their outputs with a concat operation.
A new attention module is proposed to encode more contextual information and enhance the receptive field of the network.
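A minimal sketch combining the two ingredients the summary names: atrous (dilated) convolutions at several rates, and a spatial attention map over the merged context. The rates and the attention design are assumptions, not the paper's exact module.
```python
import torch
import torch.nn as nn

class MultiScaleSpatialAttention(nn.Module):
    """Atrous convolutions at several rates are concatenated, then a
    spatial attention map re-weights the merged multi-scale context."""
    def __init__(self, channels, rates=(1, 6, 12)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r)
            for r in rates)
        self.merge = nn.Conv2d(len(rates) * channels, channels, 1)
        self.attn = nn.Conv2d(channels, 1, 7, padding=3)

    def forward(self, x):
        ctx = self.merge(torch.cat([b(x) for b in self.branches], dim=1))
        return ctx * torch.sigmoid(self.attn(ctx)) + x  # attended context + skip

x = torch.randn(2, 64, 32, 32)  # e.g., ResNet stage-3 features
print(MultiScaleSpatialAttention(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```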
arXiv Detail & Related papers (2020-06-30T20:19:09Z)