CTNet: Context-based Tandem Network for Semantic Segmentation
- URL: http://arxiv.org/abs/2104.09805v1
- Date: Tue, 20 Apr 2021 07:33:11 GMT
- Title: CTNet: Context-based Tandem Network for Semantic Segmentation
- Authors: Zechao Li, Yanpeng Sun, and Jinhui Tang
- Abstract summary: This work proposes a novel Context-based Tandem Network (CTNet) that interactively explores spatial contextual information and channel contextual information.
To further improve the learned representations for semantic segmentation, the outputs of the two context modules are adaptively integrated.
- Score: 77.4337867789772
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contextual information has been shown to be powerful for semantic
segmentation. This work proposes a novel Context-based Tandem Network (CTNet)
that interactively explores spatial and channel contextual information to
discover the semantic context for semantic segmentation. Specifically, the
Spatial Contextual Module (SCM) is leveraged to uncover the spatial contextual
dependency between pixels by exploring the correlation between pixels and
categories. Meanwhile, the Channel Contextual Module (CCM) is introduced to
learn semantic features, including semantic feature maps and class-specific
features, by modeling the long-term semantic dependence between channels. The
learned semantic features are used as prior knowledge to guide the learning of
SCM, enabling SCM to capture more accurate long-range spatial dependencies.
Finally, the outputs of the two context modules are adaptively integrated to
further improve the learned representations for semantic segmentation.
Extensive experiments on three widely used datasets, i.e., PASCAL-Context,
ADE20K and PASCAL VOC2012, demonstrate the superior performance of the
proposed CTNet in comparison with several state-of-the-art methods.
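The abstract describes two context branches whose outputs are adaptively fused. The following is a minimal, hypothetical PyTorch sketch of that general pattern, not the authors' implementation: a self-attention-style spatial branch, a channel-reweighting branch, and a learned gate that blends them. All module internals, layer shapes, and the gating scheme are assumptions made for illustration only.

```python
# Hypothetical sketch of "two context branches + adaptive fusion".
# Not the CTNet reference code; shapes and layers are illustrative assumptions.
import torch
import torch.nn as nn


class SpatialContext(nn.Module):
    """Toy stand-in for a spatial context branch (pixel-to-pixel attention)."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # B x HW x C'
        k = self.key(x).flatten(2)                      # B x C' x HW
        attn = torch.softmax(q @ k, dim=-1)             # B x HW x HW
        v = self.value(x).flatten(2).transpose(1, 2)    # B x HW x C
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + out                                  # residual connection


class ChannelContext(nn.Module):
    """Toy stand-in for a channel context branch (channel reweighting)."""
    def __init__(self, channels):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)                           # per-channel scaling


class AdaptiveFusion(nn.Module):
    """Learn per-channel gates to blend the two branch outputs."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, spatial_feat, channel_feat):
        g = self.gate(torch.cat([spatial_feat, channel_feat], dim=1))
        return g * spatial_feat + (1 - g) * channel_feat


if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 32)    # dummy backbone features
    fused = AdaptiveFusion(64)(SpatialContext(64)(x), ChannelContext(64)(x))
    print(fused.shape)                # torch.Size([2, 64, 32, 32])
```

In this sketch the gate produces per-channel weights in [0, 1], so the fusion can lean on whichever branch is more informative for a given feature channel; the paper's actual integration strategy may differ.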
Related papers
- Text-Video Retrieval with Global-Local Semantic Consistent Learning [122.15339128463715]
We propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL).
GLSCL capitalizes on latent shared semantics across modalities for text-video retrieval.
Our method achieves performance comparable to the state of the art while being nearly 220 times faster in terms of computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z)
- Weakly-supervised Semantic Segmentation via Dual-stream Contrastive Learning of Cross-image Contextual Information [10.77139542242678]
Weakly supervised semantic segmentation (WSSS) aims at learning a semantic segmentation model with only image-level tags.
Most current WSSS methods focus on limited single-image (pixel-wise) information while ignoring the valuable inter-image (semantic-wise) information.
arXiv Detail & Related papers (2024-05-08T09:35:26Z)
- Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation [79.05949524349005]
We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from saliency maps.
We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps.
arXiv Detail & Related papers (2024-03-02T10:03:21Z)
- Context-Aware Interaction Network for RGB-T Semantic Segmentation [12.91377211747192]
RGB-T semantic segmentation is a key technique for autonomous driving scene understanding.
We propose a Context-Aware Interaction Network (CAINet) to exploit auxiliary tasks and global context for guided learning.
The proposed CAINet achieves state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2024-01-03T08:49:29Z)
- Open-world Semantic Segmentation via Contrasting and Clustering Vision-Language Embedding [95.78002228538841]
We propose a new open-world semantic segmentation pipeline that makes the first attempt to learn to segment semantic objects of various open-world categories without requiring any dense annotations.
Our method can directly segment objects of arbitrary categories, outperforming zero-shot segmentation methods that require data labeling on three benchmark datasets.
arXiv Detail & Related papers (2022-07-18T09:20:04Z)
- Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation [25.231470587575238]
We propose regional semantic contrast and aggregation (RCA) for learning semantic segmentation.
RCA is equipped with a regional memory bank to store massive, diverse object patterns appearing in training data.
RCA acquires a strong capability for fine-grained semantic understanding, and eventually establishes new state-of-the-art results on two popular benchmarks.
arXiv Detail & Related papers (2022-03-17T23:29:03Z)
- CRIS: CLIP-Driven Referring Image Segmentation [71.56466057776086]
We propose an end-to-end CLIP-Driven Referring Image Segmentation framework (CRIS).
CRIS resorts to vision-language decoding and contrastive learning to achieve text-to-pixel alignment.
Our proposed framework significantly outperforms state-of-the-art methods without any post-processing.
arXiv Detail & Related papers (2021-11-30T07:29:08Z)
- DCANet: Dense Context-Aware Network for Semantic Segmentation [4.960604671885823]
We propose a novel module, named the Dense Context-Aware (DCA) module, to adaptively integrate local detail information with global dependencies.
Driven by the contextual relationship, the DCA module can better achieve the aggregation of context information to generate more powerful features.
We empirically demonstrate the promising performance of our approach with extensive experiments on three challenging datasets.
arXiv Detail & Related papers (2021-04-06T14:12:22Z)
- Referring Image Segmentation via Cross-Modal Progressive Comprehension [94.70482302324704]
Referring image segmentation aims at segmenting the foreground masks of the entities that match the description given in a natural language expression.
Previous approaches tackle this problem using implicit feature interaction and fusion between visual and linguistic modalities.
We propose a Cross-Modal Progressive Comprehension (CMPC) module and a Text-Guided Feature Exchange (TGFE) module to effectively address this challenging task.
arXiv Detail & Related papers (2020-10-01T16:02:30Z)