DCANet: Dense Context-Aware Network for Semantic Segmentation
- URL: http://arxiv.org/abs/2104.02533v1
- Date: Tue, 6 Apr 2021 14:12:22 GMT
- Title: DCANet: Dense Context-Aware Network for Semantic Segmentation
- Authors: Yifu Liu, Chenfeng Xu and Xinyu Jin
- Abstract summary: We propose a novel module, named the Dense Context-Aware (DCA) module, to adaptively integrate local detail information with global dependencies.
Driven by the contextual relationship, the DCA module can better aggregate context information to generate more powerful features.
We empirically demonstrate the promising performance of our approach with extensive experiments on three challenging datasets.
- Score: 4.960604671885823
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the benefit of context information becomes increasingly evident in advanced
semantic segmentation, learning to capture compact contextual relationships helps in
understanding complex scenes. In contrast to some previous works that rely on
multi-scale context fusion, we propose a novel module, named the
Dense Context-Aware (DCA) module, to adaptively integrate local detail
information with global dependencies. Driven by the contextual relationship,
the DCA module can better aggregate context information to generate more
powerful features. Furthermore, we deliberately design two
extended structures based on the DCA modules to further capture the long-range
contextual dependency information. By combining the DCA modules in cascade or
parallel, our networks use a progressive strategy to improve multi-scale
feature representations for robust segmentation. We empirically demonstrate the
promising performance of our approach (DCANet) with extensive experiments on
three challenging datasets, including PASCAL VOC 2012, Cityscapes, and ADE20K.
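The abstract describes the DCA design only at a high level, so below is a minimal PyTorch sketch of what "adaptively integrating local detail information with global dependencies" and combining such modules "in cascade or parallel" could look like. The block structure, gating, channel sizes, and class count are illustrative assumptions, not the paper's actual module.

```python
# Minimal sketch of a dense context-aware style segmentation head, based only on
# the abstract above. The gating, channel sizes, and cascade/parallel wiring are
# illustrative assumptions, not the published DCANet design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextAwareBlock(nn.Module):
    """Fuses local detail (3x3 conv) with a global context descriptor (pooling)."""

    def __init__(self, channels: int):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.global_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.gate = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local = self.local(x)                               # local detail branch
        ctx = F.adaptive_avg_pool2d(x, 1)                   # global descriptor
        ctx = self.global_proj(ctx).expand_as(local)        # broadcast context
        weight = torch.sigmoid(self.gate(torch.cat([local, ctx], dim=1)))
        return x + weight * local + (1.0 - weight) * ctx    # adaptive fusion


class ContextHead(nn.Module):
    """Combines two context blocks in cascade or in parallel, as the abstract hints."""

    def __init__(self, channels: int, num_classes: int, mode: str = "cascade"):
        super().__init__()
        self.mode = mode
        self.block_a = ContextAwareBlock(channels)
        self.block_b = ContextAwareBlock(channels)
        self.classifier = nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        if self.mode == "cascade":
            out = self.block_b(self.block_a(feats))
        else:  # "parallel": average the two branches
            out = 0.5 * (self.block_a(feats) + self.block_b(feats))
        return self.classifier(out)


if __name__ == "__main__":
    head = ContextHead(channels=256, num_classes=19, mode="parallel")
    logits = head(torch.randn(2, 256, 64, 64))   # backbone features
    print(logits.shape)                          # torch.Size([2, 19, 64, 64])
```

Running the script prints per-pixel class logits for a 19-class setup such as Cityscapes; the actual DCANet would apply its own DCA formulation on backbone features before upsampling to full resolution.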
Related papers
- Cross-domain Multi-modal Few-shot Object Detection via Rich Text [21.36633828492347]
Cross-modal feature extraction and integration have led to steady performance improvements in few-shot learning tasks.
We study the cross-domain few-shot generalization of multi-modal object detection (MM-OD), termed CDMM-FSOD, and propose a meta-learning-based multi-modal few-shot object detection method.
arXiv Detail & Related papers (2024-03-24T15:10:22Z) - Contextualization Distillation from Large Language Model for Knowledge Graph Completion [51.126166442122546]
We introduce the Contextualization Distillation strategy, a plug-and-play approach compatible with both discriminative and generative KGC frameworks.
Our method begins by instructing large language models to transform compact, structural triplets into context-rich segments.
Comprehensive evaluations across diverse datasets and KGC techniques highlight the efficacy and adaptability of our approach.
arXiv Detail & Related papers (2024-01-28T08:56:49Z) - Context-Aware Interaction Network for RGB-T Semantic Segmentation [12.91377211747192]
RGB-T semantic segmentation is a key technique for autonomous driving scene understanding.
We propose a Context-Aware Interaction Network (CAINet) to exploit auxiliary tasks and global context for guided learning.
The proposed CAINet achieves state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2024-01-03T08:49:29Z) - Context-Enhanced Detector For Building Detection From Remote Sensing Images [41.3238458718635]
We propose a novel approach called Context-Enhanced Detector (CEDet).
Our approach utilizes a three-stage cascade structure to enhance the extraction of contextual information and improve building detection accuracy.
Our method achieves state-of-the-art performance on three building detection benchmarks, including CNBuilding-9P, CNBuilding-23P, and SpaceNet.
arXiv Detail & Related papers (2023-10-11T16:33:30Z) - Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
The multimodal entity linking (MEL) task aims at resolving ambiguous mentions to a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
arXiv Detail & Related papers (2023-07-19T02:11:19Z) - RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z) - Support-set based Multi-modal Representation Enhancement for Video Captioning [121.70886789958799]
We propose a Support-set based Multi-modal Representation Enhancement (SMRE) model to mine rich information in a semantic subspace shared between samples.
Specifically, we propose a Support-set Construction (SC) module to construct a support-set to learn underlying connections between samples and obtain semantically related visual elements.
During this process, we design a Semantic Space Transformation (SST) module to constrain relative distance and administrate multi-modal interactions in a self-supervised way.
arXiv Detail & Related papers (2022-05-19T03:40:29Z) - CTNet: Context-based Tandem Network for Semantic Segmentation [77.4337867789772]
This work proposes a novel Context-based Tandem Network (CTNet) by interactively exploring the spatial contextual information and the channel contextual information.
To further improve the performance of the learned representations for semantic segmentation, the results of the two context modules are adaptively integrated.
arXiv Detail & Related papers (2021-04-20T07:33:11Z) - Referring Image Segmentation via Cross-Modal Progressive Comprehension [94.70482302324704]
Referring image segmentation aims at segmenting the foreground masks of the entities that match the description given in a natural language expression.
Previous approaches tackle this problem using implicit feature interaction and fusion between visual and linguistic modalities.
We propose a Cross-Modal Progressive Comprehension (CMPC) module and a Text-Guided Feature Exchange (TGFE) module to effectively address this challenging task.
arXiv Detail & Related papers (2020-10-01T16:02:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.