Context-Aware Interaction Network for RGB-T Semantic Segmentation
- URL: http://arxiv.org/abs/2401.01624v1
- Date: Wed, 3 Jan 2024 08:49:29 GMT
- Title: Context-Aware Interaction Network for RGB-T Semantic Segmentation
- Authors: Ying Lv, Zhi Liu, Gongyang Li
- Abstract summary: RGB-T semantic segmentation is a key technique for autonomous driving scene understanding.
We propose a Context-Aware Interaction Network (CAINet) to exploit auxiliary tasks and global context for guided learning.
The proposed CAINet achieves state-of-the-art performance on benchmark datasets.
- Score: 12.91377211747192
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: RGB-T semantic segmentation is a key technique for autonomous driving
scene understanding. Existing RGB-T semantic segmentation methods, however, do
not effectively explore the complementary relationship between modalities in
the information interaction across multiple levels. To address this issue, the
Context-Aware Interaction Network (CAINet) is proposed for RGB-T semantic
segmentation, which constructs an interaction space to exploit auxiliary tasks
and global context for explicitly
guided learning. Specifically, we propose a Context-Aware Complementary
Reasoning (CACR) module aimed at establishing the complementary relationship
between multimodal features with the long-term context in both spatial and
channel dimensions. Further, considering the importance of global contextual
and detailed information, we propose the Global Context Modeling (GCM) module
and Detail Aggregation (DA) module, and we introduce specific auxiliary
supervision to explicitly guide the context interaction and refine the
segmentation map. Extensive experiments on two benchmark datasets of MFNet and
PST900 demonstrate that the proposed CAINet achieves state-of-the-art
performance. The code is available at https://github.com/YingLv1106/CAINet.
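
The abstract describes the CACR module as relating RGB and thermal features through long-range context in both the spatial and channel dimensions. As a rough sketch of that general idea only (not CAINet's actual CACR implementation; the class name, layer choices, and feature shapes below are illustrative assumptions), the following PyTorch snippet fuses RGB and thermal features and reweights the result with channel-wise and spatial-wise attention:

```python
import torch
import torch.nn as nn

class CrossModalContextBlock(nn.Module):
    """Illustrative sketch of cross-modal context interaction.

    Not CAINet's CACR module; a minimal stand-in that reweights the
    fused RGB/thermal features along channel and spatial dimensions.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Channel attention computed from globally pooled fused features.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention computed from mean- and max-pooled channel maps.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        fused = rgb + thermal
        # Reweight channels using global context.
        fused = fused * self.channel_gate(fused)
        # Reweight spatial positions using pooled statistics.
        pooled = torch.cat(
            [fused.mean(dim=1, keepdim=True), fused.amax(dim=1, keepdim=True)],
            dim=1,
        )
        return fused * self.spatial_gate(pooled)

# Example: fuse hypothetical stride-16 features from two modality encoders.
block = CrossModalContextBlock(channels=256)
rgb_feat = torch.randn(2, 256, 30, 40)      # RGB encoder features
thermal_feat = torch.randn(2, 256, 30, 40)  # thermal encoder features
out = block(rgb_feat, thermal_feat)         # -> torch.Size([2, 256, 30, 40])
```

For the authors' actual CACR, GCM, and DA modules and the auxiliary supervision, refer to the released code linked above.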
Related papers
- Text-Video Retrieval with Global-Local Semantic Consistent Learning [122.15339128463715]
We propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL).
GLSCL capitalizes on latent shared semantics across modalities for text-video retrieval.
Our method achieves performance comparable to the SOTA while being nearly 220 times faster in computational cost.
arXiv Detail & Related papers (2024-05-21T11:59:36Z) - Optimizing rgb-d semantic segmentation through multi-modal interaction
and pooling attention [5.518612382697244]
The Multi-modal Interaction and Pooling Attention Network (MIPANet) is designed to harness the interactive synergy between RGB and depth modalities.
We introduce a Pooling Attention Module (PAM) at various stages of the encoder.
This module amplifies the features extracted by the network and integrates its output into the decoder.
arXiv Detail & Related papers (2023-11-19T12:25:59Z) - Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
The multimodal entity linking (MEL) task aims at resolving ambiguous mentions to a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
arXiv Detail & Related papers (2023-07-19T02:11:19Z) - CTNet: Context-based Tandem Network for Semantic Segmentation [77.4337867789772]
This work proposes a novel Context-based Tandem Network (CTNet) that interactively explores spatial and channel contextual information.
To further improve the performance of the learned representations for semantic segmentation, the results of the two context modules are adaptively integrated.
arXiv Detail & Related papers (2021-04-20T07:33:11Z) - DCANet: Dense Context-Aware Network for Semantic Segmentation [4.960604671885823]
We propose a novel module, named the Dense Context-Aware (DCA) module, to adaptively integrate local detail information with global dependencies.
Driven by contextual relationships, the DCA module better aggregates context information to generate more powerful features.
We empirically demonstrate the promising performance of our approach with extensive experiments on three challenging datasets.
arXiv Detail & Related papers (2021-04-06T14:12:22Z) - Global-Local Propagation Network for RGB-D Semantic Segmentation [12.710923449138434]
We propose the Global-Local Propagation Network (GLPNet) for RGB-D semantic segmentation.
Our GLPNet achieves new state-of-the-art performance on two challenging indoor scene segmentation datasets.
arXiv Detail & Related papers (2021-01-26T14:26:07Z) - Referring Image Segmentation via Cross-Modal Progressive Comprehension [94.70482302324704]
Referring image segmentation aims to segment the foreground masks of the entities that match the description given in a natural language expression.
Previous approaches tackle this problem using implicit feature interaction and fusion between visual and linguistic modalities.
We propose a Cross-Modal Progressive Comprehension (CMPC) module and a Text-Guided Feature Exchange (TGFE) module to effectively address this challenging task.
arXiv Detail & Related papers (2020-10-01T16:02:30Z) - Bidirectional Graph Reasoning Network for Panoptic Segmentation [126.06251745669107]
We introduce a Bidirectional Graph Reasoning Network (BGRNet) to mine the intra-modular and inter-modular relations within and between foreground things and background stuff classes.
BGRNet first constructs image-specific graphs in both instance and semantic segmentation branches that enable flexible reasoning at the proposal level and class level.
arXiv Detail & Related papers (2020-04-14T02:32:10Z) - Cascaded Human-Object Interaction Recognition [175.60439054047043]
We introduce a cascade architecture for multi-stage, coarse-to-fine HOI understanding.
At each stage, an instance localization network progressively refines HOI proposals and feeds them into an interaction recognition network.
With our carefully designed human-centric relation features, these two modules work collaboratively towards effective interaction understanding.
arXiv Detail & Related papers (2020-03-09T17:05:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.