FEANet: Feature-Enhanced Attention Network for RGB-Thermal Real-time
Semantic Segmentation
- URL: http://arxiv.org/abs/2110.08988v1
- Date: Mon, 18 Oct 2021 02:43:41 GMT
- Title: FEANet: Feature-Enhanced Attention Network for RGB-Thermal Real-time
Semantic Segmentation
- Authors: Fuqin Deng, Hua Feng, Mingjian Liang, Hongmin Wang, Yong Yang, Yuan
Gao, Junfeng Chen, Junjie Hu, Xiyue Guo, and Tin Lun Lam
- Abstract summary: We propose a two-stage Feature-Enhanced Attention Network (FEANet) for the RGB-T semantic segmentation task.
Specifically, we introduce a Feature-Enhanced Attention Module (FEAM) to excavate and enhance multi-level features from both the channel and spatial views.
Benefiting from the proposed FEAM module, our FEANet preserves spatial information and shifts more attention to high-resolution features from the fused RGB-T images.
- Score: 19.265576529259647
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The RGB-Thermal (RGB-T) information for semantic segmentation has been
extensively explored in recent years. However, most existing RGB-T semantic
segmentation methods compromise spatial resolution to achieve real-time
inference speed, which leads to poor performance. To better extract detailed
spatial information, we propose a two-stage Feature-Enhanced Attention Network
(FEANet) for the RGB-T semantic segmentation task. Specifically, we introduce a
Feature-Enhanced Attention Module (FEAM) to excavate and enhance multi-level
features from both the channel and spatial views. Benefiting from the proposed
FEAM module, our FEANet preserves spatial information and shifts more
attention to high-resolution features from the fused RGB-T images. Extensive
experiments on the urban scene dataset demonstrate that our FEANet outperforms
other state-of-the-art (SOTA) RGB-T methods in terms of objective metrics and
subjective visual comparison (+2.6% in global mAcc and +0.8% in global mIoU).
For 480 x 640 RGB-T test images, our FEANet runs at real-time speed on an
NVIDIA GeForce RTX 2080 Ti card.
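The abstract does not spell out FEAM's internals. As a rough illustration only, the following is a minimal PyTorch sketch of a channel-then-spatial attention block of the kind the description suggests (in the spirit of CBAM); the module name, layer sizes, and pooling choices are assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class FeatureEnhancedAttention(nn.Module):
        """Hypothetical FEAM-style block: channel view, then spatial view."""
        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            # Channel view: re-weight channels from global descriptors.
            self.channel_mlp = nn.Sequential(
                nn.Linear(channels, channels // reduction),
                nn.ReLU(inplace=True),
                nn.Linear(channels // reduction, channels),
            )
            # Spatial view: 7x7 conv over per-pixel channel statistics.
            self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, _, _ = x.shape
            # Channel attention from average- and max-pooled descriptors.
            w = torch.sigmoid(self.channel_mlp(x.mean(dim=(2, 3)))
                              + self.channel_mlp(x.amax(dim=(2, 3))))
            x = x * w.view(b, c, 1, 1)
            # Spatial attention highlights informative locations.
            stats = torch.cat([x.mean(dim=1, keepdim=True),
                               x.amax(dim=1, keepdim=True)], dim=1)
            return x * torch.sigmoid(self.spatial_conv(stats))

    # Example: enhance a fused RGB-T feature map at one encoder level.
    fused = torch.randn(2, 64, 120, 160)  # e.g. stride-4 features of a 480x640 pair
    enhanced = FeatureEnhancedAttention(64)(fused)

Applied at every encoder level, a block like this would let the fused features keep spatial detail while the attention weights emphasize the more informative channels and locations.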
Related papers
- RGB-D Video Object Segmentation via Enhanced Multi-store Feature Memory [34.406308400305385]
RGB-Depth (RGB-D) Video Object Segmentation (VOS) aims to integrate the fine-grained texture information of RGB with the geometric cues of the depth modality.
In this paper, we propose a novel RGB-D VOS via multi-store feature memory for robust segmentation.
We show that the proposed method achieves state-of-the-art performance on the latest RGB-D VOS benchmark.
arXiv Detail & Related papers (2025-04-23T07:31:37Z) - KAN-SAM: Kolmogorov-Arnold Network Guided Segment Anything Model for RGB-T Salient Object Detection [35.52055285209549]
We propose a novel prompt learning-based RGB-T SOD method, named KAN-SAM, which reveals the potential of visual foundation models for RGB-T SOD tasks.
Specifically, we extend Segment Anything Model 2 (SAM2) for RGB-T SOD by introducing thermal features as guiding prompts through efficient and accurate Kolmogorov-Arnold Network (KAN) adapters.
We also introduce a mutually exclusive random masking strategy to reduce reliance on RGB data and improve generalization.
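The summary only names the masking strategy. One plausible reading of "mutually exclusive" is that any patch is masked in at most one modality, so at least one view of every region survives; the sketch below follows that reading, with the function name, patch size, and ratio as illustrative assumptions.

    import torch

    def mutually_exclusive_mask(rgb, thermal, patch=16, mask_ratio=0.3):
        """Mask each patch in at most one modality (assumes mask_ratio <= 0.5)."""
        b, _, h, w = rgb.shape
        gh, gw = h // patch, w // patch
        r = torch.rand(b, 1, gh, gw, device=rgb.device)
        # Disjoint intervals of r decide which modality loses the patch.
        drop_rgb = (r < mask_ratio).float()
        drop_th = ((r >= mask_ratio) & (r < 2 * mask_ratio)).float()
        up = lambda m: m.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)
        return rgb * (1 - up(drop_rgb)), thermal * (1 - up(drop_th))

    rgb, th = torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224)
    rgb_masked, th_masked = mutually_exclusive_mask(rgb, th)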
arXiv Detail & Related papers (2025-04-08T10:07:02Z) - Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer [10.982521876026281]
We introduce a diffusion-based framework to address the RGB-D semantic segmentation problem.
We demonstrate that utilizing a Deformable Attention Transformer as the encoder to extract features from depth images effectively captures the characteristics of invalid regions in depth measurements.
arXiv Detail & Related papers (2024-09-23T15:23:01Z) - RGB-Sonar Tracking Benchmark and Spatial Cross-Attention Transformer Tracker [4.235252053339947]
This paper introduces a new challenging RGB-Sonar (RGB-S) tracking task.
It investigates how to achieve efficient tracking of an underwater target through the interaction of RGB and sonar modalities.
arXiv Detail & Related papers (2024-06-11T12:01:11Z) - TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking [30.89375068036783]
Existing approaches perform event feature extraction for RGB-E tracking using traditional appearance models.
We propose an Event backbone (Pooler) to obtain a high-quality feature representation that is cognisant of the intrinsic characteristics of the event data.
Our method significantly outperforms state-of-the-art trackers on two widely used RGB-E tracking datasets.
arXiv Detail & Related papers (2024-05-08T12:19:08Z) - Optimizing RGB-D Semantic Segmentation through Multi-modal Interaction
and Pooling Attention [5.518612382697244]
The Multi-modal Interaction and Pooling Attention Network (MIPANet) is designed to harness the interactive synergy between RGB and depth modalities.
We introduce a Pooling Attention Module (PAM) at various stages of the encoder.
This module amplifies the features extracted by the network, and its output is integrated into the decoder.
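As a rough illustration of a pooling-attention module of this kind, the sketch below gates encoder features with weights computed from a coarse pooled summary; names and sizes are assumptions, not MIPANet's code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PoolingAttention(nn.Module):
        def __init__(self, channels: int, pool_size: int = 4):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(pool_size)
            self.proj = nn.Conv2d(channels, channels, kernel_size=1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Attention weights from a coarse pooled summary, upsampled back.
            w = torch.sigmoid(self.proj(self.pool(x)))
            w = F.interpolate(w, size=x.shape[2:], mode="bilinear",
                              align_corners=False)
            return x * w  # amplified features, ready to hand to the decoder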
arXiv Detail & Related papers (2023-11-19T12:25:59Z) - Symmetric Uncertainty-Aware Feature Transmission for Depth
Super-Resolution [52.582632746409665]
We propose a novel Symmetric Uncertainty-aware Feature Transmission (SUFT) for color-guided DSR.
Our method achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-06-01T06:35:59Z) - Spherical Space Feature Decomposition for Guided Depth Map
Super-Resolution [123.04455334124188]
Guided depth map super-resolution (GDSR) aims to upsample low-resolution (LR) depth maps with additional information involved in high-resolution (HR) RGB images from the same scene.
In this paper, we propose the Spherical Space feature Decomposition Network (SSDNet) to solve the above issues.
Our method can achieve state-of-the-art results on four test datasets, as well as successfully generalize to real-world scenes.
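To make the GDSR interface concrete, here is a hypothetical baseline, not SSDNet itself: bicubic upsampling of the LR depth map, residually refined with the HR RGB guide. All names and sizes are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NaiveGuidedDSR(nn.Module):
        def __init__(self, feat: int = 32):
            super().__init__()
            self.refine = nn.Sequential(
                nn.Conv2d(4, feat, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(feat, 1, 3, padding=1),
            )

        def forward(self, lr_depth, hr_rgb):
            up = F.interpolate(lr_depth, size=hr_rgb.shape[2:],
                               mode="bicubic", align_corners=False)
            # Residual refinement of the upsampled depth using the RGB guide.
            return up + self.refine(torch.cat([up, hr_rgb], dim=1))

    sr = NaiveGuidedDSR()(torch.randn(1, 1, 64, 64),    # LR depth
                          torch.randn(1, 3, 256, 256))  # HR RGB guide (x4)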
arXiv Detail & Related papers (2023-03-15T21:22:21Z) - Cross-modality Discrepant Interaction Network for RGB-D Salient Object
Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement the effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z) - MobileSal: Extremely Efficient RGB-D Salient Object Detection [62.04876251927581]
This paper introduces a novel network, MobileSal, which focuses on efficient RGB-D salient object detection (SOD).
We propose an implicit depth restoration (IDR) technique to strengthen the feature representation capability of mobile networks for RGB-D SOD.
With IDR and CPR incorporated, MobileSal performs favorably against state-of-the-art methods on seven challenging RGB-D SOD datasets.
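One common way to realize an implicit auxiliary signal like IDR is a small training-only head that regresses the depth map from fused features; the sketch below follows that pattern, and its names and loss are assumptions, not MobileSal's code.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DepthRestorationHead(nn.Module):
        """Training-only auxiliary head; dropped at inference."""
        def __init__(self, channels: int):
            super().__init__()
            self.restore = nn.Sequential(
                nn.Conv2d(channels, channels // 2, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // 2, 1, kernel_size=1),
            )

        def forward(self, feats: torch.Tensor, depth_gt: torch.Tensor):
            pred = F.interpolate(self.restore(feats), size=depth_gt.shape[2:],
                                 mode="bilinear", align_corners=False)
            return F.l1_loss(pred, depth_gt)  # auxiliary loss term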
arXiv Detail & Related papers (2020-12-24T04:36:42Z) - Efficient RGB-D Semantic Segmentation for Indoor Scene Analysis [16.5390740005143]
We propose an efficient and robust RGB-D segmentation approach that can be optimized to a high degree using NVIDIA TensorRT.
We show that RGB-D segmentation is superior to processing RGB images solely and that it can still be performed in real time if the network architecture is carefully designed.
arXiv Detail & Related papers (2020-11-13T15:17:31Z) - Siamese Network for RGB-D Salient Object Detection and Beyond [113.30063105890041]
A novel framework is proposed to learn from both RGB and depth inputs through a shared network backbone.
Comprehensive experiments using five popular metrics show that the designed framework yields a robust RGB-D saliency detector.
We also link JL-DCF to the RGB-D semantic segmentation field, showing its capability of outperforming several semantic segmentation models.
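The shared-backbone idea can be pictured in a few lines of PyTorch: the same encoder weights process RGB and a 3-channel replica of depth, and the two feature maps are fused. The backbone choice and the additive fusion are illustrative assumptions, not JL-DCF's design.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    class SiameseRGBD(nn.Module):
        def __init__(self):
            super().__init__()
            net = resnet18(weights=None)
            self.backbone = nn.Sequential(*list(net.children())[:-2])

        def forward(self, rgb: torch.Tensor, depth: torch.Tensor):
            # Replicating 1-channel depth lets it reuse the RGB stem unchanged.
            return self.backbone(rgb) + self.backbone(depth.repeat(1, 3, 1, 1))

    feats = SiameseRGBD()(torch.randn(1, 3, 224, 224),
                          torch.randn(1, 1, 224, 224))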
arXiv Detail & Related papers (2020-08-26T06:01:05Z) - Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternately.
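A minimal sketch of such cross-modality recalibration, assuming simple channel gates (the actual encoder is more elaborate): each modality's global statistics re-weight the other's responses before aggregation.

    import torch
    import torch.nn as nn

    class CrossModalityRecalibration(nn.Module):
        def __init__(self, channels: int):
            super().__init__()
            gate = lambda: nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                         nn.Conv2d(channels, channels, 1),
                                         nn.Sigmoid())
            self.gate_from_depth, self.gate_from_rgb = gate(), gate()

        def forward(self, rgb: torch.Tensor, depth: torch.Tensor):
            # Each modality's statistics recalibrate the other's responses.
            rgb_r = rgb * self.gate_from_depth(depth)
            depth_r = depth * self.gate_from_rgb(rgb)
            return rgb_r + depth_r  # simple aggregation for illustration

    out = CrossModalityRecalibration(64)(torch.randn(1, 64, 56, 56),
                                         torch.randn(1, 64, 56, 56))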
arXiv Detail & Related papers (2020-07-17T18:35:24Z)