RGB-T Semantic Segmentation with Location, Activation, and Sharpening
- URL: http://arxiv.org/abs/2210.14530v1
- Date: Wed, 26 Oct 2022 07:42:34 GMT
- Title: RGB-T Semantic Segmentation with Location, Activation, and Sharpening
- Authors: Gongyang Li, Yike Wang, Zhi Liu, Xinpeng Zhang, Dan Zeng
- Abstract summary: We propose a novel feature fusion-based network for RGB-T semantic segmentation, named LASNet, which follows three steps of location, activation, and sharpening.
Experimental results on two public datasets demonstrate the superiority of our LASNet over relevant state-of-the-art methods.
- Score: 27.381263494613556
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Semantic segmentation is important for scene understanding. To address the
scenes of adverse illumination conditions of natural images, thermal infrared
(TIR) images are introduced. Most existing RGB-T semantic segmentation methods
follow three cross-modal fusion paradigms, i.e. encoder fusion, decoder fusion,
and feature fusion. Some methods, unfortunately, ignore the properties of RGB
and TIR features or the properties of features at different levels. In this
paper, we propose a novel feature fusion-based network for RGB-T semantic
segmentation, named \emph{LASNet}, which follows three steps of location,
activation, and sharpening. The highlight of LASNet is that we fully consider
the characteristics of cross-modal features at different levels, and
accordingly propose three specific modules for better segmentation. Concretely,
we propose a Collaborative Location Module (CLM) for high-level semantic
features, aiming to locate all potential objects. We propose a Complementary
Activation Module (CAM) for middle-level features, aiming to activate exact regions
of different objects. We propose an Edge Sharpening Module (ESM) for low-level
texture features, aiming to sharpen the edges of objects. Furthermore, in the
training phase, we attach a location supervision and an edge supervision after
CLM and ESM, respectively, and impose two semantic supervisions in the decoder
part to facilitate network convergence. Experimental results on two public
datasets demonstrate the superiority of our LASNet over relevant
state-of-the-art methods. The code and results of our method are available at
https://github.com/MathLee/LASNet.
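The three-step location, activation, and sharpening pipeline described in the abstract can be illustrated with a minimal numpy sketch. The module internals below (simple cross-modal gating, averaging, and a crude high-pass step) are illustrative placeholders, not the published LASNet implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def collaborative_location(rgb_high, tir_high):
    # CLM (illustrative): combine both modalities' high-level features
    # into a coarse map that locates all potential objects.
    fused = rgb_high + tir_high
    return sigmoid(fused.mean(axis=0, keepdims=True))  # 1 x H x W map

def complementary_activation(rgb_mid, tir_mid, loc_map):
    # CAM (illustrative): each modality gates the other, then the
    # coarse location map restricts activation to object regions.
    act = rgb_mid * sigmoid(tir_mid) + tir_mid * sigmoid(rgb_mid)
    return act * loc_map

def edge_sharpening(rgb_low, tir_low):
    # ESM (illustrative): emphasize high-frequency texture by removing
    # the per-channel mean (a crude high-pass stand-in).
    low = rgb_low + tir_low
    return low - low.mean(axis=(1, 2), keepdims=True)

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
rgb = {lvl: rng.standard_normal((C, H, W)) for lvl in ("high", "mid", "low")}
tir = {lvl: rng.standard_normal((C, H, W)) for lvl in ("high", "mid", "low")}

loc = collaborative_location(rgb["high"], tir["high"])
act = complementary_activation(rgb["mid"], tir["mid"], loc)
edge = edge_sharpening(rgb["low"], tir["low"])
print(loc.shape, act.shape, edge.shape)  # (1, 8, 8) (4, 8, 8) (4, 8, 8)
```

In the paper the three outputs feed a decoder with auxiliary location and edge supervision; here they are just returned for inspection.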
Related papers
- SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval [82.51117533271517]
Previous works typically only encode RGB videos to obtain high-level semantic features.
Existing RGB-based sign retrieval works suffer from the huge memory cost of dense visual data embedding in end-to-end training.
We propose a novel sign language representation framework called Semantically Enhanced Dual-Stream Encoder (SEDS).
arXiv Detail & Related papers (2024-07-23T11:31:11Z)
- Salient Object Detection in Optical Remote Sensing Images Driven by Transformer [69.22039680783124]
We propose a novel Global Extraction Local Exploration Network (GeleNet) for salient object detection in optical remote sensing images (ORSI-SOD).
Specifically, GeleNet first adopts a transformer backbone to generate four-level feature embeddings with global long-range dependencies.
Extensive experiments on three public datasets demonstrate that the proposed GeleNet outperforms relevant state-of-the-art methods.
arXiv Detail & Related papers (2023-09-15T07:14:43Z)
- LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion [40.44084541717407]
We propose a novel Local-to-Global fusion network (LoGoNet).
LoGoNet ranks 1st on the 3D object detection leaderboard.
For the first time, the detection performance on three classes surpasses 80 APH (L2) simultaneously.
arXiv Detail & Related papers (2023-03-07T02:00:34Z)
- Interactive Context-Aware Network for RGB-T Salient Object Detection [7.544240329265388]
We propose a novel network called Interactive Context-Aware Network (ICANet).
ICANet contains three modules that can effectively perform the cross-modal and cross-scale fusions.
Experiments prove that our network performs favorably against the state-of-the-art RGB-T SOD methods.
arXiv Detail & Related papers (2022-11-11T10:04:36Z)
- Adjacent Context Coordination Network for Salient Object Detection in Optical Remote Sensing Images [102.75699068451166]
We propose a novel Adjacent Context Coordination Network (ACCoNet) to explore the coordination of adjacent features in an encoder-decoder architecture for optical RSI-SOD.
The proposed ACCoNet outperforms 22 state-of-the-art methods under nine evaluation metrics, and runs up to 81 fps on a single NVIDIA Titan X GPU.
arXiv Detail & Related papers (2022-03-25T14:14:55Z)
- Edge-aware Guidance Fusion Network for RGB Thermal Scene Parsing [4.913013713982677]
We propose an edge-aware guidance fusion network (EGFNet) for RGB thermal scene parsing.
To effectively fuse the RGB and thermal information, we propose a multimodal fusion module.
Considering the importance of high level semantic information, we propose a global information module and a semantic information module.
arXiv Detail & Related papers (2021-12-09T01:12:47Z)
- Specificity-preserving RGB-D Saliency Detection [103.3722116992476]
We propose a specificity-preserving network (SP-Net) for RGB-D saliency detection.
Two modality-specific networks and a shared learning network are adopted to generate individual and shared saliency maps.
Experiments on six benchmark datasets demonstrate that our SP-Net outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2021-08-18T14:14:22Z)
- Boundary-Aware Geometric Encoding for Semantic Segmentation of Point Clouds [45.270215729464056]
Boundary information plays a significant role in 2D image segmentation, while usually being ignored in 3D point cloud segmentation.
We propose a Boundary Prediction Module (BPM) to predict boundary points.
Based on the predicted boundary, a boundary-aware Geometric Encoding Module (GEM) is designed to encode geometric information and aggregate features with discrimination in a neighborhood.
arXiv Detail & Related papers (2021-01-07T05:38:19Z)
- Improving Semantic Segmentation via Decoupled Body and Edge Supervision [89.57847958016981]
Existing semantic segmentation approaches either aim to improve the object's inner consistency by modeling the global context, or refine objects detail along their boundaries by multi-scale feature fusion.
In this paper, a new paradigm for semantic segmentation is proposed.
Our insight is that appealing performance of semantic segmentation requires explicitly modeling the object body and edge, which correspond to the high and low frequency of the image.
We show that the proposed framework with various baselines or backbone networks leads to better object inner consistency and object boundaries.
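The body/edge decoupling this entry describes can be sketched with a minimal numpy example. The box blur below is an illustrative low-pass stand-in for the paper's learned body-feature generation; the high-frequency residual then plays the role of the edge part:

```python
import numpy as np

def box_blur(x, k=3):
    # Simple k x k mean filter (edge padding), used here as a low-pass
    # stand-in for the learned body generation in the paper.
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = xp[i:i + k, j:j + k].mean()
    return out

feat = np.arange(36, dtype=float).reshape(6, 6)
body = box_blur(feat)   # low-frequency component: smooth object interior
edge = feat - body      # high-frequency residual: object boundary detail
assert np.allclose(body + edge, feat)  # the decoupling is exactly invertible
```

Because the decomposition is additive, the two parts can be supervised separately (body consistency vs. boundary accuracy) and recombined without losing information.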
arXiv Detail & Related papers (2020-07-20T12:11:22Z)
- RGB-D Salient Object Detection with Cross-Modality Modulation and Selection [126.4462739820643]
We present an effective method to progressively integrate and refine the cross-modality complementarities for RGB-D salient object detection (SOD).
The proposed network mainly solves two challenging issues: 1) how to effectively integrate the complementary information from RGB image and its corresponding depth map, and 2) how to adaptively select more saliency-related features.
arXiv Detail & Related papers (2020-07-14T14:22:50Z)
- 3D Gated Recurrent Fusion for Semantic Scene Completion [32.86736222106503]
This paper tackles the problem of data fusion in the semantic scene completion (SSC) task.
We propose a 3D gated recurrent fusion network (GRFNet), which learns to adaptively select and fuse the relevant information from depth and RGB.
Experiments on two benchmark datasets demonstrate the superior performance and the effectiveness of the proposed GRFNet for data fusion in SSC.
arXiv Detail & Related papers (2020-02-17T21:45:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.