Robust Double-Encoder Network for RGB-D Panoptic Segmentation
- URL: http://arxiv.org/abs/2210.02834v2
- Date: Wed, 14 Jun 2023 14:29:19 GMT
- Title: Robust Double-Encoder Network for RGB-D Panoptic Segmentation
- Authors: Matteo Sodano, Federico Magistri, Tiziano Guadagnino, Jens Behley,
Cyrill Stachniss
- Abstract summary: Panoptic segmentation provides an interpretation of the scene by computing a pixelwise semantic label together with instance IDs.
We propose a novel encoder-decoder neural network that processes RGB and depth separately through two encoders.
We show that our approach achieves superior results compared to other common approaches for panoptic segmentation.
- Score: 31.807572107839576
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Perception is crucial for robots that act in real-world environments, as
autonomous systems need to see and understand the world around them to act
properly. Panoptic segmentation provides an interpretation of the scene by
computing a pixelwise semantic label together with instance IDs. In this paper,
we address panoptic segmentation using RGB-D data of indoor scenes. We propose
a novel encoder-decoder neural network that processes RGB and depth separately
through two encoders. The features of the individual encoders are progressively
merged at different resolutions, such that the RGB features are enhanced using
complementary depth information. We propose a novel merging approach called
ResidualExcite, which reweighs each entry of the feature map according to its
importance. With our double-encoder architecture, our method is robust to missing
cues. In particular, the same model can be trained and used for inference on RGB-D, RGB-only, and
depth-only input data, without the need to train specialized models. We
evaluate our method on publicly available datasets and show that our approach
achieves superior results compared to other common approaches for panoptic
segmentation.
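The abstract does not include implementation details, so the following PyTorch sketch only illustrates one plausible reading of the described design: two separate encoders whose features are merged by an excitation-style module that reweighs each entry of the RGB feature map using depth. The class names, channel sizes, single merge stage, and the fallback for missing depth are assumptions for illustration, not the authors' code.

```python
# Hypothetical sketch of a double-encoder with excitation-style RGB-D fusion.
# Names, channel sizes, and the missing-depth fallback are illustrative assumptions.
from typing import Optional

import torch
import torch.nn as nn


class ResidualExciteFusion(nn.Module):
    """Reweighs every entry of the RGB feature map with a depth-derived
    importance mask and adds the result back residually (assumed design)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 4)
        # Pointwise bottleneck predicting a per-pixel, per-channel weight in [0, 1].
        self.excite = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        weights = self.excite(depth_feat)      # B x C x H x W importance map
        return rgb_feat + rgb_feat * weights   # residual reweighing of RGB features


class DoubleEncoder(nn.Module):
    """Two independent stems whose features are merged at a single resolution.
    A full model would merge at several stages and attach a panoptic decoder."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.rgb_stem = nn.Conv2d(3, channels, kernel_size=3, stride=2, padding=1)
        self.depth_stem = nn.Conv2d(1, channels, kernel_size=3, stride=2, padding=1)
        self.fuse = ResidualExciteFusion(channels)

    def forward(self, rgb: torch.Tensor, depth: Optional[torch.Tensor]) -> torch.Tensor:
        f_rgb = self.rgb_stem(rgb)
        # Robustness to missing cues: if depth is absent, skip the excitation and
        # keep the plain RGB stream (one of several possible handling schemes).
        if depth is None:
            return f_rgb
        f_depth = self.depth_stem(depth)
        return self.fuse(f_rgb, f_depth)


if __name__ == "__main__":
    model = DoubleEncoder()
    rgb = torch.randn(2, 3, 128, 128)
    depth = torch.randn(2, 1, 128, 128)
    print(model(rgb, depth).shape)  # torch.Size([2, 64, 64, 64])
    print(model(rgb, None).shape)   # torch.Size([2, 64, 64, 64])
```

The same forward pass accepts RGB-D or RGB-only input, which is the property the abstract emphasizes; how the released model actually handles a missing modality may differ.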
Related papers
- Diffusion-based RGB-D Semantic Segmentation with Deformable Attention Transformer [10.982521876026281]
We introduce a diffusion-based framework to address the RGB-D semantic segmentation problem.
We demonstrate that utilizing a Deformable Attention Transformer as the encoder to extract features from depth images effectively captures the characteristics of invalid regions in depth measurements.
arXiv Detail & Related papers (2024-09-23T15:23:01Z) - SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval [82.51117533271517]
Previous works typically only encode RGB videos to obtain high-level semantic features.
Existing RGB-based sign retrieval works suffer from the huge memory cost of dense visual data embedding in end-to-end training.
We propose a novel sign language representation framework called Semantically Enhanced Dual-Stream Encoder (SEDS).
arXiv Detail & Related papers (2024-07-23T11:31:11Z) - CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient
Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on the novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z) - Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient
Object Detection [67.33924278729903]
In this work, we propose the Dual Swin-Transformer based Mutual Interactive Network (DTMINet).
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z) - Attention-based Dual Supervised Decoder for RGBD Semantic Segmentation [16.721758280029302]
We propose a novel attention-based dual supervised decoder for RGBD semantic segmentation.
In the encoder, we design a simple yet effective attention-based multimodal fusion module to deeply extract and fuse multi-level paired complementary information.
Our method achieves superior performance against the state-of-the-art methods.
arXiv Detail & Related papers (2022-01-05T03:12:27Z) - RGB-D Saliency Detection via Cascaded Mutual Information Minimization [122.8879596830581]
Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning.
We introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between RGB image and depth data (a minimal sketch of this idea follows the related-papers list below).
arXiv Detail & Related papers (2021-09-15T12:31:27Z) - Cross-modality Discrepant Interaction Network for RGB-D Salient Object
Detection [78.47767202232298]
We propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD.
Two components are designed to implement effective cross-modality interaction.
Our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2021-08-04T11:24:42Z) - Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as a cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)