Scale Invariant Semantic Segmentation with RGB-D Fusion
- URL: http://arxiv.org/abs/2204.04679v1
- Date: Sun, 10 Apr 2022 12:54:27 GMT
- Title: Scale Invariant Semantic Segmentation with RGB-D Fusion
- Authors: Mohammad Dawud Ansari, Alwi Husada and Didier Stricker
- Abstract summary: We propose a neural network architecture for scale-invariant semantic segmentation using RGB-D images.
We incorporate depth information into the RGB data for pixel-wise semantic segmentation to address objects of different scales in an outdoor scene.
Our model is compact and can be easily applied to other RGB models.
- Score: 12.650574326251023
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we propose a neural network architecture for scale-invariant semantic segmentation using RGB-D images. We utilize depth information as an additional modality alongside color images. This is especially useful in outdoor scenes, which contain objects at widely varying scales depending on their distance from the camera: near objects cover significantly more pixels than far ones. We propose to incorporate depth information into the RGB data for pixel-wise semantic segmentation to address these scale differences. We adapt the well-known DeepLab-v2 (ResNet-101) model as our RGB baseline. Depth images are passed separately as an additional input through a distinct branch, and the intermediate feature maps of the color and depth branches are fused using a novel fusion block. Our model is compact and can be easily applied to other RGB models. We perform extensive qualitative and quantitative evaluation on the challenging Cityscapes dataset, obtaining results comparable to the state of the art. Additionally, we evaluate our model on a self-recorded real dataset. For the sake of extended evaluation of a driving scene with ground truth, we generated a synthetic dataset using the popular vehicle simulation project CARLA. The results obtained on the real and synthetic datasets show the effectiveness of our approach.
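The paper's code is not reproduced here, but a minimal PyTorch sketch may help make the architecture concrete: an RGB branch and a distinct depth branch produce intermediate feature maps that a learned fusion block merges before the segmentation head. The toy stems standing in for DeepLab-v2 (ResNet-101), the layer sizes, and the concatenate-and-project fusion are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Illustrative stand-in for the paper's fusion block: concatenate
    the two modalities and project back to the RGB channel count."""
    def __init__(self, channels):
        super().__init__()
        self.project = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb_feat, depth_feat):
        return self.project(torch.cat([rgb_feat, depth_feat], dim=1))

class TwoBranchSegmenter(nn.Module):
    """RGB branch plus a distinct depth branch, fused at an intermediate stage."""
    def __init__(self, num_classes, channels=64):
        super().__init__()
        # Toy stems standing in for the DeepLab-v2 (ResNet-101) backbone
        # and a depth encoder of matching resolution.
        self.rgb_stem = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.depth_stem = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.fusion = FusionBlock(channels)
        self.head = nn.Conv2d(channels, num_classes, kernel_size=1)

    def forward(self, rgb, depth):
        fused = self.fusion(self.rgb_stem(rgb), self.depth_stem(depth))
        return self.head(fused)  # per-pixel class logits

# 19 classes as in Cityscapes evaluation
logits = TwoBranchSegmenter(num_classes=19)(
    torch.randn(1, 3, 256, 512), torch.randn(1, 1, 256, 512))
print(logits.shape)  # torch.Size([1, 19, 256, 512])
```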
Related papers
- RBF Weighted Hyper-Involution for RGB-D Object Detection [0.0]
We propose a real-time, two-stream RGB-D object detection model.
The proposed model consists of two new components: a depth-guided hyper-involution that adapts dynamically based on the spatial interaction pattern in the raw depth map, and an up-sampling-based trainable fusion layer.
We show that the proposed model outperforms other RGB-D based object detection models on NYU Depth v2 dataset and achieves comparable (second best) results on SUN RGB-D.
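As a rough illustration of the depth-guided involution idea, the sketch below predicts a per-pixel spatial kernel from the raw depth map and uses it to reweight the feature window at each location. This is a generic involution-style operator under assumed shapes; the paper's RBF weighting and exact operator are not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthGuidedInvolution(nn.Module):
    """Generic involution-style operator: a small conv net predicts a
    k*k spatial kernel at every pixel from the raw depth map, and the
    kernel reweights the feature window at that location."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.k = kernel_size
        self.kernel_net = nn.Conv2d(1, kernel_size * kernel_size, 3, padding=1)

    def forward(self, feat, depth):
        b, c, h, w = feat.shape
        kernels = self.kernel_net(depth).softmax(dim=1)        # (b, k*k, h, w)
        patches = F.unfold(feat, self.k, padding=self.k // 2)  # (b, c*k*k, h*w)
        patches = patches.view(b, c, self.k * self.k, h * w)
        kernels = kernels.view(b, 1, self.k * self.k, h * w)
        out = (patches * kernels).sum(dim=2)   # depth-conditioned weighted sum
        return out.view(b, c, h, w)

out = DepthGuidedInvolution(64)(torch.randn(1, 64, 32, 32),
                                torch.randn(1, 1, 32, 32))
```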
arXiv Detail & Related papers (2023-09-30T11:25:34Z)
- RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z)
- MV-ROPE: Multi-view Constraints for Robust Category-level Object Pose and Size Estimation [23.615122326731115]
We propose a novel solution that makes use of RGB video streams.
Our framework consists of three modules: a scale-aware monocular dense SLAM solution, a lightweight object pose predictor, and an object-level pose graph.
Our experimental results demonstrate that when utilizing public dataset sequences with high-quality depth information, the proposed method exhibits comparable performance to state-of-the-art RGB-D methods.
arXiv Detail & Related papers (2023-08-17T08:29:54Z)
- Clothes Grasping and Unfolding Based on RGB-D Semantic Segmentation [21.950751953721817]
We propose a novel Bi-directional Fractal Cross Fusion Network (BiFCNet) for semantic segmentation.
We use RGB images with rich color features as input to our network in which the Fractal Cross Fusion module fuses RGB and depth data.
To reduce the cost of real data collection, we propose a data augmentation method based on an adversarial strategy.
arXiv Detail & Related papers (2023-05-05T03:21:55Z)
- SPSN: Superpixel Prototype Sampling Network for RGB-D Salient Object Detection [5.2134203335146925]
RGB-D salient object detection (SOD) has been in the spotlight recently because it is an important preprocessing operation for various vision tasks.
Despite advances in deep learning-based methods, RGB-D SOD remains challenging due to the large domain gap between RGB images and depth maps, as well as low-quality depth maps.
We propose a novel superpixel prototype sampling network architecture to solve this problem.
arXiv Detail & Related papers (2022-07-16T10:43:14Z)
- Colored Point Cloud to Image Alignment [15.828285556159026]
We introduce a differential optimization method that aligns a colored point cloud to a given color image via iterative geometric and color matching.
We find the transformation between the camera image and the point cloud colors by iterating between matching the relative location of the point cloud and matching colors.
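A minimal sketch of the color-matching half of such an alignment loop, assuming a simplified 2D shift in place of the full camera transformation, could look like the following; the geometric-matching step and the paper's actual parameterization are omitted.

```python
import torch
import torch.nn.functional as F

def align_colors(points_xy, points_rgb, image, steps=100, lr=1e-2):
    """points_xy: (N, 2) projected point locations, normalized to [-1, 1];
    points_rgb: (N, 3) point colors; image: (3, H, W) target color image.
    Optimizes a 2D shift so that sampled image colors match point colors."""
    shift = torch.zeros(2, requires_grad=True)
    opt = torch.optim.Adam([shift], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        grid = (points_xy + shift).view(1, -1, 1, 2)            # (1, N, 1, 2)
        sampled = F.grid_sample(image.unsqueeze(0), grid,
                                align_corners=False)            # (1, 3, N, 1)
        loss = (sampled.squeeze(0).squeeze(-1).t() - points_rgb).abs().mean()
        loss.backward()
        opt.step()
    return shift.detach()
```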
arXiv Detail & Related papers (2021-10-07T08:12:56Z)
- RGB-D Saliency Detection via Cascaded Mutual Information Minimization [122.8879596830581]
Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning.
We introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between RGB image and depth data.
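As a hedged illustration of what a mutual-information-minimization regularizer between RGB and depth features can look like, the sketch below uses a CLUB-style variational upper bound; this is a generic stand-in, not the paper's actual estimator or cascaded framework.

```python
import torch
import torch.nn as nn

class MIUpperBound(nn.Module):
    """CLUB-style variational upper bound on I(rgb_feat; depth_feat).
    A conditional Gaussian q(depth|rgb) is assumed; in practice q is
    fit with a likelihood objective while the encoders minimize the bound."""
    def __init__(self, dim):
        super().__init__()
        self.mu = nn.Linear(dim, dim)
        self.logvar = nn.Linear(dim, dim)

    def forward(self, rgb_feat, depth_feat):
        mu, logvar = self.mu(rgb_feat), self.logvar(rgb_feat)
        # log-likelihood of the paired sample under q(depth|rgb) ...
        positive = -((depth_feat - mu) ** 2) / logvar.exp() - logvar
        # ... minus that of a shuffled (unpaired) sample, approximating
        # the marginal; the gap upper-bounds the mutual information.
        shuffled = depth_feat[torch.randperm(depth_feat.size(0))]
        negative = -((shuffled - mu) ** 2) / logvar.exp() - logvar
        return (positive.mean() - negative.mean()) / 2
```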
arXiv Detail & Related papers (2021-09-15T12:31:27Z)
- Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images [69.5662419067878]
Grounding referring expressions in RGBD images has been an emerging field.
We present a novel task of 3D visual grounding in single-view RGBD image where the referred objects are often only partially scanned due to occlusion.
Our approach first fuses the language and the visual features at the bottom level to generate a heatmap that localizes the relevant regions in the RGBD image.
Then our approach conducts an adaptive feature learning based on the heatmap and performs the object-level matching with another visio-linguistic fusion to finally ground the referred object.
arXiv Detail & Related papers (2021-03-14T11:18:50Z)
- Learning RGB-D Feature Embeddings for Unseen Object Instance Segmentation [67.88276573341734]
We propose a new method for unseen object instance segmentation by learning RGB-D feature embeddings from synthetic data.
A metric learning loss function is utilized to learn to produce pixel-wise feature embeddings.
We further improve the segmentation accuracy with a new two-stage clustering algorithm.
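For intuition, a pull-push metric learning loss over per-pixel embeddings might be sketched as below; the instance-mean formulation here is an assumption, and the paper's exact loss and two-stage clustering are not reproduced.

```python
import torch

def pixel_embedding_loss(embeddings, instance_ids, margin=0.5):
    """embeddings: (D, H, W) per-pixel features; instance_ids: (H, W) labels.
    Pulls pixels toward their instance mean and pushes instance means apart."""
    d = embeddings.size(0)
    flat = embeddings.view(d, -1).t()               # (H*W, D)
    ids = instance_ids.view(-1)
    means, pull = [], 0.0
    for i in ids.unique():
        feats = flat[ids == i]                      # pixels of one instance
        mean = feats.mean(dim=0)
        means.append(mean)
        pull = pull + (feats - mean).norm(dim=1).mean()
    means = torch.stack(means)                      # (K, D)
    k = means.size(0)
    push = torch.zeros(())
    if k > 1:
        dists = torch.cdist(means, means)           # pairwise mean distances
        off_diag = dists[~torch.eye(k, dtype=torch.bool)]
        push = torch.clamp(margin - off_diag, min=0).mean()
    return pull / k + push

loss = pixel_embedding_loss(torch.randn(16, 8, 8), torch.randint(0, 3, (8, 8)))
```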
arXiv Detail & Related papers (2020-07-30T00:23:07Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
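A minimal sketch of cross-modal recalibration follows, assuming a simple squeeze-and-excitation-style gate computed from one modality and applied to the other; the paper's Separation-and-Aggregation Gate is considerably more elaborate.

```python
import torch
import torch.nn as nn

class CrossModalRecalibration(nn.Module):
    """Each modality produces a channel gate that recalibrates the other
    modality's feature responses (squeeze-and-excitation-style)."""
    def __init__(self, channels):
        super().__init__()
        def gate():
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels, kernel_size=1),
                nn.Sigmoid(),
            )
        self.gate_from_depth = gate()
        self.gate_from_rgb = gate()

    def forward(self, rgb_feat, depth_feat):
        rgb_out = rgb_feat * self.gate_from_depth(depth_feat)
        depth_out = depth_feat * self.gate_from_rgb(rgb_feat)
        return rgb_out, depth_out
```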
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- EPOS: Estimating 6D Pose of Objects with Symmetries [57.448933686429825]
We present a new method for estimating the 6D pose of rigid objects with available 3D models from a single RGB input.
An object is represented by compact surface fragments, which allow handling symmetries in a systematic manner.
Correspondences between densely sampled pixels and the fragments are predicted using an encoder-decoder network.
arXiv Detail & Related papers (2020-04-01T17:41:08Z)