Project-and-Fuse: Improving RGB-D Semantic Segmentation via Graph Convolution Networks
- URL: http://arxiv.org/abs/2501.18851v1
- Date: Fri, 31 Jan 2025 02:24:13 GMT
- Title: Project-and-Fuse: Improving RGB-D Semantic Segmentation via Graph Convolution Networks
- Authors: Xiaoyan Jiang, Bohan Wang, Xinlong Wan, Zhi Zhou, Hamido Fujita,
- Abstract summary: We propose to fuse features from the two modalities in a late fusion style, during which the geometric feature injection is guided by a texture-feature prior.
At the 3D feature extraction stage, we argue that traditional CNNs are not efficient enough for depth maps.
At the projection matrix generation stage, we find the existence of Biased-Assignment and Ambiguous-Locality issues in the original pipeline.
- Score: 21.713293775719414
- License:
- Abstract: Most existing RGB-D semantic segmentation methods focus on feature-level fusion, including complex cross-modality and cross-scale fusion modules. However, these methods may cause misalignment problems in the feature fusion process and counter-intuitive patches in the segmentation results. Inspired by the popular pixel-node-pixel pipeline, we propose to 1) fuse features from the two modalities in a late fusion style, during which the geometric feature injection is guided by a texture-feature prior; 2) employ Graph Neural Networks (GNNs) on the fused feature to alleviate the emergence of irregular patches by inferring patch relationships. At the 3D feature extraction stage, we argue that traditional CNNs are not efficient enough for depth maps. So, we encode the depth map into a normal map, after which CNNs can easily extract object surface tendencies. At the projection matrix generation stage, we find the existence of Biased-Assignment and Ambiguous-Locality issues in the original pipeline. Therefore, we propose to 1) adopt the Kullback-Leibler loss to ensure that no important pixel features are missed, which can be viewed as a hard pixel mining process; 2) connect regions that are close to each other in the Euclidean space as well as in the semantic space with larger edge weights, so that location information is taken into account. Extensive experiments on two public datasets, NYU-DepthV2 and SUN RGB-D, have shown that our approach can consistently boost the performance of the RGB-D semantic segmentation task.
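Two of the ingredients named in the abstract are easy to illustrate in isolation: encoding the depth map as a normal map before CNN feature extraction, and weighting graph edges so that patches close in both Euclidean and semantic space are connected more strongly. The following is a minimal NumPy sketch of both ideas; the focal lengths `fx`/`fy`, the Gaussian form of the edge weight, and the `sigma_*` bandwidths are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def depth_to_normals(depth, fx=500.0, fy=500.0):
    """Encode a metric depth map (H, W) as a surface-normal map (H, W, 3).

    fx/fy are hypothetical focal lengths; substitute the dataset intrinsics.
    """
    # Depth gradients along image rows (v) and columns (u).
    dz_dv, dz_du = np.gradient(depth.astype(np.float64))
    # For a pinhole camera, one pixel step spans roughly z/fx (resp. z/fy) meters.
    du = np.maximum(depth / fx, 1e-6)
    dv = np.maximum(depth / fy, 1e-6)
    # Un-normalized normal direction (-dz/dx, -dz/dy, 1).
    normals = np.dstack((-dz_du / du, -dz_dv / dv,
                         np.ones_like(depth, dtype=np.float64)))
    # Unit-normalize so the result is a bounded 3-channel "image" for the CNN.
    normals /= np.linalg.norm(normals, axis=2, keepdims=True) + 1e-8
    return normals

def edge_weight(pos_i, pos_j, feat_i, feat_j, sigma_pos=1.0, sigma_feat=1.0):
    """Weight a graph edge so that patches close in both Euclidean space (pos)
    and semantic space (feat) get larger weights; Gaussian form is illustrative."""
    d_pos = np.linalg.norm(np.asarray(pos_i, dtype=float) - np.asarray(pos_j, dtype=float))
    d_sem = np.linalg.norm(np.asarray(feat_i, dtype=float) - np.asarray(feat_j, dtype=float))
    return np.exp(-d_pos**2 / (2 * sigma_pos**2) - d_sem**2 / (2 * sigma_feat**2))
```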
Related papers
- Differentiable Registration of Images and LiDAR Point Clouds with VoxelPoint-to-Pixel Matching [58.10418136917358]
Cross-modality registration between 2D images from cameras and 3D point clouds from LiDARs is a crucial task in computer vision and robotics.
Previous methods estimate 2D-3D correspondences by matching point and pixel patterns learned by neural networks.
We learn a structured cross-modality matching solver to represent 3D features via a different latent pixel space.
arXiv Detail & Related papers (2023-12-07T05:46:10Z)
- LiDAR-Camera Panoptic Segmentation via Geometry-Consistent and Semantic-Aware Alignment [63.83894701779067]
We propose LCPS, the first LiDAR-Camera Panoptic Segmentation network.
In our approach, we conduct LiDAR-Camera fusion in three stages.
Our fusion strategy improves PQ performance by about 6.9% over the LiDAR-only baseline on the NuScenes dataset.
arXiv Detail & Related papers (2023-08-03T10:57:58Z) - Pyramid Deep Fusion Network for Two-Hand Reconstruction from RGB-D Images [11.100398985633754]
We propose an end-to-end framework for recovering dense meshes for both hands.
Our framework employs ResNet50 and PointNet++ to derive features from RGB images and point clouds.
We also introduce a novel pyramid deep fusion network (PDFNet) to aggregate features at different scales.
arXiv Detail & Related papers (2023-07-12T09:33:21Z)
- HODINet: High-Order Discrepant Interaction Network for RGB-D Salient Object Detection [4.007827908611563]
RGB-D salient object detection (SOD) aims to detect the prominent regions by jointly modeling RGB and depth information.
Most RGB-D SOD methods apply the same type of backbones and fusion modules to identically learn the multimodality and multistage features.
In this paper, we propose a high-order discrepant interaction network (HODINet) for RGB-D SOD.
arXiv Detail & Related papers (2023-07-03T11:56:21Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
- GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion.
In this work, we leverage convolutional neural networks as well as graph neural networks in a complementary way for geometric representation learning.
Our method achieves state-of-the-art performance, especially when only a few propagation steps are used.
arXiv Detail & Related papers (2022-10-19T17:56:03Z)
- Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder to not only effectively recalibrate RGB feature responses, but also to distill accurate depth information via multiple stages and aggregate the two recalibrated representations alternatively.
arXiv Detail & Related papers (2020-07-17T18:35:24Z)
- Saliency Enhancement using Gradient Domain Edges Merging [65.90255950853674]
We develop a method that merges the edges with the saliency maps to improve saliency performance.
This leads to our proposed saliency enhancement using edges (SEE), with an average improvement of at least 3.4 times on the DUT-OMRON dataset.
The SEE algorithm is split into two parts: SEE-Pre for preprocessing and SEE-Post for postprocessing.
arXiv Detail & Related papers (2020-02-11T14:04:56Z)
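The SEE entry above describes merging image edges into saliency maps, but the actual SEE-Pre/SEE-Post procedure is not detailed here. The following is only a generic, hypothetical NumPy sketch of blending gradient-magnitude edges into a saliency map; the blend weight `alpha` and the finite-difference gradient are illustrative choices, not the SEE algorithm.

```python
import numpy as np

def enhance_saliency_with_edges(saliency, image_gray, alpha=0.3):
    """Blend gradient-magnitude edges of the image into a saliency map.

    `alpha` and the finite-difference gradient are illustrative choices,
    not the actual SEE-Pre/SEE-Post procedure.
    """
    # Gradient magnitude of the grayscale image (simple finite differences).
    gy, gx = np.gradient(image_gray.astype(np.float64))
    edges = np.hypot(gx, gy)
    edges /= edges.max() + 1e-8                # normalize edge strength to [0, 1]
    sal = saliency.astype(np.float64)
    sal /= sal.max() + 1e-8                    # normalize saliency to [0, 1]
    # Weighted merge, then renormalize so the result stays in [0, 1].
    merged = (1.0 - alpha) * sal + alpha * edges
    return merged / (merged.max() + 1e-8)
```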