2D-3D Geometric Fusion Network using Multi-Neighbourhood Graph
Convolution for RGB-D Indoor Scene Classification
- URL: http://arxiv.org/abs/2009.11154v3
- Date: Thu, 27 May 2021 10:06:33 GMT
- Title: 2D-3D Geometric Fusion Network using Multi-Neighbourhood Graph
Convolution for RGB-D Indoor Scene Classification
- Authors: Albert Mosella-Montoro, Javier Ruiz-Hidalgo
- Abstract summary: This paper presents a 2D-3D Fusion stage that combines 3D Geometric Features with 2D Texture Features.
Experimental results on the NYU-Depth-V2 and SUN RGB-D datasets show that the proposed method outperforms the current state-of-the-art in the RGB-D indoor scene classification task.
- Score: 0.8629912408966145
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multi-modal fusion has been proven to enhance the performance of scene
classification tasks. This paper presents a 2D-3D Fusion stage that combines 3D
Geometric Features with 2D Texture Features obtained by 2D Convolutional Neural
Networks. To obtain a robust 3D Geometric embedding, a network that uses two novel
layers is proposed. The first layer, Multi-Neighbourhood Graph Convolution,
aims to learn a more robust geometric descriptor of the scene by combining two
different neighbourhoods: one in the Euclidean space and the other in the
Feature space. The second proposed layer, Nearest Voxel Pooling, improves the
performance of the well-known Voxel Pooling. Experimental results on the
NYU-Depth-V2 and SUN RGB-D datasets show that the proposed method outperforms
the current state-of-the-art in the RGB-D indoor scene classification task.
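The abstract names the two novel layers without spelling out their mechanics. As a rough illustration, here is a minimal PyTorch sketch of the multi-neighbourhood idea: features are aggregated over one k-NN graph built on point coordinates and a second k-NN graph built on the features themselves, and the two aggregates are combined. Every name and design choice below (`knn_indices`, `multi_neighbourhood_conv`, mean aggregation, ReLU fusion) is an assumption for illustration, not the paper's exact layer.

```python
import torch


def knn_indices(query: torch.Tensor, k: int) -> torch.Tensor:
    """Indices of the k nearest neighbours of each row of `query` (N, D),
    measured in whatever space the D columns span (xyz or features)."""
    dist = torch.cdist(query, query)            # (N, N) pairwise distances
    return dist.topk(k, largest=False).indices  # (N, k); includes the point itself


def multi_neighbourhood_conv(xyz, feats, w_euc, w_feat, k=16):
    """Hypothetical multi-neighbourhood aggregation.
    xyz: (N, 3) positions  -> Euclidean neighbourhood;
    feats: (N, C) features -> feature-space neighbourhood;
    w_euc, w_feat: (C, C_out) linear maps, one per neighbourhood."""
    agg_euc = feats[knn_indices(xyz, k)].mean(dim=1)     # spatially close points
    agg_feat = feats[knn_indices(feats, k)].mean(dim=1)  # similar-looking points
    return torch.relu(agg_euc @ w_euc + agg_feat @ w_feat)
```

The abstract likewise gives no detail on Nearest Voxel Pooling itself; for reference, the well-known Voxel Pooling it improves on is roughly the following (all points falling into the same voxel are averaged):

```python
def voxel_pooling(xyz, feats, voxel_size=0.1):
    """Baseline voxel pooling: average the positions and features of all
    points that land in the same voxel of side `voxel_size`."""
    grid = torch.floor(xyz / voxel_size).long()                  # (N, 3) voxel ids
    voxels, inv = torch.unique(grid, dim=0, return_inverse=True)
    count = torch.bincount(inv, minlength=len(voxels)).clamp(min=1)
    centroids = torch.zeros(len(voxels), 3).index_add_(0, inv, xyz)
    pooled = torch.zeros(len(voxels), feats.shape[1]).index_add_(0, inv, feats)
    return centroids / count[:, None], pooled / count[:, None]
```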
Related papers
- NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized
Device Coordinates Space [77.6067460464962]
Monocular 3D Semantic Scene Completion (SSC) has garnered significant attention in recent years due to its potential to predict complex semantics and geometric shapes from a single image, requiring no 3D inputs.
We identify several critical issues in current state-of-the-art methods, including the Feature Ambiguity of 2D features projected along camera rays into 3D space, the Pose Ambiguity of the 3D convolution, and the Imbalance of the 3D convolution across different depth levels.
We devise a novel Normalized Device Coordinates scene completion network (NDC-Scene) that directly extends the 2D feature map to the normalized device coordinates space.
arXiv Detail & Related papers (2023-09-26T02:09:52Z)
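As a rough illustration of the NDC-Scene idea above: in normalized device coordinates every voxel column lines up with one image pixel, so extending a 2D feature map into a 3D volume can be as simple as broadcasting along a new depth axis, after which 3D convolutions operate in a frustum-aligned space. This is only the geometric intuition, not the authors' network; `lift_to_ndc` and `depth_bins` are made-up names.

```python
import torch

def lift_to_ndc(feat2d: torch.Tensor, depth_bins: int) -> torch.Tensor:
    """Lift a 2D feature map (B, C, H, W) to an NDC volume (B, C, D, H, W).
    Each voxel column aligns with a pixel, so the lift is a broadcast."""
    B, C, H, W = feat2d.shape
    return feat2d.unsqueeze(2).expand(B, C, depth_bins, H, W).contiguous()
```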
- GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion.
In this work, we leverage convolutional neural networks as well as graph neural networks in a complementary way for geometric representation learning.
Our method achieves state-of-the-art performance, especially when only a few propagation steps are used.
arXiv Detail & Related papers (2022-10-19T17:56:03Z)
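To make the propagation idea in GraphCSPN concrete, here is a hedged sketch of a generic spatial-propagation step for depth completion: every depth value is repeatedly replaced by an affinity-weighted blend of its neighbours. GraphCSPN constructs its neighbourhoods dynamically from 3D geometry; the fixed 3x3 window below is a simplification, and all names are illustrative.

```python
import torch
import torch.nn.functional as F

def propagate_depth(depth, affinity, iterations=3):
    """depth: (B, 1, H, W) initial estimate; affinity: (B, 9, H, W) learned
    per-pixel weights over the 3x3 neighbourhood, softmax-normalized here."""
    w = torch.softmax(affinity, dim=1)                       # weights sum to 1
    for _ in range(iterations):
        patches = F.unfold(depth, kernel_size=3, padding=1)  # (B, 9, H*W)
        patches = patches.view(depth.shape[0], 9, *depth.shape[2:])
        depth = (w * patches).sum(dim=1, keepdim=True)       # weighted blend
    return depth
```

Few iterations suffice when the affinities are good, which is consistent with the summary's claim about using only a few propagation steps.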
- 3D Dense Face Alignment with Fused Features by Aggregating CNNs and GCNs [28.7443367565456]
This is achieved by seamlessly combining standard convolutional neural networks (CNNs) with Graph Convolution Networks (GCNs).
By iteratively fusing the features across different layers and stages of the CNNs and GCNs, our approach can provide dense face alignment and 3D face reconstruction simultaneously.
Experiments on several challenging datasets demonstrate that our method outperforms state-of-the-art approaches on both 2D and 3D face alignment tasks.
arXiv Detail & Related papers (2022-03-09T11:07:10Z)
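One plausible reading of the CNN-GCN combination, sketched below: CNN features are bilinearly sampled at the current vertex projections, a graph convolution mixes them over the face mesh, and the result drives a vertex update. The sampling-and-update loop and all names are assumptions; the paper's fusion across layers and stages is more involved.

```python
import torch
import torch.nn.functional as F

def sample_at_vertices(feat, uv):
    """Bilinearly sample a CNN feature map (B, C, H, W) at vertex
    projections uv in [-1, 1]^2 of shape (B, V, 2) -> (B, V, C)."""
    out = F.grid_sample(feat, uv.unsqueeze(2), align_corners=False)
    return out.squeeze(-1).transpose(1, 2)

def gcn_step(x, adj, weight):
    """Plain graph convolution: neighbourhood average, then a linear map.
    x: (B, V, C); adj: (V, V) row-normalized adjacency; weight: (C, C_out)."""
    return torch.relu(adj @ x @ weight)

# Hypothetical refinement: image evidence moves the mesh vertices.
# uv = uv + gcn_step(sample_at_vertices(feat, uv), adj, w_out)  # w_out: (C, 2)
```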
- Laplacian2Mesh: Laplacian-Based Mesh Understanding [4.808061174740482]
We introduce a novel and flexible convolutional neural network (CNN) model, called Laplacian2Mesh, for 3D triangle meshes.
Mesh pooling is applied to expand the receptive field of the network by the multi-space transformation of the Laplacian.
Experiments on various learning tasks applied to 3D meshes demonstrate the effectiveness and efficiency of Laplacian2Mesh.
arXiv Detail & Related papers (2022-02-01T10:10:13Z)
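The Laplacian machinery behind Laplacian2Mesh can be sketched with the textbook operation it builds on: assemble a graph Laplacian from the mesh connectivity and project per-vertex features onto its lowest-frequency eigenvectors, a spectral form of pooling that smooths and coarsens the signal. The paper's multi-space transformation is richer; the names and the uniform Laplacian are assumptions.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def spectral_pool(vert_feats, edges, k=64):
    """vert_feats: (V, C) per-vertex features; edges: (E, 2) unique vertex
    pairs of the triangle mesh. Returns (k, C) spectral coefficients."""
    V = vert_feats.shape[0]
    i, j = edges[:, 0], edges[:, 1]
    ones = np.ones(2 * len(edges))
    adj = sp.coo_matrix((ones, (np.r_[i, j], np.r_[j, i])), shape=(V, V))
    lap = sp.diags(np.asarray(adj.sum(axis=1)).ravel()) - adj    # L = D - A
    # k smallest eigenpairs = smoothest basis functions on the mesh;
    # shift-invert about a tiny negative sigma keeps L - sigma*I invertible.
    _, basis = eigsh(lap.tocsc(), k=k, sigma=-1e-8, which="LM")
    return basis.T @ vert_feats                                  # (k, C)
```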
- Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to utilize 3D voxelization and 3D convolution networks.
We propose a new framework for outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
arXiv Detail & Related papers (2021-09-12T06:25:11Z)
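The cylindrical partition is concrete enough to illustrate: each LiDAR point is binned by radius, azimuth, and height rather than on a Cartesian grid, so cell volume grows with distance and follows the sparsity pattern of outdoor scans. The grid resolution and ranges below are illustrative defaults, not the paper's settings.

```python
import torch

def cylindrical_voxelize(xyz, grid=(480, 360, 32),
                         rho_max=50.0, z_min=-4.0, z_max=2.0):
    """Map points (N, 3) to integer cylindrical voxel coordinates (N, 3)."""
    x, y, z = xyz.unbind(dim=1)
    rho = torch.sqrt(x ** 2 + y ** 2).clamp(max=rho_max - 1e-6)  # radius
    phi = torch.atan2(y, x)                                      # azimuth [-pi, pi]
    r_idx = (rho / rho_max * grid[0]).long()
    p_idx = ((phi + torch.pi) / (2 * torch.pi) * grid[1]).long().clamp(max=grid[1] - 1)
    z_idx = ((z.clamp(z_min, z_max - 1e-6) - z_min) / (z_max - z_min) * grid[2]).long()
    return torch.stack([r_idx, p_idx, z_idx], dim=1)
```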
- Multi-Modality Task Cascade for 3D Object Detection [22.131228757850373]
Many methods train two models in isolation and use simple feature concatenation to represent 3D sensor data.
We propose a novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions.
We show that including a 2D network between two stages of 3D modules significantly improves both 2D and 3D task performance.
arXiv Detail & Related papers (2021-07-08T17:55:01Z)
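The cascade reads naturally as a 3D -> 2D -> 3D pipeline; the stub below shows only that data flow, with every sub-module injected as a placeholder. Module names and interfaces are hypothetical, not the MTC-RCNN API.

```python
import torch.nn as nn

class MultiModalityCascade(nn.Module):
    """First 3D stage proposes boxes, a 2D network segments the image
    guided by the projected proposals, and a second 3D stage refines
    the boxes using the improved 2D masks."""
    def __init__(self, stage1_3d, seg_2d, stage2_3d, project):
        super().__init__()
        self.stage1_3d, self.seg_2d, self.stage2_3d = stage1_3d, seg_2d, stage2_3d
        self.project = project    # renders 3D proposals into image space

    def forward(self, points, image):
        boxes = self.stage1_3d(points)                    # coarse 3D proposals
        masks = self.seg_2d(image, self.project(boxes))   # proposal-guided 2D seg
        return self.stage2_3d(points, boxes, masks)       # mask-refined 3D boxes
```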
- 3D-to-2D Distillation for Indoor Scene Parsing [78.36781565047656]
We present a new approach that enables us to leverage 3D features extracted from a large-scale 3D data repository to enhance 2D features extracted from RGB images.
First, we distill 3D knowledge from a pretrained 3D network to supervise a 2D network to learn simulated 3D features from 2D features during training.
Second, we design a two-stage dimension normalization scheme to calibrate the 2D and 3D features for better integration.
Third, we design a semantic-aware adversarial training model to extend our framework for training with unpaired 3D data.
arXiv Detail & Related papers (2021-04-06T02:22:24Z)
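The first step (distilling 3D knowledge into the 2D network) admits a compact sketch: a small head on the 2D features predicts simulated 3D features, which are regressed against frozen features from the pretrained 3D network at the pixels where its points project. The head, the index layout, and the L2 objective are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def distill_3d_to_2d(feat2d, teacher_feats, proj_idx, head):
    """feat2d: (B, C2, H, W) 2D backbone features; teacher_feats: (M, C3)
    pretrained-3D features of M points; proj_idx: (M, 3) as (batch, y, x)
    pixel of each point's projection; head: hypothetical module mapping
    C2 -> C3 channels (e.g. a 1x1 convolution)."""
    sim3d = head(feat2d)                                # simulated 3D features
    b, y, x = proj_idx.unbind(dim=1)
    student = sim3d[b, :, y, x]                         # (M, C3) at projections
    return F.mse_loss(student, teacher_feats.detach())  # teacher stays frozen
```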
- Learning Joint 2D-3D Representations for Depth Completion [90.62843376586216]
We design a simple yet effective neural network block that learns to extract joint 2D and 3D features.
Specifically, the block consists of two domain-specific sub-networks that apply 2D convolution on image pixels and continuous convolution on 3D points.
arXiv Detail & Related papers (2020-12-22T22:58:29Z)
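The two-branch block is concrete enough to sketch: a 2D convolution processes pixel features, a continuous-convolution-style branch processes each 3D point through an MLP over offsets to its k nearest neighbours, and the branches are summed per point after sampling the 2D output at the point projections. Kernel sizes, mean aggregation, and the batch-of-one layout are assumptions.

```python
import torch
import torch.nn as nn

class Joint2D3DBlock(nn.Module):
    def __init__(self, c_in, c_out, k=9):
        super().__init__()
        self.k = k
        self.conv2d = nn.Conv2d(c_in, c_out, 3, padding=1)           # 2D branch
        self.offset_mlp = nn.Sequential(nn.Linear(3 + c_in, c_out),  # 3D branch
                                        nn.ReLU())

    def forward(self, feat2d, xyz, feat3d, pix):
        """feat2d: (1, c_in, H, W); xyz: (N, 3); feat3d: (N, c_in);
        pix: (N, 2) integer (y, x) projection of each point."""
        f2map = self.conv2d(feat2d)[0]                     # (c_out, H, W)
        f2 = f2map[:, pix[:, 0], pix[:, 1]].T              # (N, c_out) at pixels
        idx = torch.cdist(xyz, xyz).topk(self.k, largest=False).indices
        rel = xyz[idx] - xyz[:, None, :]                   # (N, k, 3) offsets
        msg = self.offset_mlp(torch.cat([rel, feat3d[idx]], dim=-1))
        return f2 + msg.mean(dim=1)                        # fuse the two branches
```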
- Cross-Modality 3D Object Detection [63.29935886648709]
We present a novel two-stage multi-modal fusion network for 3D object detection.
The whole architecture facilitates two-stage fusion.
Our experiments on the KITTI dataset show that the proposed multi-stage fusion helps the network to learn better representations.
arXiv Detail & Related papers (2020-08-16T11:01:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.