MVTN: Multi-View Transformation Network for 3D Shape Recognition
- URL: http://arxiv.org/abs/2011.13244v3
- Date: Tue, 17 Aug 2021 15:10:28 GMT
- Title: MVTN: Multi-View Transformation Network for 3D Shape Recognition
- Authors: Abdullah Hamdi, Silvio Giancola, Bernard Ghanem
- Abstract summary: We introduce the Multi-View Transformation Network (MVTN) that regresses optimal view-points for 3D shape recognition.
MVTN can be trained end-to-end along with any multi-view network for 3D shape classification.
MVTN exhibits clear performance gains in the tasks of 3D shape classification and 3D shape retrieval without the need for extra training supervision.
- Score: 80.34385402179852
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-view projection methods have demonstrated their ability to reach
state-of-the-art performance on 3D shape recognition. Those methods learn
different ways to aggregate information from multiple views. However, the
camera view-points for those views tend to be heuristically set and fixed for
all shapes. To circumvent the lack of dynamism of current multi-view methods,
we propose to learn those view-points. In particular, we introduce the
Multi-View Transformation Network (MVTN) that regresses optimal view-points for
3D shape recognition, building upon advances in differentiable rendering. As a
result, MVTN can be trained end-to-end along with any multi-view network for 3D
shape classification. We integrate MVTN in a novel adaptive multi-view pipeline
that can render either 3D meshes or point clouds. MVTN exhibits clear
performance gains in the tasks of 3D shape classification and 3D shape
retrieval without the need for extra training supervision. In these tasks, MVTN
achieves state-of-the-art performance on ModelNet40, ShapeNet Core55, and the
most recent and realistic ScanObjectNN dataset (up to 6% improvement).
Interestingly, we also show that MVTN can provide network robustness against
rotation and occlusion in the 3D domain. The code is available at
https://github.com/ajhamdi/MVTN .
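To make the pipeline concrete, the following is a minimal PyTorch-style sketch of the adaptive multi-view setup described in the abstract: a view-point regressor predicts per-shape camera angles, a differentiable renderer turns them into images, and a multi-view network classifies those images, so the classification loss can back-propagate all the way to the predicted view-points. All names (`ViewRegressor`, `MVTNPipeline`, the plug-in `renderer`) are illustrative assumptions, not the official API of the linked repository.

```python
# Minimal sketch of the adaptive multi-view pipeline described in the abstract.
# Names are illustrative, not the repository's actual API; see
# https://github.com/ajhamdi/MVTN for the official implementation.
import torch
import torch.nn as nn

class ViewRegressor(nn.Module):
    """Regresses per-shape camera view-points (azimuth, elevation) from a
    coarse global descriptor of the 3D shape (e.g., pooled point features)."""
    def __init__(self, feat_dim=256, n_views=8, max_angle=180.0):
        super().__init__()
        self.n_views = n_views
        self.max_angle = max_angle
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, n_views * 2),  # (azimuth, elevation) per view
        )

    def forward(self, shape_feat):
        # Bounded offsets keep the predicted views in a valid angular range.
        angles = torch.tanh(self.mlp(shape_feat)) * self.max_angle
        return angles.view(-1, self.n_views, 2)

class MVTNPipeline(nn.Module):
    """End-to-end wiring: shape encoder -> view regressor -> differentiable
    renderer -> multi-view classifier."""
    def __init__(self, shape_encoder, renderer, mv_classifier, n_views=8):
        super().__init__()
        self.shape_encoder = shape_encoder  # e.g., a small point-cloud encoder
        self.renderer = renderer            # differentiable mesh/point renderer
        self.view_regressor = ViewRegressor(n_views=n_views)
        self.mv_classifier = mv_classifier  # e.g., an MVCNN-style network

    def forward(self, points):
        feat = self.shape_encoder(points)      # (B, feat_dim)
        views = self.view_regressor(feat)      # (B, n_views, 2)
        images = self.renderer(points, views)  # (B, n_views, C, H, W)
        return self.mv_classifier(images)      # (B, n_classes)
```

Because the renderer is differentiable with respect to the camera parameters, the classification loss alone supervises the view-point regressor, which matches the abstract's claim that no extra training supervision is needed.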
Related papers
- EmbodiedSAM: Online Segment Any 3D Thing in Real Time [61.2321497708998]
Embodied tasks require the agent to fully understand the 3D scene while simultaneously exploring it.
An online, real-time, fine-grained, and highly generalizable 3D perception model is urgently needed.
arXiv Detail & Related papers (2024-08-21T17:57:06Z)
- MVTN: Learning Multi-View Transformations for 3D Understanding [60.15214023270087]
We introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal view-points for 3D shape recognition.
MVTN can be trained end-to-end with any multi-view network for 3D shape recognition.
Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks.
arXiv Detail & Related papers (2022-12-27T12:09:16Z)
- Viewer-Centred Surface Completion for Unsupervised Domain Adaptation in 3D Object Detection [7.489722641968593]
3D detectors tend to overfit datasets they are trained on. This causes a drastic decrease in accuracy when the detectors are trained on one dataset and tested on another.
We address this in our approach, SEE-VCN, by designing a novel viewer-centred surface completion network (VCN).
With SEE-VCN, we obtain a unified representation of objects across datasets, allowing the network to focus on learning geometry, rather than overfitting on scan patterns.
arXiv Detail & Related papers (2022-09-14T04:22:20Z)
- Multi-View Transformer for 3D Visual Grounding [64.30493173825234]
We propose a Multi-View Transformer (MVT) for 3D visual grounding.
We project the 3D scene into a multi-view space, in which the position information of the scene under different views is modeled simultaneously and aggregated.
arXiv Detail & Related papers (2022-04-05T12:59:43Z)
- Voint Cloud: Multi-View Point Cloud Representation for 3D Understanding [80.04281842702294]
We introduce the concept of the multi-view point cloud (Voint cloud), which represents each 3D point as a set of features extracted from several view-points.
This novel 3D Voint cloud representation combines the compactness of 3D point cloud representation with the natural view-awareness of multi-view representation.
We deploy a Voint neural network (VointNet) with a theoretically established functional form to learn representations in the Voint space (a minimal sketch of the Voint data layout follows this list).
arXiv Detail & Related papers (2021-11-30T13:08:19Z)
- Multi-view 3D Reconstruction with Transformer [34.756336770583154]
We reformulate the multi-view 3D reconstruction as a sequence-to-sequence prediction problem.
We propose a new framework named 3D Volume Transformer (VolT) for such a task.
Our method achieves a new state-of-the-art accuracy in multi-view reconstruction with fewer parameters.
arXiv Detail & Related papers (2021-03-24T03:14:49Z)
- Self-Supervised Multi-View Learning via Auto-Encoding 3D Transformations [61.870882736758624]
We propose a novel self-supervised paradigm to learn Multi-View Transformation Equivariant Representations (MV-TER).
Specifically, we perform a 3D transformation on a 3D object and obtain multiple views before and after the transformation via projection.
Then, we self-train a model to capture the intrinsic 3D object representation by decoding the 3D transformation parameters from the fused features of the multiple views before and after the transformation.
arXiv Detail & Related papers (2021-03-01T06:24:17Z)
- Virtual Multi-view Fusion for 3D Semantic Segmentation [11.259694096475766]
We show that our virtual views enable more effective training of 2D semantic segmentation networks than previous multi-view approaches.
When the 2D per-pixel predictions are aggregated on 3D surfaces, our virtual multi-view fusion method achieves significantly better 3D semantic segmentation results.
arXiv Detail & Related papers (2020-07-26T14:46:55Z)
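As referenced in the Voint cloud entry above, that representation attaches one feature vector per view to every 3D point. A minimal sketch of that data layout, under assumed tensor names and shapes (not taken from the paper's code), is:

```python
import torch

# Illustrative "Voint cloud" layout: each of N points carries one feature
# vector per view, i.e., a tensor of shape (N_points, N_views, C).
# All names, shapes, and the pooling choice are assumptions for illustration.
n_points, n_views, feat_dim = 1024, 6, 64
xyz = torch.rand(n_points, 3)                           # 3D point coordinates
voint_feats = torch.rand(n_points, n_views, feat_dim)   # per-point, per-view features

# Pooling over the view axis collapses the Voint cloud back to an ordinary
# per-point feature map, which any point-cloud network can then consume.
point_feats = voint_feats.max(dim=1).values             # (N_points, C)
```

The view axis is what provides the view-awareness of multi-view methods, while the per-point indexing keeps the representation as compact as a standard point cloud, matching the description in the entry above.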
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.