Learning Canonical View Representation for 3D Shape Recognition with Arbitrary Views
- URL: http://arxiv.org/abs/2108.07084v2
- Date: Tue, 17 Aug 2021 03:08:21 GMT
- Title: Learning Canonical View Representation for 3D Shape Recognition with Arbitrary Views
- Authors: Xin Wei, Yifei Gong, Fudong Wang, Xing Sun, Jian Sun
- Abstract summary: We focus on recognizing 3D shapes from arbitrary views, i.e., arbitrary numbers and positions of viewpoints.
It is a challenging and realistic setting for view-based 3D shape recognition.
We propose a canonical view representation to tackle this challenge.
- Score: 20.42021463570052
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this paper, we focus on recognizing 3D shapes from arbitrary views, i.e.,
arbitrary numbers and positions of viewpoints. It is a challenging and
realistic setting for view-based 3D shape recognition. We propose a canonical
view representation to tackle this challenge. We first transform the original
features of arbitrary views to a fixed number of view features, dubbed
canonical view representation, by aligning the arbitrary view features to a set
of learnable reference view features using optimal transport. In this way, each
3D shape with arbitrary views is represented by a fixed number of canonical
view features, which are further aggregated to generate a rich and robust 3D
shape representation for shape recognition. We also propose a canonical view
feature separation constraint to enforce that the view features in canonical
view representation can be embedded into scattered points in a Euclidean space.
Experiments on the ModelNet40, ScanObjectNN, and RGBD datasets show that our
method achieves competitive results under the fixed viewpoint settings, and
significantly outperforms the applicable methods under the arbitrary view
setting.
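The abstract's alignment step can be illustrated concretely. Below is a minimal sketch, not the authors' released code, of aligning a variable number of view features to a fixed set of learnable reference features with entropy-regularized optimal transport (Sinkhorn iterations; the abstract does not specify the OT solver, so this choice is an assumption), plus a stand-in for the separation constraint. All names, sizes, and hyperparameters are illustrative.

```python
# Illustrative sketch of canonical view alignment; every name, size, and
# hyperparameter here is an assumption, not the paper's implementation.
import math

import torch
import torch.nn as nn


def sinkhorn(cost, n_iters=20, eps=0.05):
    """Log-domain Sinkhorn between uniform marginals.

    cost: (V, K) distances between V input views and K references.
    Returns a (V, K) transport plan with rows summing to ~1/V and
    columns summing to ~1/K.
    """
    V, K = cost.shape
    log_mu = torch.full((V,), -math.log(V))
    log_nu = torch.full((K,), -math.log(K))
    log_kernel = -cost / eps
    u = torch.zeros(V)
    v = torch.zeros(K)
    for _ in range(n_iters):  # alternately rescale rows and columns
        u = log_mu - torch.logsumexp(log_kernel + v[None, :], dim=1)
        v = log_nu - torch.logsumexp(log_kernel + u[:, None], dim=0)
    return torch.exp(log_kernel + u[:, None] + v[None, :])


class CanonicalViewAlign(nn.Module):
    """Maps V view features (V varies per shape) to K canonical features."""

    def __init__(self, dim=512, num_refs=8):
        super().__init__()
        self.refs = nn.Parameter(torch.randn(num_refs, dim))  # learnable references

    def forward(self, view_feats):  # view_feats: (V, dim), V arbitrary
        cost = torch.cdist(view_feats, self.refs)  # (V, K) Euclidean costs
        plan = sinkhorn(cost)
        # Barycentric projection: each canonical slot becomes a transport-
        # weighted average of the input views (columns renormalized to 1).
        weights = plan / plan.sum(dim=0, keepdim=True).clamp_min(1e-8)
        return weights.t() @ view_feats  # (K, dim) canonical view features


def separation_loss(canon, margin=1.0):
    """Stand-in for the separation constraint: keep canonical features
    scattered by penalizing pairwise distances below a margin."""
    d = torch.cdist(canon, canon)
    d = d + torch.eye(canon.shape[0]) * margin  # mask out self-distances
    return torch.relu(margin - d).mean()
```

However many views a shape provides, the output is always num_refs canonical features, so the downstream aggregation network sees a fixed-size input.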
Related papers
- Beyond Viewpoint: Robust 3D Object Recognition under Arbitrary Views through Joint Multi-Part Representation [22.8031613567025]
The Part-aware Network (PANet) aims to localize and understand different parts of 3D objects, such as airplane wings and tails.
Our results on benchmark datasets clearly demonstrate that our proposed method outperforms existing view-based aggregation baselines for the task of 3D object recognition under arbitrary views.
arXiv Detail & Related papers (2024-07-04T11:16:47Z)
- MVTN: Learning Multi-View Transformations for 3D Understanding [60.15214023270087]
We introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal viewpoints for 3D shape recognition.
MVTN can be trained end-to-end with any multi-view network.
Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks (a minimal sketch of the viewpoint-regression idea appears after this list).
arXiv Detail & Related papers (2022-12-27T12:09:16Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model (a volume-rendering sketch appears after this list).
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- Learning Canonical 3D Object Representation for Fine-Grained Recognition [77.33501114409036]
We propose a novel framework for fine-grained object recognition that learns to recover object variation in 3D space from a single image.
We represent an object as a composition of 3D shape and its appearance, while eliminating the effect of camera viewpoint.
By incorporating 3D shape and appearance jointly in a deep representation, our method learns the discriminative representation of the object.
arXiv Detail & Related papers (2021-08-10T12:19:34Z)
- CoCoNets: Continuous Contrastive 3D Scene Representations [21.906643302668716]
This paper explores self-supervised learning of amodal 3D feature representations from posed RGB and RGB-D images and videos.
We show the resulting 3D visual feature representations effectively scale across objects and scenes, imagine information occluded or missing from the input viewpoints, track objects over time, align semantically related objects in 3D, and improve 3D object detection.
arXiv Detail & Related papers (2021-04-08T15:50:47Z)
- Stable View Synthesis [100.86844680362196]
We present Stable View Synthesis (SVS).
Given a set of source images depicting a scene from freely distributed viewpoints, SVS synthesizes new views of the scene.
SVS outperforms state-of-the-art view synthesis methods both quantitatively and qualitatively on three diverse real-world datasets.
arXiv Detail & Related papers (2020-11-14T07:24:43Z)
- Weakly Supervised Learning of Multi-Object 3D Scene Decompositions Using Deep Shape Priors [69.02332607843569]
PriSMONet is a novel approach for learning Multi-Object 3D scene decomposition and representations from single images.
A recurrent encoder regresses a latent representation of 3D shape, pose and texture of each object from an input RGB image.
We evaluate the accuracy of our model in inferring 3D scene layout, demonstrate its generative capabilities, assess its generalization to real images, and point out benefits of the learned representation.
arXiv Detail & Related papers (2020-10-08T14:49:23Z)
- Object-Centric Multi-View Aggregation [86.94544275235454]
We present an approach for aggregating a sparse set of views of an object in order to compute a semi-implicit 3D representation in the form of a volumetric feature grid.
Key to our approach is an object-centric canonical 3D coordinate system into which views can be lifted, without explicit camera pose estimation.
We show that computing a symmetry-aware mapping from pixels to the canonical coordinate system allows us to better propagate information to unseen regions.
arXiv Detail & Related papers (2020-07-20T17:38:31Z)
- AUTO3D: Novel view synthesis through unsupervisely learned variational viewpoint and global 3D representation [27.163052958878776]
This paper targets learning-based novel view synthesis from a single or a limited number of 2D images without pose supervision.
We construct an end-to-end trainable conditional variational framework to disentangle the unsupervisedly learned relative pose/rotation from the implicit global 3D representation.
Our system achieves implicit 3D understanding without explicit 3D reconstruction.
arXiv Detail & Related papers (2020-07-13T18:51:27Z)
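Two entries in the list above describe mechanisms concrete enough to sketch. First, MVTN: a small regressor predicts per-shape camera viewpoints, and an external differentiable renderer (assumed here, not implemented, e.g. PyTorch3D) turns them into images for a downstream multi-view classifier, so the classification loss can train view selection end-to-end. This is an illustrative sketch under those assumptions, not the released MVTN code; all names and sizes are invented.

```python
# Illustrative viewpoint regressor in the spirit of MVTN; the renderer and
# classifier it would feed are assumed external. Names/sizes are assumptions.
import torch
import torch.nn as nn


class ViewpointRegressor(nn.Module):
    """Predicts M (azimuth, elevation) offsets from a coarse shape feature."""

    def __init__(self, feat_dim=256, num_views=8, max_offset_deg=90.0):
        super().__init__()
        self.num_views = num_views
        self.max_offset = max_offset_deg
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, num_views * 2),
        )
        # Fixed circular starting layout that the learned offsets perturb.
        azim0 = torch.arange(num_views) * (360.0 / num_views)
        self.register_buffer(
            "base", torch.stack([azim0, torch.full((num_views,), 30.0)], dim=-1))

    def forward(self, shape_feat):  # shape_feat: (B, feat_dim)
        offsets = torch.tanh(self.mlp(shape_feat))  # bounded in (-1, 1)
        offsets = offsets.view(-1, self.num_views, 2) * self.max_offset
        return self.base + offsets  # (B, M, 2) azimuth/elevation in degrees


# Because the offsets come from differentiable ops and the renderer is
# differentiable, gradients from the classification loss flow back into
# the viewpoint regressor, training view selection end-to-end.
```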
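Second, the Vision Transformer for NeRF-Based View Synthesis entry trains an MLP conditioned on a learned 3D representation to perform volume rendering. The sketch below shows that rendering step with the standard volume-rendering quadrature; the conditioning scheme, network sizes, and all names are assumptions, not the paper's actual architecture.

```python
# Minimal sketch of MLP-conditioned volume rendering; conditioning scheme
# and all names/sizes are illustrative assumptions.
import torch
import torch.nn as nn


class ConditionedNeRF(nn.Module):
    """Tiny MLP mapping (3D point, latent code) -> (density, RGB)."""

    def __init__(self, latent_dim=256, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # sigma + rgb
        )

    def forward(self, pts, latent):  # pts: (N, S, 3), latent: (latent_dim,)
        N, S, _ = pts.shape
        z = latent.expand(N, S, -1)  # broadcast the code to every sample
        out = self.net(torch.cat([pts, z], dim=-1))
        sigma = torch.relu(out[..., 0])    # nonnegative density
        rgb = torch.sigmoid(out[..., 1:])  # colors in [0, 1]
        return sigma, rgb


def render_rays(model, origins, dirs, latent, near=0.5, far=2.5, samples=32):
    """Standard volume-rendering quadrature along each ray."""
    t = torch.linspace(near, far, samples)  # (S,) sample depths
    pts = origins[:, None, :] + t[None, :, None] * dirs[:, None, :]  # (N, S, 3)
    sigma, rgb = model(pts, latent)
    delta = t[1] - t[0]  # uniform spacing between samples
    alpha = 1.0 - torch.exp(-sigma * delta)  # per-sample opacity
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = torch.cumprod(torch.cat(
        [torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=1), dim=1)[:, :-1]
    weights = alpha * trans  # (N, S)
    return (weights[..., None] * rgb).sum(dim=1)  # (N, 3) rendered colors
```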