3DLatNav: Navigating Generative Latent Spaces for Semantic-Aware 3D
Object Manipulation
- URL: http://arxiv.org/abs/2211.09770v1
- Date: Thu, 17 Nov 2022 18:47:56 GMT
- Title: 3DLatNav: Navigating Generative Latent Spaces for Semantic-Aware 3D
Object Manipulation
- Authors: Amaya Dharmasiri, Dinithi Dissanayake, Mohamed Afham, Isuru
Dissanayake, Ranga Rodrigo, Kanchana Thilakarathna
- Abstract summary: 3D generative models have recently been successful in generating realistic 3D objects in the form of point clouds.
Most models do not offer controllability to manipulate the shape semantics of component object parts without extensive semantic labels or other reference point clouds.
We propose 3DLatNav, a novel approach to navigating pretrained generative latent spaces to enable controlled part-level semantic manipulation of 3D objects.
- Score: 2.8661021832561757
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D generative models have recently been successful in generating realistic 3D
objects in the form of point clouds. However, most models do not offer
controllability to manipulate the shape semantics of component object parts
without extensive semantic attribute labels or other reference point clouds.
Moreover, beyond the ability to perform simple latent vector arithmetic or
interpolations, there is a lack of understanding of how part-level semantics of
3D shapes are encoded in their corresponding generative latent spaces. In this
paper, we propose 3DLatNav, a novel approach to navigating pretrained
generative latent spaces to enable controlled part-level semantic manipulation
of 3D objects. First, we propose a part-level weakly-supervised shape semantics
identification mechanism using latent representations of 3D shapes. Then, we
transfer that knowledge to a pretrained 3D object generative latent space to
unravel disentangled embeddings to represent different shape semantics of
component parts of an object in the form of linear subspaces, despite the
unavailability of part-level labels during training. Finally, we utilize
those identified subspaces to show that controllable 3D object part
manipulation can be achieved by applying the proposed framework to any
pretrained 3D generative model. With two novel quantitative metrics to evaluate
the consistency and localization accuracy of part-level manipulations, we show
that 3DLatNav outperforms existing unsupervised latent disentanglement methods
in identifying latent directions that encode part-level shape semantics of 3D
objects. With multiple ablation studies and testing on state-of-the-art
generative models, we show that 3DLatNav can implement controlled part-level
semantic manipulations on an input point cloud while preserving other features
and the realistic nature of the object.
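As a concrete illustration of the kind of edit the abstract describes, the sketch below estimates a linear latent direction for one part-level attribute and applies it as simple vector arithmetic before decoding. The paired latent codes, the `decode` function, and the SVD-based direction estimate are illustrative assumptions, not the authors' weakly-supervised pipeline.

```python
import numpy as np

def part_direction(z_pairs):
    """Estimate a linear latent direction for one part-level attribute.

    z_pairs: array of shape (n, 2, d) holding latent codes of shape pairs
    that differ mainly in the target part (e.g. chair legs: short vs. long).
    Returns a unit vector: the dominant direction of the pairwise differences.
    """
    diffs = z_pairs[:, 1, :] - z_pairs[:, 0, :]           # (n, d)
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)  # top component
    d = vt[0]
    return d / np.linalg.norm(d)

def edit_latent(z, direction, alpha):
    """Move a latent code along an identified semantic direction."""
    return z + alpha * direction

# Usage sketch; decode() stands in for any pretrained point-cloud decoder:
#   d_legs = part_direction(z_pairs)          # hypothetical paired codes
#   new_pc = decode(edit_latent(z, d_legs, 1.5))
```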
Related papers
- Learning 3D Representations from Procedural 3D Programs [6.915871213703219]
Self-supervised learning has emerged as a promising approach for acquiring transferable 3D representations from unlabeled 3D point clouds.
We propose learning 3D representations from procedural 3D programs that automatically generate 3D shapes using simple primitives and augmentations.
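To make the idea concrete, here is a toy "procedural 3D program" in the same spirit: it assembles a point cloud from randomly sized box primitives and applies a random rotation as an augmentation. The primitive set and all parameters are invented for illustration and are not the paper's program family.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_box(center, size, n=256):
    """Surface samples from an axis-aligned box (one toy primitive)."""
    pts = rng.uniform(-0.5, 0.5, size=(n, 3))
    axis = rng.integers(0, 3, size=n)          # choose a face per point
    idx = np.arange(n)
    pts[idx, axis] = np.sign(pts[idx, axis]) * 0.5
    return center + pts * size

def procedural_shape(n_parts=3):
    """One toy 'program': compose random boxes, then rotate about z."""
    parts = [sample_box(rng.uniform(-1.0, 1.0, 3), rng.uniform(0.2, 1.0, 3))
             for _ in range(n_parts)]
    pc = np.concatenate(parts)
    theta = rng.uniform(0.0, 2.0 * np.pi)      # random augmentation
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta),  np.cos(theta), 0.0],
                    [0.0, 0.0, 1.0]])
    return pc @ rot.T                          # (n_parts * 256, 3) points
```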
arXiv Detail & Related papers (2024-11-25T18:59:57Z)
- CUS3D: CLIP-based Unsupervised 3D Segmentation via Object-level Denoise [9.12768731317489]
We propose a novel distillation learning framework named CUS3D.
An object-level denoising projection module is designed to screen out noise and ensure more accurate 3D features.
Based on the obtained features, a multimodal distillation learning module is designed to align the 3D features with the CLIP semantic feature space.
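A minimal sketch of what such an alignment objective could look like, assuming per-point 3D features and CLIP features already lifted onto the same points; the projection head and the cosine loss are generic distillation choices, not CUS3D's released code.

```python
import torch
import torch.nn.functional as F

def clip_distill_loss(feat3d, clip_feat, proj):
    """Cosine distillation: pull projected 3D features toward CLIP features.

    feat3d:    (N, C3) per-point features from a 3D backbone.
    clip_feat: (N, Cc) CLIP features lifted onto the same N points;
               treated as a frozen teacher signal.
    proj:      a small head mapping C3 -> Cc (hypothetical).
    """
    student = F.normalize(proj(feat3d), dim=-1)
    teacher = F.normalize(clip_feat, dim=-1).detach()  # no teacher grads
    return (1.0 - (student * teacher).sum(dim=-1)).mean()

# proj = torch.nn.Linear(64, 512)   # illustrative feature widths
```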
arXiv Detail & Related papers (2024-09-21T02:17:35Z)
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- 3D Semantic Subspace Traverser: Empowering 3D Generative Model with Shape Editing Capability [13.041974495083197]
Previous studies on 3D shape generation have focused on shape quality and structure, with little or no consideration of semantic information.
We propose a novel semantic generative model named 3D Semantic Subspace Traverser.
Our method can produce plausible shapes with complex structures and enable the editing of semantic attributes.
arXiv Detail & Related papers (2023-07-26T09:04:27Z)
- NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization [80.3424839706698]
We present NeurOCS, a framework that uses instance masks and 3D boxes as input to learn 3D object shapes by means of differentiable rendering.
Our approach rests on insights in learning a category-level shape prior directly from real driving scenes.
We make critical design choices to learn object coordinates more effectively from an object-centric view.
arXiv Detail & Related papers (2023-05-28T16:18:41Z)
- Object-level 3D Semantic Mapping using a Network of Smart Edge Sensors [25.393382192511716]
We extend a multi-view 3D semantic mapping system consisting of a network of distributed edge sensors with object-level information.
Our method is evaluated on the public Behave dataset, where it achieves pose estimates within a few centimeters, and in real-world experiments with the sensor network in a challenging lab environment.
arXiv Detail & Related papers (2022-11-21T11:13:08Z)
- ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes [55.689763519293464]
ConDor is a self-supervised method that learns to canonicalize the 3D orientation and position for full and partial 3D point clouds.
During inference, our method takes an unseen full or partial 3D point cloud at an arbitrary pose and outputs an equivariant canonical pose.
arXiv Detail & Related papers (2022-01-19T18:57:21Z)
- Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to utilize the 3D voxelization and 3D convolution network.
We propose a new framework for outdoor LiDAR segmentation, in which cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric patterns.
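The cylindrical partition itself reduces to a coordinate change plus binning, as in the sketch below; the grid resolution and range parameters are illustrative, not the paper's configuration.

```python
import numpy as np

def cylindrical_voxel_ids(points, grid=(480, 360, 32),
                          rho_max=50.0, z_range=(-4.0, 2.0)):
    """Assign each LiDAR point (x, y, z) to a cylindrical voxel.

    Bins are uniform in (rho, theta, z) rather than (x, y, z), so cells
    cover a similar number of points near and far from the sensor.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.sqrt(x**2 + y**2)
    theta = np.arctan2(y, x)                       # in [-pi, pi]
    r_id = np.clip((rho / rho_max * grid[0]).astype(int), 0, grid[0] - 1)
    t_id = ((theta + np.pi) / (2 * np.pi) * grid[1]).astype(int) % grid[1]
    z_id = np.clip(((z - z_range[0]) / (z_range[1] - z_range[0])
                    * grid[2]).astype(int), 0, grid[2] - 1)
    return np.stack([r_id, t_id, z_id], axis=1)    # (N, 3) voxel indices
```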
arXiv Detail & Related papers (2021-09-12T06:25:11Z)
- Learning Canonical 3D Object Representation for Fine-Grained Recognition [77.33501114409036]
We propose a novel framework for fine-grained object recognition that learns to recover object variation in 3D space from a single image.
We represent an object as a composition of 3D shape and its appearance, while eliminating the effect of camera viewpoint.
By incorporating 3D shape and appearance jointly in a deep representation, our method learns the discriminative representation of the object.
arXiv Detail & Related papers (2021-08-10T12:19:34Z)
- SESS: Self-Ensembling Semi-Supervised 3D Object Detection [138.80825169240302]
We propose SESS, a self-ensembling semi-supervised 3D object detection framework. Specifically, we design a thorough perturbation scheme to enhance generalization of the network on unlabeled and new unseen data.
Our SESS achieves competitive performance compared to the state-of-the-art fully-supervised method using only 50% of the labeled data.
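Self-ensembling of this kind typically follows the mean-teacher recipe: a teacher network tracks an exponential moving average of the student, and the two are trained to agree under perturbation. The sketch below shows that generic recipe, assuming simple tensor outputs; it omits SESS's detection-specific perturbation scheme and losses.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.99):
    """Teacher weights track an exponential moving average of the student."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1.0 - decay)

def consistency_loss(student_out, teacher_out):
    """Penalize student/teacher disagreement on differently perturbed input."""
    return torch.mean((student_out - teacher_out.detach()) ** 2)
```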
arXiv Detail & Related papers (2019-12-26T08:48:04Z)