Interactive Annotation of 3D Object Geometry using 2D Scribbles
- URL: http://arxiv.org/abs/2008.10719v2
- Date: Mon, 26 Oct 2020 02:43:19 GMT
- Title: Interactive Annotation of 3D Object Geometry using 2D Scribbles
- Authors: Tianchang Shen, Jun Gao, Amlan Kar, Sanja Fidler
- Abstract summary: In this paper, we propose an interactive framework for annotating 3D object geometry from point cloud data and RGB imagery.
Our framework targets naive users without artistic or graphics expertise.
- Score: 84.51514043814066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Inferring detailed 3D geometry of the scene is crucial for robotics
applications, simulation, and 3D content creation. However, such information is
hard to obtain, and thus very few datasets support it. In this paper, we
propose an interactive framework for annotating 3D object geometry from both
point cloud data and RGB imagery. The key idea behind our approach is to
exploit strong priors that humans have about the 3D world in order to
interactively annotate complete 3D shapes. Our framework targets naive users
without artistic or graphics expertise. We introduce two simple-to-use
interaction modules. First, we make an automatic guess of the 3D shape and
allow the user to provide feedback about large errors by drawing scribbles in
desired 2D views. Next, we aim to correct minor errors, in which users drag and
drop mesh vertices, assisted by a neural interactive module implemented as a
Graph Convolutional Network. Experimentally, we show that only a few user
interactions are needed to produce good quality 3D shapes on popular benchmarks
such as ShapeNet, Pix3D and ScanNet. We implement our framework as a web
service and conduct a user study, where we show that user annotated data using
our method effectively facilitates real-world learning tasks. Web service:
http://www.cs.toronto.edu/~shenti11/scribble3d.
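The abstract only names the neural interactive module; as a rough illustration, below is a minimal, hypothetical PyTorch sketch of a graph-convolutional vertex refiner of the kind described. The class names, feature choices, and architecture here are assumptions made for illustration, not the authors' implementation. It takes the current mesh vertices, the user's sparse drag-and-drop offsets, and the mesh edge list, and predicts refined vertex positions:

```python
# Hypothetical sketch of a GCN-based vertex-refinement module (not the authors' code).
# Assumes PyTorch; the mesh graph is given as an undirected edge list.
import torch
import torch.nn as nn


class GraphConv(nn.Module):
    """One graph convolution: mix each vertex's feature with the mean of its neighbours'."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_self = nn.Linear(in_dim, out_dim)
        self.w_neigh = nn.Linear(in_dim, out_dim)

    def forward(self, x, neighbor_mean):
        # x, neighbor_mean: (V, in_dim)
        return torch.relu(self.w_self(x) + self.w_neigh(neighbor_mean))


class VertexRefiner(nn.Module):
    """Predicts per-vertex offsets from current positions and sparse user drag-and-drop edits."""

    def __init__(self, hidden=128, num_layers=3):
        super().__init__()
        dims = [6] + [hidden] * num_layers              # input feature: xyz + user edit vector
        self.convs = nn.ModuleList([GraphConv(a, b) for a, b in zip(dims, dims[1:])])
        self.head = nn.Linear(hidden, 3)                # per-vertex xyz offset

    def forward(self, verts, user_offsets, edges):
        # verts: (V, 3) current mesh vertices
        # user_offsets: (V, 3) user edits, zero for untouched vertices
        # edges: (E, 2) long tensor, undirected mesh edges
        V = verts.shape[0]
        src = torch.cat([edges[:, 0], edges[:, 1]])     # symmetrize the edge list
        dst = torch.cat([edges[:, 1], edges[:, 0]])
        deg = torch.zeros(V, device=verts.device).index_add_(
            0, dst, torch.ones(dst.shape[0], device=verts.device)).clamp(min=1)

        x = torch.cat([verts, user_offsets], dim=-1)
        for conv in self.convs:
            neigh_sum = torch.zeros(V, x.shape[-1], device=x.device).index_add_(0, dst, x[src])
            x = conv(x, neigh_sum / deg.unsqueeze(-1))
        return verts + self.head(x)                     # refined vertex positions
```

In an interactive pipeline of this kind, such a module would be re-run after each user edit, so that dragging a few vertices propagates smooth corrections to nearby geometry; the scribble-based module for large errors operates separately on rendered 2D views.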
Related papers
- Weakly-Supervised 3D Scene Graph Generation via Visual-Linguistic Assisted Pseudo-labeling [9.440800948514449]
We propose a weakly-supervised 3D scene graph generation method via Visual-Linguistic Assisted Pseudo-labeling (3D-VLAP).
3D-VLAP exploits the strong ability of current large-scale visual-linguistic models to align the semantics of texts and 2D images.
We design an edge self-attention based graph neural network to generate scene graphs of 3D point cloud scenes.
arXiv Detail & Related papers (2024-04-03T07:30:09Z)
- Uni3D: Exploring Unified 3D Representation at Scale [66.26710717073372]
We present Uni3D, a 3D foundation model to explore the unified 3D representation at scale.
Uni3D uses a 2D ViT, pretrained end-to-end, to align 3D point cloud features with image-text aligned features.
We show that the strong Uni3D representation also enables applications such as 3D painting and retrieval in the wild.
arXiv Detail & Related papers (2023-10-10T16:49:21Z)
- CLIP$^2$: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data [80.42480679542697]
We propose Contrastive Language-Image-Point Cloud Pretraining (CLIP$^2$) to learn transferable 3D point cloud representations in realistic scenarios.
Specifically, we exploit naturally existing correspondences in 2D and 3D scenarios, and build well-aligned, instance-based text-image-point proxies from those complex scenarios.
arXiv Detail & Related papers (2023-03-22T09:32:45Z)
- Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding [23.672405624011873]
We propose a module that consolidates the 3D visual stream with 2D clues synthesized from point clouds.
We empirically show that these synthesized clues boost the quality of the learned visual representations.
Our proposed module, dubbed Look Around and Refer (LAR), significantly outperforms state-of-the-art 3D visual grounding techniques on three benchmarks.
arXiv Detail & Related papers (2022-11-25T17:12:08Z)
- MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks.
We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework.
As a result, the learned 2D representations are view-invariant and geometrically consistent.
arXiv Detail & Related papers (2022-08-18T00:48:15Z)
- 3D-to-2D Distillation for Indoor Scene Parsing [78.36781565047656]
We present a new approach that leverages 3D features extracted from a large-scale 3D data repository to enhance 2D features extracted from RGB images.
First, we distill 3D knowledge from a pretrained 3D network to supervise a 2D network to learn simulated 3D features from 2D features during training.
Second, we design a two-stage dimension normalization scheme to calibrate the 2D and 3D features for better integration.
Third, we design a semantic-aware adversarial training model to extend our framework for training with unpaired 3D data.
arXiv Detail & Related papers (2021-04-06T02:22:24Z)
- Parameter-Efficient Person Re-identification in the 3D Space [51.092669618679615]
We project 2D images to a 3D space and introduce a novel parameter-efficient Omni-scale Graph Network (OG-Net) to learn the pedestrian representation directly from 3D point clouds.
OG-Net effectively exploits the local information provided by sparse 3D points and takes advantage of the structure and appearance information in a coherent manner.
Ours is among the first attempts to conduct person re-identification in 3D space.
arXiv Detail & Related papers (2020-06-08T13:20:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.