CAD-Estate: Large-scale CAD Model Annotation in RGB Videos
- URL: http://arxiv.org/abs/2306.09011v2
- Date: Mon, 14 Aug 2023 12:16:53 GMT
- Title: CAD-Estate: Large-scale CAD Model Annotation in RGB Videos
- Authors: Kevis-Kokitsi Maninis, Stefan Popov, Matthias Nießner, Vittorio Ferrari
- Abstract summary: We propose a method for annotating videos of complex multi-object scenes with a globally-consistent 3D representation of the objects.
We annotate each object with a CAD model from a database, and place it in the 3D coordinate frame of the scene with a 9-DoF pose transformation.
Our method is semi-automatic and works on commonly-available RGB videos, without requiring a depth sensor.
- Score: 34.63782303927944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a method for annotating videos of complex multi-object scenes with
a globally-consistent 3D representation of the objects. We annotate each object
with a CAD model from a database, and place it in the 3D coordinate frame of
the scene with a 9-DoF pose transformation. Our method is semi-automatic and
works on commonly-available RGB videos, without requiring a depth sensor. Many
steps are performed automatically, and the tasks performed by humans are
simple, well-specified, and require only limited reasoning in 3D. This makes
them feasible for crowd-sourcing and has allowed us to construct a large-scale
dataset by annotating real-estate videos from YouTube. Our dataset CAD-Estate
offers 101k instances of 12k unique CAD models placed in the 3D representations
of 20k videos. In comparison to Scan2CAD, the largest existing dataset with CAD
model annotations on real scenes, CAD-Estate has 7x more instances and 4x more
unique CAD models. We showcase the benefits of pre-training a Mask2CAD model on
CAD-Estate for the task of automatic 3D object reconstruction and pose
estimation, demonstrating that it leads to performance improvements on the
popular Scan2CAD benchmark. The dataset is available at
https://github.com/google-research/cad-estate.
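As a rough illustration of the annotation target (a sketch under assumptions, not the authors' code: parameterization and conventions here are hypothetical), a 9-DoF pose combines 3 translation, 3 rotation, and 3 anisotropic scale parameters, and can be applied to CAD-model vertices as a single 4x4 matrix:

```python
import numpy as np

def pose_matrix(translation, euler_xyz, scale):
    """Build a 4x4 matrix for a 9-DoF pose:
    3 translation + 3 rotation (Euler angles, radians) + 3 anisotropic scale."""
    rx, ry, rz = euler_xyz
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(rx), -np.sin(rx)],
                   [0, np.sin(rx),  np.cos(rx)]])
    Ry = np.array([[ np.cos(ry), 0, np.sin(ry)],
                   [0, 1, 0],
                   [-np.sin(ry), 0, np.cos(ry)]])
    Rz = np.array([[np.cos(rz), -np.sin(rz), 0],
                   [np.sin(rz),  np.cos(rz), 0],
                   [0, 0, 1]])
    M = np.eye(4)
    M[:3, :3] = Rz @ Ry @ Rx @ np.diag(scale)  # scale first, then rotate
    M[:3, 3] = translation
    return M

def transform_points(M, pts):
    """Apply the pose to an (N, 3) array of CAD-model vertices."""
    homo = np.hstack([pts, np.ones((len(pts), 1))])
    return (homo @ M.T)[:, :3]

# Identity rotation, uniform scale 2, shift by (1, 0, 0):
M = pose_matrix([1, 0, 0], [0, 0, 0], [2, 2, 2])
print(transform_points(M, np.array([[1.0, 1.0, 1.0]])))  # [[3. 2. 2.]]
```

The anisotropic scale is what distinguishes a 9-DoF pose from the more common 7-DoF (similarity) case: each CAD model can be stretched independently along its axes to fit the observed object.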
Related papers
- CAD-Recode: Reverse Engineering CAD Code from Point Clouds [12.864274930732055]
3D CAD reverse engineering consists of reconstructing the sketch and CAD operation sequences from 3D representations such as point clouds.
The proposed CAD-Recode translates a point cloud into Python code that, when executed, reconstructs the CAD model.
We show that our CAD Python code output is interpretable by off-the-shelf LLMs, enabling CAD editing and CAD-specific question answering from point clouds.
arXiv Detail & Related papers (2024-12-18T16:55:42Z)
- Img2CAD: Conditioned 3D CAD Model Generation from Single Image with Structured Visual Geometry [12.265852643914439]
We present Img2CAD, the first approach that uses 2D image inputs to generate editable CAD parameters.
Img2CAD enables seamless integration between AI 3D reconstruction and CAD representation.
arXiv Detail & Related papers (2024-10-04T13:27:52Z)
- CAT3D: Create Anything in 3D with Multi-View Diffusion Models [87.80820708758317]
We present CAT3D, a method for creating anything in 3D by simulating the real-world capture process with a multi-view diffusion model.
CAT3D can create entire 3D scenes in as little as one minute, and outperforms existing methods for single image and few-view 3D scene creation.
arXiv Detail & Related papers (2024-05-16T17:59:05Z)
- FastCAD: Real-Time CAD Retrieval and Alignment from Scans and Videos [4.36478623815937]
FastCAD is a real-time method that simultaneously retrieves and aligns CAD models for all objects in a given scene.
Our single-stage method reduces inference time by a factor of 50 compared to other methods operating on RGB-D scans.
This enables the real-time generation of precise CAD model-based reconstructions from videos at 10 FPS.
arXiv Detail & Related papers (2024-03-22T12:20:23Z)
- Model2Scene: Learning 3D Scene Representation via Contrastive Language-CAD Models Pre-training [105.3421541518582]
Current successful methods of 3D scene perception rely on the large-scale annotated point cloud.
We propose Model2Scene, a novel paradigm that learns free 3D scene representation from Computer-Aided Design (CAD) models and languages.
Model2Scene yields impressive label-free 3D salient object detection, with an average mAP of 46.08% and 55.49% on the ScanNet and S3DIS datasets, respectively.
arXiv Detail & Related papers (2023-09-29T03:51:26Z)
- 3D-LLM: Injecting the 3D World into Large Language Models [60.43823088804661]
Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning.
We propose to inject the 3D world into large language models and introduce a new family of 3D-LLMs.
Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks.
arXiv Detail & Related papers (2023-07-24T17:59:02Z)
- Visual Localization using Imperfect 3D Models from the Internet [54.731309449883284]
This paper studies how imperfections in 3D models affect localization accuracy.
We show that 3D models from the Internet show promise as an easy-to-obtain scene representation.
arXiv Detail & Related papers (2023-04-12T16:15:05Z)
- Unsupervised Volumetric Animation [54.52012366520807]
We propose a novel approach for unsupervised 3D animation of non-rigid deformable objects.
Our method learns the 3D structure and dynamics of objects solely from single-view RGB videos.
We show our model can obtain animatable 3D objects from a single volume or few images.
arXiv Detail & Related papers (2023-01-26T18:58:54Z)
- PvDeConv: Point-Voxel Deconvolution for Autoencoding CAD Construction in 3D [23.87757211847093]
We learn to synthesize high-resolution point clouds of 10k points that densely describe the underlying geometry of Computer Aided Design (CAD) models.
We introduce a new dedicated dataset, the CC3D, containing 50k+ pairs of CAD models and their corresponding 3D meshes.
This dataset is used to learn a convolutional autoencoder for point clouds sampled from the paired 3D scans and CAD models.
arXiv Detail & Related papers (2021-01-12T14:14:13Z)
- Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve [54.054575408582565]
We propose to leverage existing large-scale datasets of 3D models to understand the underlying 3D structure of objects seen in an image.
We present Mask2CAD, which jointly detects objects in real-world images and, for each detected object, optimizes for the most similar CAD model and its pose.
This produces a clean, lightweight representation of the objects in an image.
arXiv Detail & Related papers (2020-07-26T00:08:37Z)
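The retrieve-and-pose idea behind Mask2CAD (and the CAD-Estate annotations themselves) reduces, at its core, to nearest-neighbor search over a CAD database. A toy sketch, not any paper's actual network or embedding (the function and data here are illustrative assumptions): given an embedding of a detected object, pick the database model with the highest cosine similarity.

```python
import numpy as np

def retrieve(query_emb, db_embs):
    """Return the index of the CAD-model embedding most similar
    to the detected object's embedding (cosine similarity)."""
    q = query_emb / np.linalg.norm(query_emb)
    db = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = db @ q  # one similarity score per database model
    return int(np.argmax(sims))

# Three toy CAD-model embeddings; the query is closest to the second.
db = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
print(retrieve(np.array([0.1, 0.9]), db))  # 1
```

In practice the database holds thousands of models (12k unique CAD models in CAD-Estate), so real systems use learned joint image-shape embeddings and approximate nearest-neighbor indices rather than a brute-force scan.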
This list is automatically generated from the titles and abstracts of the papers in this site.