HM3D-ABO: A Photo-realistic Dataset for Object-centric Multi-view 3D
Reconstruction
- URL: http://arxiv.org/abs/2206.12356v1
- Date: Fri, 24 Jun 2022 16:02:01 GMT
- Title: HM3D-ABO: A Photo-realistic Dataset for Object-centric Multi-view 3D
Reconstruction
- Authors: Zhenpei Yang, Zaiwei Zhang, Qixing Huang
- Abstract summary: We present a photo-realistic object-centric dataset, HM3D-ABO.
It is constructed by composing realistic indoor scenes with realistic objects.
The dataset could also be useful for tasks such as camera pose estimation and novel-view synthesis.
- Score: 37.29140654256627
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing 3D objects is an important computer vision task with wide
application in AR/VR. Deep learning algorithms developed for this task usually
rely on unrealistic synthetic datasets, such as ShapeNet and Things3D. On
the other hand, existing real-captured object-centric datasets usually do not
have enough annotation to enable supervised training or reliable evaluation. In
this technical report, we present a photo-realistic object-centric dataset,
HM3D-ABO. It is constructed by composing realistic indoor scenes with realistic
objects. For each configuration, we provide multi-view RGB observations, a
water-tight mesh model for the object, ground-truth depth maps, and object masks.
The proposed dataset could also be useful for tasks such as camera pose
estimation and novel-view synthesis. The dataset generation code is released at
https://github.com/zhenpeiyang/HM3D-ABO.
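As a concrete illustration of the per-configuration assets the abstract lists (multi-view RGB images, ground-truth depth maps, object masks, and a water-tight object mesh), here is a minimal loading sketch. The directory layout, file names, and depth scale below are assumptions made for illustration only; the authoritative structure is defined by the released generation code at https://github.com/zhenpeiyang/HM3D-ABO.

```python
# A minimal sketch of loading one HM3D-ABO configuration. The directory
# layout, file names, and depth encoding below are assumptions for
# illustration -- check the released generation code for the real format.
from pathlib import Path

import numpy as np
from PIL import Image


def load_configuration(root: str):
    """Gather the per-configuration assets described in the abstract:
    multi-view RGB observations, ground-truth depth maps, object masks,
    and the path to a water-tight object mesh."""
    root = Path(root)
    views = []
    for rgb_path in sorted((root / "rgb").glob("*.png")):  # hypothetical layout
        view_id = rgb_path.stem
        views.append({
            "rgb": np.asarray(Image.open(rgb_path)),
            # Depth is commonly stored as 16-bit PNG in millimeters;
            # the 1/1000 scale here is an assumption, not a confirmed spec.
            "depth": np.asarray(Image.open(root / "depth" / f"{view_id}.png")) / 1000.0,
            "mask": np.asarray(Image.open(root / "mask" / f"{view_id}.png")) > 0,
        })
    mesh_path = root / "object.obj"  # hypothetical name for the water-tight mesh
    return views, mesh_path
```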
Related papers
- Zero-Shot Multi-Object Scene Completion [59.325611678171974]
We present a 3D scene completion method that recovers the complete geometry of multiple unseen objects in complex scenes from a single RGB-D image.
Our method outperforms the current state-of-the-art on both synthetic and real-world datasets.
arXiv Detail & Related papers (2024-03-21T17:59:59Z)
- Anything-3D: Towards Single-view Anything Reconstruction in the Wild [61.090129285205805]
We introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model.
Our approach employs a BLIP model to generate textual descriptions, utilizes the Segment-Anything model for the effective extraction of objects of interest, and leverages a text-to-image diffusion model to lift the object into a neural radiance field.
arXiv Detail & Related papers (2023-04-19T16:39:51Z)
- OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation [107.71752592196138]
We propose OmniObject3D, a large-vocabulary 3D object dataset with a massive collection of high-quality, real-scanned 3D objects.
It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets.
Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multiview rendered images, and multiple real-captured videos.
arXiv Detail & Related papers (2023-01-18T18:14:18Z)
- A Real World Dataset for Multi-view 3D Reconstruction [28.298548207213468]
We present a dataset of 371 3D models of everyday tabletop objects along with their 320,000 real-world RGB and depth images.
We primarily focus on learned multi-view 3D reconstruction due to the lack of an appropriate real-world benchmark for the task, and demonstrate that our dataset can fill that gap.
arXiv Detail & Related papers (2022-03-22T00:15:54Z)
- ABO: Dataset and Benchmarks for Real-World 3D Object Understanding [43.42504014918771]
Amazon-Berkeley Objects (ABO) is a large-scale dataset of product images and 3D models corresponding to real household objects.
We use ABO to measure the domain gap for single-view 3D reconstruction networks trained on synthetic objects.
We also use multi-view images from ABO to measure the robustness of state-of-the-art metric learning approaches to different camera viewpoints.
arXiv Detail & Related papers (2021-10-12T17:52:42Z)
- D3D-HOI: Dynamic 3D Human-Object Interactions from Videos [49.38319295373466]
We introduce D3D-HOI: a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions.
Our dataset consists of several common articulated objects captured from diverse real-world scenes and camera viewpoints.
We leverage the estimated 3D human pose for more accurate inference of the object spatial layout and dynamics.
arXiv Detail & Related papers (2021-08-19T00:49:01Z)
- RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of synthetic datasets, which consist of CAD object models, to boost learning on real datasets.
Recent work on 3D pre-training fails when transferring features learned on synthetic objects to real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z)
- Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.