3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D
Detection
- URL: http://arxiv.org/abs/2312.05277v1
- Date: Fri, 8 Dec 2023 08:44:54 GMT
- Title: 3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D
Detection
- Authors: Yunhao Ge, Hong-Xing Yu, Cheng Zhao, Yuliang Guo, Xinyu Huang, Liu
Ren, Laurent Itti, Jiajun Wu
- Abstract summary: A major challenge in monocular 3D object detection is the limited diversity and quantity of objects in real datasets.
We propose a physically plausible indoor 3D object insertion approach to automatically copy virtual objects and paste them into real scenes.
- Score: 35.61749990140511
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A major challenge in monocular 3D object detection is the limited diversity
and quantity of objects in real datasets. While augmenting real scenes with
virtual objects holds promise to improve both the diversity and quantity of the
objects, it remains elusive due to the lack of an effective 3D object insertion
method in complex real captured scenes. In this work, we study augmenting
complex real indoor scenes with virtual objects for monocular 3D object
detection. The main challenge is to automatically identify plausible physical
properties for virtual assets (e.g., locations, appearances, sizes, etc.) in
cluttered real scenes. To address this challenge, we propose a physically
plausible indoor 3D object insertion approach to automatically copy virtual
objects and paste them into real scenes. The resulting objects in scenes have
3D bounding boxes with plausible physical locations and appearances. In
particular, our method first identifies physically feasible locations and poses
for the inserted objects to prevent collisions with the existing room layout.
Subsequently, it estimates spatially-varying illumination for the insertion
location, enabling the immersive blending of the virtual objects into the
original scene with plausible appearances and cast shadows. We show that our
augmentation method significantly improves existing monocular 3D object models
and achieves state-of-the-art performance. For the first time, we demonstrate
that a physically plausible 3D object insertion, serving as a generative data
augmentation technique, can lead to significant improvements for discriminative
downstream tasks such as monocular 3D object detection. Project website:
https://gyhandy.github.io/3D-Copy-Paste/
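The first stage described in the abstract, finding a physically feasible location for the inserted object, can be sketched as rejection sampling over the ground plane with a collision test against existing furniture. The sketch below is an illustrative simplification under assumed names: it reduces footprints to axis-aligned 2D boxes and does not implement the paper's actual placement, pose, or illumination estimation.

```python
# Minimal sketch of collision-aware object placement: sample ground-plane
# locations for a virtual object and keep only those whose footprint does
# not intersect existing obstacles. AABB footprints are an assumption for
# illustration; the paper's method handles full 3D layout and poses.
import random

def overlaps(a, b):
    """True if two axis-aligned 2D boxes (xmin, ymin, xmax, ymax) intersect."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def sample_placement(room, obstacles, footprint, tries=1000, rng=random):
    """Return a collision-free (x, y) for an object of size footprint=(w, d)
    inside a room of size room=(width, depth), or None if none is found."""
    rw, rd = room
    w, d = footprint
    for _ in range(tries):
        x = rng.uniform(0, rw - w)
        y = rng.uniform(0, rd - d)
        box = (x, y, x + w, y + d)
        if all(not overlaps(box, obs) for obs in obstacles):
            return (x, y)
    return None  # no feasible spot within the sampling budget
```

In the real method, each accepted location would then feed the second stage, where spatially-varying illumination at that point is estimated so the object can be rendered with consistent shading and cast shadows.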
Related papers
- Lay-A-Scene: Personalized 3D Object Arrangement Using Text-to-Image Priors [43.19801974707858]
Current 3D generation techniques struggle with generating scenes with multiple high-resolution objects.
Here we introduce Lay-A-Scene, which solves the task of Open-set 3D Object Arrangement.
We show how to infer the 3D poses and arrangement of objects from a 2D generated image by finding a consistent projection of objects onto the 2D scene.
arXiv Detail & Related papers (2024-06-02T09:48:19Z)
- Zero-Shot Multi-Object Scene Completion [59.325611678171974]
We present a 3D scene completion method that recovers the complete geometry of multiple unseen objects in complex scenes from a single RGB-D image.
Our method outperforms the current state-of-the-art on both synthetic and real-world datasets.
arXiv Detail & Related papers (2024-03-21T17:59:59Z)
- SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z)
- NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization [80.3424839706698]
We present NeurOCS, a framework that uses instance masks and 3D boxes as input to learn 3D object shapes by means of differentiable rendering.
Our approach rests on insights in learning a category-level shape prior directly from real driving scenes.
We make critical design choices to learn object coordinates more effectively from an object-centric view.
arXiv Detail & Related papers (2023-05-28T16:18:41Z)
- Monocular 3D Object Detection using Multi-Stage Approaches with Attention and Slicing aided hyper inference [0.0]
3D object detection is vital as it would enable us to capture objects' sizes, orientation, and position in the world.
We would be able to use this 3D detection in real-world applications such as Augmented Reality (AR), self-driving cars, and robotics.
arXiv Detail & Related papers (2022-12-22T15:36:07Z)
- D3D-HOI: Dynamic 3D Human-Object Interactions from Videos [49.38319295373466]
We introduce D3D-HOI: a dataset of monocular videos with ground truth annotations of 3D object pose, shape and part motion during human-object interactions.
Our dataset consists of several common articulated objects captured from diverse real-world scenes and camera viewpoints.
We leverage the estimated 3D human pose for more accurate inference of the object spatial layout and dynamics.
arXiv Detail & Related papers (2021-08-19T00:49:01Z)
- Object Wake-up: 3-D Object Reconstruction, Animation, and in-situ Rendering from a Single Image [58.69732754597448]
Given a picture of a chair, could we extract the 3-D shape of the chair, animate its plausible articulations and motions, and render in-situ in its original image space?
We devise an automated approach to extract and manipulate articulated objects in single images.
arXiv Detail & Related papers (2021-08-05T16:20:12Z)
- 3D Object Recognition By Corresponding and Quantizing Neural 3D Scene Representations [29.61554189447989]
We propose a system that learns to detect objects and infer their 3D poses in RGB-D images.
Many existing systems can identify objects and infer 3D poses, but they heavily rely on human labels and 3D annotations.
arXiv Detail & Related papers (2020-10-30T13:56:09Z)
- Kinematic 3D Object Detection in Monocular Video [123.7119180923524]
We propose a novel method for monocular video-based 3D object detection which carefully leverages kinematic motion to improve precision of 3D localization.
We achieve state-of-the-art performance on monocular 3D object detection and the Bird's Eye View tasks within the KITTI self-driving dataset.
arXiv Detail & Related papers (2020-07-19T01:15:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.