MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection
- URL: http://arxiv.org/abs/2504.06801v2
- Date: Thu, 10 Apr 2025 05:03:02 GMT
- Title: MonoPlace3D: Learning 3D-Aware Object Placement for 3D Monocular Detection
- Authors: Rishubh Parihar, Srinjay Sarkar, Sarthak Vora, Jogendra Kundu, R. Venkatesh Babu
- Abstract summary: Current monocular 3D detectors are held back by the limited diversity and scale of real-world datasets. We show that where and how objects are positioned is just as crucial for training effective 3D monocular detectors. We introduce MonoPlace3D, a novel system that considers the 3D scene content to create realistic augmentations.
- Score: 24.611196145977022
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current monocular 3D detectors are held back by the limited diversity and scale of real-world datasets. While data augmentation certainly helps, it's particularly difficult to generate realistic scene-aware augmented data for outdoor settings. Most current approaches to synthetic data generation focus on realistic object appearance through improved rendering techniques. However, we show that where and how objects are positioned is just as crucial for training effective 3D monocular detectors. The key obstacle lies in automatically determining realistic object placement parameters - including position, dimensions, and directional alignment - when introducing synthetic objects into actual scenes. To address this, we introduce MonoPlace3D, a novel system that considers the 3D scene content to create realistic augmentations. Specifically, given a background scene, MonoPlace3D learns a distribution over plausible 3D bounding boxes. Subsequently, we render realistic objects and place them according to the locations sampled from the learned distribution. Our comprehensive evaluation on two standard datasets, KITTI and NuScenes, demonstrates that MonoPlace3D significantly improves the accuracy of multiple existing monocular 3D detectors while being highly data efficient.
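The abstract describes a two-stage pipeline: learn a scene-conditioned distribution over plausible 3D bounding boxes, then sample placements and composite rendered objects into the scene. The sketch below illustrates that idea only; it is a minimal, hypothetical reconstruction, not the authors' code. All names (ScenePlacementNet, sample_placement), the Gaussian parameterization, and the 7-value box encoding (x, y, z, w, h, l, yaw) are illustrative assumptions.

```python
# Hypothetical sketch of a MonoPlace3D-style placement sampler.
# Assumption: the placement distribution is modeled as a diagonal Gaussian
# over 7 box parameters, conditioned on the background scene image.
import torch
import torch.nn as nn

class ScenePlacementNet(nn.Module):
    """Predicts a Gaussian over 3D box parameters (x, y, z, w, h, l, yaw)
    conditioned on a background scene image."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mu = nn.Linear(32, 7)         # mean of the box parameters
        self.log_sigma = nn.Linear(32, 7)  # log std of the box parameters

    def forward(self, scene):
        feat = self.encoder(scene)
        return self.mu(feat), self.log_sigma(feat).exp()

def sample_placement(net, scene, n=4):
    """Draw n plausible 3D boxes for one scene (reparameterized sampling)."""
    mu, sigma = net(scene)
    eps = torch.randn(n, 7)
    return mu + sigma * eps  # each row: (x, y, z, w, h, l, yaw)

if __name__ == "__main__":
    net = ScenePlacementNet()
    scene = torch.rand(1, 3, 128, 384)  # stand-in for a KITTI-style image
    boxes = sample_placement(net, scene)
    print(boxes.shape)  # torch.Size([4, 7])
    # A renderer would then draw an object inside each sampled box and
    # alpha-composite it onto the scene to form the augmented training image.
```

In the paper's pipeline the sampled boxes drive a realistic object renderer; here the rendering and compositing step is left as a comment, since the abstract does not specify its implementation.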
Related papers
- V-MIND: Building Versatile Monocular Indoor 3D Detector with Diverse 2D Annotations [17.49394091283978]
V-MIND (Versatile Monocular INdoor Detector) enhances the performance of indoor 3D detectors across a diverse set of object classes. We generate 3D training data by converting large-scale 2D images into 3D point clouds and subsequently deriving pseudo 3D bounding boxes. V-MIND achieves state-of-the-art object detection performance across a wide range of classes on the Omni3D indoor dataset.
arXiv Detail & Related papers (2024-12-16T03:28:00Z)
- SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z)
- 3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D Detection [35.61749990140511]
A major challenge in monocular 3D object detection is the limited diversity and quantity of objects in real datasets.
We propose a physically plausible indoor 3D object insertion approach to automatically copy virtual objects and paste them into real scenes.
arXiv Detail & Related papers (2023-12-08T08:44:54Z)
- BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation [96.58789785954409]
We propose a practical and efficient 3D representation that incorporates an equivariant radiance field with the guidance of a bird's-eye view map.
We produce large-scale, even infinite-scale, 3D scenes by synthesizing local scenes and stitching them together with smooth consistency.
arXiv Detail & Related papers (2023-12-04T18:56:10Z)
- NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization [80.3424839706698]
We present NeurOCS, a framework that uses instance masks and 3D boxes as input to learn 3D object shapes by means of differentiable rendering.
Our approach rests on insights in learning a category-level shape prior directly from real driving scenes.
We make critical design choices to learn object coordinates more effectively from an object-centric view.
arXiv Detail & Related papers (2023-05-28T16:18:41Z)
- OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation [107.71752592196138]
We propose OmniObject3D, a large-vocabulary 3D object dataset with massive high-quality real-scanned 3D objects.
It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets.
Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multiview rendered images, and multiple real-captured videos.
arXiv Detail & Related papers (2023-01-18T18:14:18Z)
- Learning 3D Scene Priors with 2D Supervision [37.79852635415233]
We propose a new method to learn 3D scene priors of layout and shape without requiring any 3D ground truth.
Our method represents a 3D scene as a latent vector, from which we progressively decode a sequence of objects characterized by their class categories.
Experiments on 3D-FRONT and ScanNet show that our method outperforms the state of the art in single-view reconstruction.
arXiv Detail & Related papers (2022-11-25T15:03:32Z)
- RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of synthetic datasets, which consist of CAD object models, to boost learning on real datasets.
Recent work on 3D pre-training fails when transferring features learned on synthetic objects to other real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z)
- Kinematic 3D Object Detection in Monocular Video [123.7119180923524]
We propose a novel method for monocular video-based 3D object detection that carefully leverages kinematic motion to improve the precision of 3D localization.
We achieve state-of-the-art performance on the monocular 3D object detection and Bird's Eye View tasks on the KITTI self-driving dataset.
arXiv Detail & Related papers (2020-07-19T01:15:12Z)