DM-OSVP++: One-Shot View Planning Using 3D Diffusion Models for Active RGB-Based Object Reconstruction
- URL: http://arxiv.org/abs/2504.11674v1
- Date: Wed, 16 Apr 2025 00:14:52 GMT
- Title: DM-OSVP++: One-Shot View Planning Using 3D Diffusion Models for Active RGB-Based Object Reconstruction
- Authors: Sicong Pan, Liren Jin, Xuying Huang, Cyrill Stachniss, Marija Popović, Maren Bennewitz
- Abstract summary: One-shot view planning enables efficient data collection by predicting all views at once. By conditioning on initial multi-view images, we exploit the priors from the 3D diffusion model to generate an approximate object model. We validate the proposed active object reconstruction system through both simulation and real-world experiments.
- Score: 24.44253219419552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Active object reconstruction is crucial for many robotic applications. A key aspect in these scenarios is generating object-specific view configurations to obtain informative measurements for reconstruction. One-shot view planning enables efficient data collection by predicting all views at once, eliminating the need for time-consuming online replanning. Our primary insight is to leverage the generative power of 3D diffusion models as valuable prior information. By conditioning on initial multi-view images, we exploit the priors from the 3D diffusion model to generate an approximate object model, serving as the foundation for our view planning. Our novel approach integrates the geometric and textural distributions of the object model into the view planning process, generating views that focus on the complex parts of the object to be reconstructed. We validate the proposed active object reconstruction system through both simulation and real-world experiments, demonstrating the effectiveness of using 3D diffusion priors for one-shot view planning.
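To make the planning idea concrete, below is a minimal, hypothetical sketch (not the authors' released code) of one-shot view planning on an approximate object model: candidate cameras on a sphere are scored by how much complexity-weighted surface they see, and a fixed budget of views is selected greedily in a single pass. The point sampling, the dot-product visibility proxy, and all names and thresholds (`fibonacci_sphere`, `one_shot_view_plan`, `fov_cos`) are illustrative assumptions; the paper's actual method derives its weights from the geometric and textural distributions of the diffusion-generated model.

```python
# Hypothetical sketch of the core idea (not the authors' code): score candidate
# views on a sphere around an approximate object model by how much geometric and
# textural "complexity" they see, then pick a fixed budget of views in one shot.
# All names, thresholds, and the greedy selection are illustrative assumptions.
import numpy as np

def fibonacci_sphere(n, radius=1.0):
    """Return n roughly uniform candidate camera positions on a sphere."""
    i = np.arange(n)
    phi = np.pi * (3.0 - np.sqrt(5.0))          # golden angle
    y = 1.0 - 2.0 * (i + 0.5) / n
    r = np.sqrt(1.0 - y * y)
    return radius * np.stack([r * np.cos(phi * i), y, r * np.sin(phi * i)], axis=1)

def one_shot_view_plan(points, complexity, n_candidates=256, budget=10, fov_cos=0.3):
    """Greedily choose `budget` views covering the most complexity-weighted points.

    points:     (N, 3) surface samples of the approximate (diffusion-generated) model
    complexity: (N,) per-point weight, e.g. curvature plus texture-gradient magnitude
    """
    cams = fibonacci_sphere(n_candidates, radius=2.0)
    # Crude visibility proxy: a point counts as seen if it lies within the view
    # cone of a camera looking at the origin (a real system would ray-cast the mesh).
    dirs = points[None, :, :] - cams[:, None, :]                # (C, N, 3)
    dirs /= np.linalg.norm(dirs, axis=2, keepdims=True)
    look = -cams / np.linalg.norm(cams, axis=1, keepdims=True)  # cameras face origin
    visible = np.einsum('cnk,ck->cn', dirs, look) > fov_cos     # (C, N) boolean

    covered = np.zeros(len(points), dtype=bool)
    plan = []
    for _ in range(budget):                                     # one shot: no replanning
        gains = (visible & ~covered) @ complexity               # marginal complexity gain
        best = int(np.argmax(gains))
        plan.append(cams[best])
        covered |= visible[best]
    return np.asarray(plan)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(2000, 3))
    pts /= np.linalg.norm(pts, axis=1, keepdims=True)           # toy "object": unit sphere
    weight = 0.5 + (pts[:, 2] > 0.5)                            # pretend the top is complex
    views = one_shot_view_plan(pts, weight)
    print(views.shape)  # (10, 3)
```

Greedy coverage maximization is a common stand-in for view selection here; the one-shot property comes from committing to all `budget` views before any new measurements arrive.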
Related papers
- Aether: Geometric-Aware Unified World Modeling [49.33579903601599]
Aether is a unified framework that enables geometry-aware reasoning in world models. It achieves zero-shot generalization in both action following and reconstruction tasks. We hope our work inspires the community to explore new frontiers in physically-reasonable world modeling.
arXiv Detail & Related papers (2025-03-24T17:59:51Z)
- Exploiting Priors from 3D Diffusion Models for RGB-Based One-Shot View Planning [24.44253219419552]
We propose a novel one-shot view planning approach that utilizes the powerful 3D generation capabilities of diffusion models as priors.
Our experiments in simulation and real-world setups indicate that our approach balances well between object reconstruction quality and movement cost.
arXiv Detail & Related papers (2024-03-25T14:21:49Z)
- Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM-Omni3D and extend the aforementioned monocular detector to its multi-modal version.
We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z)
- Bag of Views: An Appearance-based Approach to Next-Best-View Planning for 3D Reconstruction [3.637651065605852]
Bag-of-Views (BoV) is a fully appearance-based model used to assign utility to captured views.
View Planning Toolbox (VPT) is a lightweight package for training and testing machine learning-based view planning frameworks.
arXiv Detail & Related papers (2023-07-11T22:56:55Z)
- Anything-3D: Towards Single-view Anything Reconstruction in the Wild [61.090129285205805]
We introduce Anything-3D, a methodical framework that ingeniously combines a series of visual-language models and the Segment-Anything object segmentation model.
Our approach employs a BLIP model to generate textual descriptions, utilizes the Segment-Anything model for the effective extraction of objects of interest, and leverages a text-to-image diffusion model to lift the object into a neural radiance field.
arXiv Detail & Related papers (2023-04-19T16:39:51Z)
- DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
First, we present a 6D pose refiner based on a render & compare strategy which can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z)
- Pose Estimation and 3D Reconstruction of Vehicles from Stereo-Images Using a Subcategory-Aware Shape Prior [0.0]
3D reconstruction of objects in computer vision is a prerequisite for many applications such as mobile robotics and autonomous driving.
The goal of this paper is to show how 3D object reconstruction can profit from prior shape observations.
arXiv Detail & Related papers (2021-07-22T19:47:49Z)
- MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion [19.034317851914725]
We present a system which can estimate the accurate poses of multiple known objects in contact and occlusion from real-time, embodied multi-view vision.
Our approach makes 3D object pose proposals from single RGB-D views and accumulates pose estimates and non-parametric occupancy information from multiple views as the camera moves.
We verify the accuracy and robustness of our approach experimentally on two object datasets: YCB-Video and our own challenging Cluttered YCB-Video.
arXiv Detail & Related papers (2020-04-09T02:29:30Z)