SUP-NeRF: A Streamlined Unification of Pose Estimation and NeRF for Monocular 3D Object Reconstruction
- URL: http://arxiv.org/abs/2403.15705v2
- Date: Sun, 14 Jul 2024 18:50:38 GMT
- Title: SUP-NeRF: A Streamlined Unification of Pose Estimation and NeRF for Monocular 3D Object Reconstruction
- Authors: Yuliang Guo, Abhinav Kumar, Cheng Zhao, Ruoyu Wang, Xinyu Huang, Liu Ren
- Abstract summary: We present SUP-NeRF, a Streamlined Unification of object Pose estimation and NeRF-based object reconstruction.
SUP-NeRF decouples the object's dimension estimation and pose refinement to resolve the scale-depth ambiguity.
SUP-NeRF achieves state-of-the-art results in both reconstruction and pose estimation tasks on the nuScenes dataset.
- Score: 15.166003559787915
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Monocular 3D reconstruction for categorical objects heavily relies on accurately perceiving each object's pose. While gradient-based optimization in a NeRF framework updates the initial pose, this paper highlights that scale-depth ambiguity in monocular object reconstruction causes failures when the initial pose deviates moderately from the true pose. Consequently, existing methods often depend on a third-party 3D object detector to provide an initial object pose, leading to increased complexity and generalization issues. To address these challenges, we present SUP-NeRF, a Streamlined Unification of object Pose estimation and NeRF-based object reconstruction. SUP-NeRF decouples the object's dimension estimation and pose refinement to resolve the scale-depth ambiguity, and introduces a camera-invariant projected-box representation that generalizes across different domains. While using a dedicated pose estimator that smoothly integrates into an object-centric NeRF, SUP-NeRF is free from external 3D detectors. SUP-NeRF achieves state-of-the-art results in both reconstruction and pose estimation tasks on the nuScenes dataset. Furthermore, SUP-NeRF exhibits exceptional cross-dataset generalization on the KITTI and Waymo datasets, surpassing prior methods with up to 50% reduction in rotation and translation error.
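The scale-depth ambiguity mentioned in the abstract can be illustrated with a plain pinhole camera. The sketch below is not SUP-NeRF's method; it only shows that a box and a copy of it that is `k` times larger and `k` times farther away project to the same pixels, which is why a single image cannot separate object size from distance. The focal length, principal point, box pose, and dimensions are hypothetical values chosen for illustration.

```python
from itertools import product


def box_corners(center, dims):
    """Eight corners of an axis-aligned box in the camera frame (metres)."""
    cx, cy, cz = center
    w, h, l = dims
    return [
        (cx + sx * w / 2, cy + sy * h / 2, cz + sz * l / 2)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]


def project(points, focal=1000.0, principal=(800.0, 450.0)):
    """Pinhole projection of 3D camera-frame points to pixel coordinates."""
    return [
        (focal * x / z + principal[0], focal * y / z + principal[1])
        for x, y, z in points
    ]


def scaled(values, k):
    """Multiply every component of a tuple by k."""
    return tuple(k * v for v in values)


# Hypothetical car-sized box 10 m in front of the camera.
center = (1.0, 0.5, 10.0)
dims = (1.8, 1.5, 4.5)
k = 1.5  # make the box 1.5x larger and place it 1.5x farther away

near = project(box_corners(center, dims))
far = project(box_corners(scaled(center, k), scaled(dims, k)))

# Largest pixel difference between the two projections.
max_diff = max(abs(a - b) for p, q in zip(near, far) for a, b in zip(p, q))
print(max_diff < 1e-9)
```

Because both projections coincide, an image-based loss alone gives no signal to choose between the two hypotheses, which is consistent with the abstract's motivation for estimating object dimensions separately from pose refinement.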
Related papers
- Sparse-View 3D Reconstruction: Recent Advances and Open Challenges [0.8583178253811411]
Sparse-view 3D reconstruction is essential for applications in which dense image acquisition is impractical. This survey reviews the latest advances in neural implicit models and explicit point-cloud-based approaches. We analyze how geometric regularization, explicit shape modeling, and generative inference are used to mitigate artifacts.
arXiv Detail & Related papers (2025-07-22T09:57:28Z)
- RA-NeRF: Robust Neural Radiance Field Reconstruction with Accurate Camera Pose Estimation under Complex Trajectories [21.97835451388508]
RA-NeRF is capable of predicting highly accurate camera poses even with complex camera trajectories. RA-NeRF achieves state-of-the-art results in both camera pose estimation and visual quality.
arXiv Detail & Related papers (2025-06-18T08:21:19Z)
- GCRayDiffusion: Pose-Free Surface Reconstruction via Geometric Consistent Ray Diffusion [30.773599974914415]
Previous approaches have achieved impressive pose-free surface reconstruction results in dense-view settings.
We propose a new technique for pose-free surface reconstruction, which regularizes the learning by explicit points sampled from ray-based diffusion of camera pose estimation.
Our GCRayDiffusion achieves more accurate camera pose estimation than previous approaches, with geometrically more consistent surface reconstruction results.
arXiv Detail & Related papers (2025-03-28T11:45:09Z)
- Decompositional Neural Scene Reconstruction with Generative Diffusion Prior [64.71091831762214]
Decompositional reconstruction of 3D scenes, with complete shapes and detailed texture, is intriguing for downstream applications.
Recent approaches incorporate semantic or geometric regularization to address this issue, but they suffer significant degradation in underconstrained areas.
We propose DP-Recon, which employs diffusion priors in the form of Score Distillation Sampling (SDS) to optimize the neural representation of each individual object under novel views.
arXiv Detail & Related papers (2025-03-19T02:11:31Z)
- FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction [69.63414788486578]
FreeSplatter is a scalable feed-forward framework that generates high-quality 3D Gaussians from uncalibrated sparse-view images. Our approach employs a streamlined transformer architecture where self-attention blocks facilitate information exchange. We develop two specialized variants--for object-centric and scene-level reconstruction--trained on comprehensive datasets.
arXiv Detail & Related papers (2024-12-12T18:52:53Z)
- Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis [25.898616784744377]
Given a sparse set of observed views, the observations may not provide sufficient direct evidence to obtain complete and accurate 3D.
We propose SparseAGS, a method that adapts this analysis-by-synthesis approach by: a) including novel-view-synthesis-based generative priors in conjunction with photometric objectives to improve the quality of the inferred 3D, and b) explicitly reasoning about outliers and using a discrete search with a continuous optimization-based strategy to correct them.
arXiv Detail & Related papers (2024-12-04T18:59:24Z)
- UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image [86.7128543480229]
We present a novel approach and benchmark, termed UNOPose, for unseen one-reference-based object pose estimation.
Building upon a coarse-to-fine paradigm, UNOPose constructs an SE(3)-invariant reference frame to standardize object representation.
We recalibrate the weight of each correspondence based on its predicted likelihood of being within the overlapping region.
arXiv Detail & Related papers (2024-11-25T05:36:00Z)
- Towards Degradation-Robust Reconstruction in Generalizable NeRF [58.33351079982745]
Generalizable Radiance Field (GNeRF) across scenes has been proven to be an effective way to avoid per-scene optimization.
There has been limited research on the robustness of GNeRFs to different types of degradation present in the source images.
arXiv Detail & Related papers (2024-11-18T16:13:47Z)
- LU-NeRF: Scene and Pose Estimation by Synchronizing Local Unposed NeRFs [56.050550636941836]
A critical obstacle preventing NeRF models from being deployed broadly in the wild is their reliance on accurate camera poses.
We propose a novel approach, LU-NeRF, that jointly estimates camera poses and neural fields with relaxed assumptions on pose configuration.
We show our LU-NeRF pipeline outperforms prior attempts at unposed NeRF without making restrictive assumptions on the pose prior.
arXiv Detail & Related papers (2023-06-08T17:56:22Z)
- In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing [28.790900756506833]
3D-aware GANs offer new capabilities for view synthesis while preserving the editing functionalities of their 2D counterparts.
GAN inversion is a crucial step that seeks the latent code to reconstruct input images or videos, subsequently enabling diverse editing tasks through manipulation of this latent code.
We address this issue by explicitly modeling OOD objects from the input in 3D-aware GANs.
arXiv Detail & Related papers (2023-02-09T18:59:56Z)
- Few-View Object Reconstruction with Unknown Categories and Camera Poses [80.0820650171476]
This work explores reconstructing general real-world objects from a few images without known camera poses or object categories.
The crux of our work is solving two fundamental 3D vision problems -- shape reconstruction and pose estimation.
Our method FORGE predicts 3D features from each view and leverages them in conjunction with the input images to establish cross-view correspondence.
arXiv Detail & Related papers (2022-12-08T18:59:02Z)
- RBP-Pose: Residual Bounding Box Projection for Category-Level Pose Estimation [103.74918834553247]
Category-level object pose estimation aims to predict the 6D pose as well as the 3D metric size of arbitrary objects from a known set of categories.
Recent methods harness shape prior adaptation to map the observed point cloud into the canonical space and apply the Umeyama algorithm to recover the pose and size.
We propose a novel geometry-guided Residual Object Bounding Box Projection network RBP-Pose that jointly predicts object pose and residual vectors.
arXiv Detail & Related papers (2022-07-30T14:45:20Z)
- RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust Correspondence Field Estimation and Pose Optimization [46.144194562841435]
We propose a framework based on a recurrent neural network (RNN) for object pose refinement.
The problem is formulated as a non-linear least squares problem based on the estimated correspondence field.
The correspondence field estimation and pose refinement are conducted alternately in each iteration to recover accurate object poses.
arXiv Detail & Related papers (2022-03-24T06:24:55Z)
- Iterative Optimisation with an Innovation CNN for Pose Refinement [17.752556490937092]
In this work we propose an approach, namely an Innovation CNN, to object pose estimation refinement.
Our approach improves initial pose estimation progressively by applying the Innovation CNN iteratively in a gradient descent framework.
We evaluate our method on the popular LINEMOD and Occlusion LINEMOD datasets and obtain state-of-the-art performance on both datasets.
arXiv Detail & Related papers (2021-01-22T00:12:12Z)
- Reconstruct, Rasterize and Backprop: Dense shape and pose estimation from a single image [14.9851111159799]
This paper presents a new system to obtain dense object reconstructions along with 6-DoF poses from a single image.
We leverage recent advances in differentiable rendering (in particular, rasterization) to close the loop with 3D reconstruction in camera frame.
arXiv Detail & Related papers (2020-04-25T20:53:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.