Variable Radiance Field for Real-World Category-Specific Reconstruction from Single Image
- URL: http://arxiv.org/abs/2306.05145v2
- Date: Mon, 17 Feb 2025 07:54:08 GMT
- Title: Variable Radiance Field for Real-World Category-Specific Reconstruction from Single Image
- Authors: Kun Wang, Zhiqiang Yan, Zhenyu Zhang, Xiang Li, Jun Li, Jian Yang,
- Abstract summary: Reconstructing category-specific objects using Neural Radiance Field (NeRF) from a single image is a promising yet challenging task.
We propose Variable Radiance Field (VRF), a novel framework capable of efficiently reconstructing category-specific objects.
VRF achieves state-of-the-art performance in both reconstruction quality and computational efficiency.
- Score: 25.44715538841181
- License:
- Abstract: Reconstructing category-specific objects using Neural Radiance Field (NeRF) from a single image is a promising yet challenging task. Existing approaches predominantly rely on projection-based feature retrieval to associate 3D points in the radiance field with local image features from the reference image. However, this process is computationally expensive, dependent on known camera intrinsics, and susceptible to occlusions. To address these limitations, we propose Variable Radiance Field (VRF), a novel framework capable of efficiently reconstructing category-specific objects without requiring known camera intrinsics and demonstrating robustness against occlusions. First, we replace the local feature retrieval with global latent representations, generated through a single feed-forward pass, which improves efficiency and eliminates reliance on camera intrinsics. Second, to tackle coordinate inconsistencies inherent in real-world dataset, we define a canonical space by introducing a learnable, category-specific shape template and explicitly aligning each training object to this template using a learnable 3D transformation. This approach also reduces the complexity of geometry prediction to modeling deformations from the template to individual instances. Finally, we employ a hyper-network-based method for efficient NeRF creation and enhance the reconstruction performance through a contrastive learning-based pretraining strategy. Evaluations on the CO3D dataset demonstrate that VRF achieves state-of-the-art performance in both reconstruction quality and computational efficiency.
Related papers
- 3D Equivariant Pose Regression via Direct Wigner-D Harmonics Prediction [50.07071392673984]
Existing methods learn 3D rotations parametrized in the spatial domain using angles or quaternions.
We propose a frequency-domain approach that directly predicts Wigner-D coefficients for 3D rotation regression.
Our method achieves state-of-the-art results on benchmarks such as ModelNet10-SO(3) and PASCAL3D+.
arXiv Detail & Related papers (2024-11-01T12:50:38Z) - Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z) - Source-Free and Image-Only Unsupervised Domain Adaptation for Category
Level Object Pose Estimation [18.011044932979143]
3DUDA is a method capable of adapting to a nuisance-ridden target domain without 3D or depth data.
We represent object categories as simple cuboid meshes, and harness a generative model of neural feature activations.
We show that our method simulates fine-tuning on a global pseudo-labeled dataset under mild assumptions.
arXiv Detail & Related papers (2024-01-19T17:48:05Z) - 3DFusion, A real-time 3D object reconstruction pipeline based on
streamed instance segmented data [0.552480439325792]
This paper presents a real-time segmentation and reconstruction system that utilizes RGB-D images.
The system performs pixel-level segmentation on RGB-D data, effectively separating foreground objects from the background.
The real-time 3D modelling can be applied across various domains, including augmented/virtual reality, interior design, urban planning, road assistance, security systems, and more.
arXiv Detail & Related papers (2023-11-11T20:11:58Z) - Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z) - Shape, Pose, and Appearance from a Single Image via Bootstrapped
Radiance Field Inversion [54.151979979158085]
We introduce a principled end-to-end reconstruction framework for natural images, where accurate ground-truth poses are not available.
We leverage an unconditional 3D-aware generator, to which we apply a hybrid inversion scheme where a model produces a first guess of the solution.
Our framework can de-render an image in as few as 10 steps, enabling its use in practical scenarios.
arXiv Detail & Related papers (2022-11-21T17:42:42Z) - Unsupervised Multi-View Object Segmentation Using Radiance Field
Propagation [55.9577535403381]
We present a novel approach to segmenting objects in 3D during reconstruction given only unlabeled multi-view images of a scene.
The core of our method is a novel propagation strategy for individual objects' radiance fields with a bidirectional photometric loss.
To the best of our knowledge, RFP is the first unsupervised approach for tackling 3D scene object segmentation for neural radiance field (NeRF)
arXiv Detail & Related papers (2022-10-02T11:14:23Z) - AE-NeRF: Auto-Encoding Neural Radiance Fields for 3D-Aware Object
Manipulation [24.65896451569795]
We propose a novel framework for 3D-aware object manipulation, called Auto-aware Neural Radiance Fields (AE-NeRF)
Our model is formulated in an auto-encoder architecture, extracts disentangled 3D attributes such as 3D shape, appearance, and camera pose from an image.
A high-quality image is rendered from the attributes through disentangled generative Neural Radiance Fields (NeRF)
arXiv Detail & Related papers (2022-04-28T11:50:18Z) - RNNPose: Recurrent 6-DoF Object Pose Refinement with Robust
Correspondence Field Estimation and Pose Optimization [46.144194562841435]
We propose a framework based on a recurrent neural network (RNN) for object pose refinement.
The problem is formulated as a non-linear least squares problem based on the estimated correspondence field.
The correspondence field estimation and pose refinement are conducted alternatively in each iteration to recover accurate object poses.
arXiv Detail & Related papers (2022-03-24T06:24:55Z) - Category Level Object Pose Estimation via Neural Analysis-by-Synthesis [64.14028598360741]
In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module.
The image synthesis network is designed to efficiently span the pose configuration space.
We experimentally show that the method can recover orientation of objects with high accuracy from 2D images alone.
arXiv Detail & Related papers (2020-08-18T20:30:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.