Zero-P-to-3: Zero-Shot Partial-View Images to 3D Object
- URL: http://arxiv.org/abs/2505.23054v1
- Date: Thu, 29 May 2025 03:51:37 GMT
- Title: Zero-P-to-3: Zero-Shot Partial-View Images to 3D Object
- Authors: Yuxuan Lin, Ruihang Chu, Zhenyu Chen, Xiao Tang, Lei Ke, Haoling Li, Yingji Zhong, Zhihao Li, Shiyong Liu, Xiaofei Wu, Jianzhuang Liu, Yujiu Yang
- Abstract summary: We propose a novel training-free approach that integrates local dense observations and multi-source priors for reconstruction. Our method introduces a fusion-based strategy to effectively align these priors in DDIM sampling, thereby generating multi-view consistent images to supervise invisible views.
- Score: 55.93553895520324
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative 3D reconstruction shows strong potential when observations are incomplete. While sparse-view and single-image reconstruction are well researched, partial observation remains underexplored. In this setting, dense views are accessible only within a specific angular range, while other perspectives remain inaccessible. This task presents two main challenges: (i) Limited view range: observations confined to a narrow angular scope rule out traditional interpolation techniques, which require evenly distributed perspectives. (ii) Inconsistent generation: views created for invisible regions often lack coherence with both the visible regions and each other, compromising reconstruction consistency. To address these challenges, we propose Zero-P-to-3, a novel training-free approach that integrates local dense observations and multi-source priors for reconstruction. Our method introduces a fusion-based strategy to effectively align these priors in DDIM sampling, thereby generating multi-view consistent images to supervise invisible views. We further design an iterative refinement strategy that uses the geometric structure of the object to enhance reconstruction quality. Extensive experiments on multiple datasets show the superiority of our method over state-of-the-art approaches, especially in invisible regions.
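To make the fusion idea concrete, below is a minimal sketch of how several diffusion priors could be combined inside deterministic DDIM sampling; the `fused_ddim_sampling` interface, the `priors` callables, and the fixed convex weighting are illustrative assumptions for this summary, not the paper's actual alignment strategy.

```python
import numpy as np

def ddim_step(x_t, eps, alpha_t, alpha_prev):
    """One deterministic DDIM update (eta = 0); alphas are cumulative products."""
    x0_pred = (x_t - np.sqrt(1.0 - alpha_t) * eps) / np.sqrt(alpha_t)
    return np.sqrt(alpha_prev) * x0_pred + np.sqrt(1.0 - alpha_prev) * eps

def fused_ddim_sampling(x_T, priors, weights, alphas):
    """Denoise from noise x_T while fusing the noise predictions of several
    priors (e.g. a multi-view prior and a single-image prior) at every step."""
    x = x_T
    for t in range(len(alphas) - 1, 0, -1):
        # Hypothetical fusion rule: a fixed convex combination of the priors'
        # epsilon predictions; the paper aligns the priors more carefully.
        eps = sum(w * prior(x, t) for w, prior in zip(weights, priors))
        x = ddim_step(x, eps, alphas[t], alphas[t - 1])
    return x
```

The abstract suggests that images sampled this way then serve as pseudo ground truth for the invisible views during reconstruction.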
Related papers
- VEIGAR: View-consistent Explicit Inpainting and Geometry Alignment for 3D object Removal [2.8954284913103367]
Novel View Synthesis (NVS) and 3D generation have significantly improved editing tasks.
To maintain cross-view consistency throughout the generative process, methods typically address this challenge using a dual-strategy framework.
We present VEIGAR, a computationally efficient framework that outperforms existing methods without relying on an initial reconstruction phase.
arXiv Detail & Related papers (2025-06-13T11:31:44Z) - Intern-GS: Vision Model Guided Sparse-View 3D Gaussian Splatting [95.61137026932062]
Intern-GS is a novel approach to enhance the process of sparse-view Gaussian splatting.
We show that Intern-GS achieves state-of-the-art rendering quality across diverse datasets.
arXiv Detail & Related papers (2025-05-27T05:17:49Z) - Mono3R: Exploiting Monocular Cues for Geometric 3D Reconstruction [11.220655907305515]
We introduce a monocular-guided refinement module that integrates monocular geometric priors into multi-view reconstruction frameworks.
Our method achieves substantial improvements in both multi-view camera pose estimation and point cloud accuracy.
arXiv Detail & Related papers (2025-04-18T02:33:12Z) - Synthesizing Consistent Novel Views via 3D Epipolar Attention without Re-Training [102.82553402539139]
Large diffusion models demonstrate remarkable zero-shot capabilities in novel view synthesis from a single image.
These models often face challenges in maintaining consistency across novel and reference views.
We propose to use epipolar geometry to locate and retrieve overlapping information from the input view.
This information is then incorporated into the generation of target views, eliminating the need for training or fine-tuning.
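The epipolar retrieval can be grounded in standard two-view geometry: given intrinsics and a relative pose, each pixel of the input view maps to a line in the target view along which any overlapping content must lie. The sketch below is textbook epipolar geometry with illustrative helper names, not the paper's implementation.

```python
import numpy as np

def fundamental_matrix(K_ref, K_tgt, R, t):
    """F maps a reference-view pixel to its epipolar line in the target view;
    (R, t) is the relative pose taking reference-camera points to the target."""
    tx = np.array([[0.0, -t[2], t[1]],
                   [t[2], 0.0, -t[0]],
                   [-t[1], t[0], 0.0]])
    E = tx @ R  # essential matrix
    return np.linalg.inv(K_tgt).T @ E @ np.linalg.inv(K_ref)

def epipolar_line(F, pixel):
    """Line (a, b, c) with a*u + b*v + c = 0 in the target image, normalized
    so that the point-line distance is measured in pixels."""
    line = F @ np.array([pixel[0], pixel[1], 1.0])
    return line / np.linalg.norm(line[:2])
```

Generation can then restrict attention or retrieval to target pixels lying within a small distance of this line.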
arXiv Detail & Related papers (2025-02-25T14:04:22Z) - CrossView-GS: Cross-view Gaussian Splatting For Large-scale Scene Reconstruction [5.528874948395173]
We propose a novel cross-view Gaussian Splatting method for large-scale scene reconstruction based on multi-branch construction and fusion.
Our method achieves superior performance in novel view synthesis compared to state-of-the-art methods.
arXiv Detail & Related papers (2025-01-03T08:24:59Z) - Unsupervised Multi-view UAV Image Geo-localization via Iterative Rendering [31.716967688739036]
Unmanned Aerial Vehicle (UAV) Cross-View Geo-Localization (CVGL) presents significant challenges.
Existing methods rely on the supervision of labeled datasets to extract viewpoint-invariant features for cross-view retrieval.
We propose an unsupervised solution that lifts the scene representation to 3D space from UAV observations for satellite image generation.
arXiv Detail & Related papers (2024-11-22T09:22:39Z) - StreetSurf: Extending Multi-view Implicit Surface Reconstruction to Street Views [6.35910814268525]
We present a novel multi-view implicit surface reconstruction technique, termed StreetSurf.
It is readily applicable to street view images in widely-used autonomous driving datasets, without necessarily requiring LiDAR data.
We achieve state-of-the-art reconstruction quality in both geometry and appearance within only one to two hours of training time.
arXiv Detail & Related papers (2023-06-08T07:19:27Z) - Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z) - Unsupervised Multi-View Object Segmentation Using Radiance Field Propagation [55.9577535403381]
We present a novel approach to segmenting objects in 3D during reconstruction given only unlabeled multi-view images of a scene.
The core of our method is a novel propagation strategy for individual objects' radiance fields with a bidirectional photometric loss.
To the best of our knowledge, RFP is the first unsupervised approach to 3D scene object segmentation for neural radiance fields (NeRF).
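The summary does not spell out the bidirectional photometric loss; a hypothetical symmetric formulation, assuming per-object renderings and masks, could pair a forward term (the composite of all objects should explain the image) with a backward term (each object should match the image inside its own mask). This is a sketch under those assumptions, not the paper's actual loss.

```python
import torch

def bidirectional_photometric_loss(obj_renders, obj_masks, image):
    """Hypothetical symmetric loss: forward, the composite of per-object
    renderings should reproduce the image; backward, each object's rendering
    should match the image inside its own mask."""
    composite = torch.stack(obj_renders).sum(dim=0)
    forward = (composite - image).abs().mean()
    backward = sum(((render - image) * mask).abs().mean()
                   for render, mask in zip(obj_renders, obj_masks))
    return forward + backward
```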
arXiv Detail & Related papers (2022-10-02T11:14:23Z) - Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D Mesh Reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles Single-view 3D Mesh Reconstruction to study model generalization to unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z) - 2D GANs Meet Unsupervised Single-view 3D Reconstruction [21.93671761497348]
Controllable image generation based on pre-trained GANs can benefit a wide range of computer vision tasks.
We propose a novel image-conditioned neural implicit field, which can leverage 2D supervisions from GAN-generated multi-view images.
The effectiveness of our approach is demonstrated through superior single-view 3D reconstruction results of generic objects.
arXiv Detail & Related papers (2022-07-20T20:24:07Z) - PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object Detection [57.49788100647103]
LiDAR-based 3D object detection is an important task for autonomous driving.
Current approaches suffer from sparse and partial point clouds of distant and occluded objects.
In this paper, we propose a novel two-stage approach, namely PC-RGNN, dealing with such challenges by two specific solutions.
arXiv Detail & Related papers (2020-12-18T18:06:43Z)