LSS3D: Learnable Spatial Shifting for Consistent and High-Quality 3D Generation from Single-Image
- URL: http://arxiv.org/abs/2511.12202v1
- Date: Sat, 15 Nov 2025 13:14:25 GMT
- Title: LSS3D: Learnable Spatial Shifting for Consistent and High-Quality 3D Generation from Single-Image
- Authors: Zhuojiang Cai, Yiheng Zhang, Meitong Guo, Mingdao Wang, Yuwang Wang,
- Abstract summary: We propose a high-quality image-to-3D approach, named LSS3D, with learnable spatial shifting to handle multi-view inconsistencies and non-frontal input views. Specifically, we assign learnable spatial shifting parameters to each view and adjust each view towards a spatially consistent target, guided by the reconstructed mesh. Our method consistently achieves leading results in both geometric and texture evaluation metrics across more flexible input viewpoints.
- Score: 12.683334603689502
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, multi-view diffusion-based 3D generation methods have gained significant attention. However, these methods often suffer from shape and texture misalignment across the generated multi-view images, leading to low-quality 3D generation results, such as incomplete geometric details and textural ghosting. Moreover, some methods are mainly optimized for frontal perspectives and exhibit poor robustness to oblique-perspective inputs. In this paper, to tackle the above challenges, we propose a high-quality image-to-3D approach, named LSS3D, with learnable spatial shifting to explicitly and effectively handle multi-view inconsistencies and non-frontal input views. Specifically, we assign learnable spatial shifting parameters to each view and adjust each view towards a spatially consistent target, guided by the reconstructed mesh, resulting in high-quality 3D generation with more complete geometric details and clean textures. In addition, we include the input view as an extra constraint in the optimization, further enhancing robustness to non-frontal input angles, especially for elevated viewpoint inputs. We also provide a comprehensive quantitative evaluation pipeline that can contribute to the community in performance comparisons. Extensive experiments demonstrate that our method consistently achieves leading results in both geometric and texture evaluation metrics across more flexible input viewpoints.
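To make the core idea concrete, below is a minimal sketch of per-view learnable spatial shifting in PyTorch. The tensor names (`generated_views`, `mesh_renders`, `input_view`), the 2D image-plane shift parameterization, and the L1 objectives are illustrative assumptions, not the authors' released code; the paper's actual parameterization, renderer, and losses may differ.

```python
# Hedged sketch: optimize one learnable shift per generated view so that all views
# agree with renderings of the reconstructed mesh, with the input view as an anchor.
import torch
import torch.nn.functional as F

num_views, c, h, w = 6, 3, 64, 64

# Placeholders standing in for the multi-view diffusion outputs, renderings of the
# current reconstructed mesh, and the single input image (all assumed, for illustration).
generated_views = torch.rand(num_views, c, h, w)
mesh_renders = torch.rand(num_views, c, h, w)
input_view = generated_views[0].clone()  # assume view 0 corresponds to the input viewpoint

# One learnable 2D shift (in normalized device coordinates) per generated view.
shifts = torch.nn.Parameter(torch.zeros(num_views, 2))
optimizer = torch.optim.Adam([shifts], lr=1e-2)

def apply_shift(images, shifts):
    """Resample a batch of images (N, C, H, W) with per-image sub-pixel shifts."""
    n = images.shape[0]
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
    )
    grid = torch.stack((xs, ys), dim=-1).expand(n, h, w, 2)
    grid = grid + shifts.view(n, 1, 1, 2)
    return F.grid_sample(images, grid, align_corners=True)

for step in range(200):
    optimizer.zero_grad()
    shifted = apply_shift(generated_views, shifts)
    # Pull each shifted view toward the spatially consistent target given by the
    # reconstructed mesh (fixed here for brevity; in practice it would be re-rendered
    # as the mesh is updated).
    consistency = (shifted - mesh_renders).abs().mean()
    # Extra constraint: the view at the input viewpoint should stay aligned with the
    # original input image, which helps with non-frontal/elevated inputs.
    anchor = (shifted[0] - input_view).abs().mean()
    (consistency + anchor).backward()
    optimizer.step()
```

In a full pipeline, this shift optimization and the mesh reconstruction would presumably alternate, so that the views and the recovered geometry are refined jointly rather than in a single pass as sketched here.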
Related papers
- Wonder3D++: Cross-domain Diffusion for High-fidelity 3D Generation from a Single Image [68.55613894952177]
We introduce Wonder3D++, a novel method for efficiently generating high-fidelity textured meshes from single-view images. We propose a cross-domain diffusion model that generates multi-view normal maps and the corresponding color images. Lastly, we introduce a cascaded 3D mesh extraction algorithm that derives high-quality surfaces from the multi-view 2D representations in only about 3 minutes in a coarse-to-fine manner.
arXiv Detail & Related papers (2025-11-03T17:24:18Z) - Towards Geometric and Textural Consistency 3D Scene Generation via Single Image-guided Model Generation and Layout Optimization [14.673302810271219]
We propose a novel three-stage framework for 3D scene generation with explicit geometric representations and high-quality textural details. Our approach not only outperforms state-of-the-art methods in terms of geometric accuracy and texture fidelity of individual generated 3D models, but also has significant advantages in scene layout synthesis.
arXiv Detail & Related papers (2025-07-20T06:59:42Z) - CDI3D: Cross-guided Dense-view Interpolation for 3D Reconstruction [25.468907201804093]
Large Reconstruction Models (LRMs) have shown great promise in leveraging multi-view images generated by 2D diffusion models to extract 3D content. However, 2D diffusion models often struggle to produce dense images with strong multi-view consistency. We present CDI3D, a feed-forward framework designed for efficient, high-quality image-to-3D generation with view interpolation.
arXiv Detail & Related papers (2025-03-11T03:08:43Z) - F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Aggregative Gaussian Splatting [35.625593119642424]
This paper tackles the problem of generalizable 3D-aware generation from monocular datasets. We propose a novel feed-forward pipeline based on pixel-aligned Gaussian Splatting. We also introduce a self-supervised cycle-aggregative constraint to enforce cross-view consistency in the learned 3D representation.
arXiv Detail & Related papers (2025-01-12T04:44:44Z) - GaussianAnything: Interactive Point Cloud Flow Matching For 3D Object Generation [75.39457097832113]
This paper introduces a novel 3D generation framework, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space. Our framework employs a Variational Autoencoder with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information. The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single-image inputs.
arXiv Detail & Related papers (2024-11-12T18:59:32Z) - PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting [54.7468067660037]
PF3plat sets a new state-of-the-art across all benchmarks, supported by comprehensive ablation studies validating our design choices. Our framework capitalizes on fast speed, scalability, and the high-quality 3D reconstruction and view synthesis capabilities of 3DGS.
arXiv Detail & Related papers (2024-10-29T15:28:15Z) - Wonder3D: Single Image to 3D using Cross-Domain Diffusion [105.16622018766236]
Wonder3D is a novel method for efficiently generating high-fidelity textured meshes from single-view images.
To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model.
arXiv Detail & Related papers (2023-10-23T15:02:23Z) - Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z) - High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)