WonderFree: Enhancing Novel View Quality and Cross-View Consistency for 3D Scene Exploration
- URL: http://arxiv.org/abs/2506.20590v1
- Date: Wed, 25 Jun 2025 16:28:40 GMT
- Title: WonderFree: Enhancing Novel View Quality and Cross-View Consistency for 3D Scene Exploration
- Authors: Chaojun Ni, Jie Li, Haoyun Li, Hengyu Liu, Xiaofeng Wang, Zheng Zhu, Guosheng Zhao, Boyuan Wang, Chenxin Li, Guan Huang, Wenjun Mei
- Abstract summary: A key challenge in current 3D generation methods is limited explorability. We propose WonderFree, the first model that enables users to interactively generate 3D worlds with the freedom to explore from arbitrary angles and directions.
- Score: 28.97217489759405
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Interactive 3D scene generation from a single image has gained significant attention due to its potential to create immersive virtual worlds. However, a key challenge in current 3D generation methods is their limited explorability: they cannot render high-quality images during larger maneuvers beyond the original viewpoint, particularly when moving forward into unseen areas. To address this challenge, we propose WonderFree, the first model that enables users to interactively generate 3D worlds with the freedom to explore from arbitrary angles and directions. Specifically, we decouple this challenge into two key subproblems: novel view quality, which addresses visual artifacts and floating issues in novel views, and cross-view consistency, which ensures spatial consistency across different viewpoints. To enhance rendering quality in novel views, we introduce WorldRestorer, a data-driven video restoration model designed to eliminate floaters and artifacts. In addition, a data collection pipeline is presented to automatically gather training data for WorldRestorer, ensuring it can handle scenes with the varying styles needed for 3D scene generation. Furthermore, to improve cross-view consistency, we propose ConsistView, a multi-view joint restoration mechanism that simultaneously restores multiple perspectives while maintaining spatiotemporal coherence. Experimental results demonstrate that WonderFree not only enhances rendering quality across diverse viewpoints but also significantly improves global coherence and consistency. These improvements are confirmed by CLIP-based metrics and a user study showing a 77.20% preference for WonderFree over WonderWorld, enabling a seamless and immersive 3D exploration experience. The code, model, and data will be publicly available.
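The abstract reports evaluation with CLIP-based metrics but does not spell out the formulation. As a rough illustration of how a cross-view consistency score of this kind could be computed, the sketch below averages pairwise cosine similarity between CLIP image embeddings of rendered views; the exact metric, checkpoint, and function names here are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a CLIP-based cross-view consistency score.
# Assumption: mean pairwise cosine similarity between CLIP image embeddings
# of views rendered from different camera poses of the same generated scene.
import itertools

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def cross_view_consistency(image_paths):
    """Average pairwise CLIP cosine similarity over a set of rendered views."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)  # L2-normalize embeddings
    pairs = itertools.combinations(range(len(images)), 2)
    sims = [float(feats[i] @ feats[j]) for i, j in pairs]
    return sum(sims) / len(sims)

# Example usage (hypothetical file names):
# score = cross_view_consistency(["view_000.png", "view_001.png", "view_002.png"])
```

Under this reading, a higher score would indicate that renderings from different viewpoints remain semantically consistent with one another.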
Related papers
- HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels [30.986527559921335]
HunyuanWorld 1.0 is a novel framework that combines the best of both worlds for generating immersive, explorable, and interactive 3D scenes from text and image conditions. Our approach features three key advantages: 1) 360° immersive experiences via panoramic world proxies; 2) mesh export capabilities for seamless compatibility with existing computer graphics pipelines; 3) disentangled object representations for augmented interactivity.
arXiv Detail & Related papers (2025-07-29T13:43:35Z) - Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation [66.95956271144982]
We present Voyager, a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image. Unlike existing approaches, Voyager achieves end-to-end scene generation and reconstruction with inherent consistency across frames.
arXiv Detail & Related papers (2025-06-04T17:59:04Z) - WorldExplorer: Towards Generating Fully Navigable 3D Scenes [49.21733308718443]
WorldExplorer builds fully navigable 3D scenes with consistent visual quality across a wide range of viewpoints. We generate multiple videos along short, pre-defined trajectories that explore the scene in depth. Our novel scene memory conditions each video on the most relevant prior views, while a collision-detection mechanism prevents degenerate results.
arXiv Detail & Related papers (2025-06-02T15:41:31Z) - FlexWorld: Progressively Expanding 3D Scenes for Flexible-View Synthesis [61.78754615822363]
We introduce FlexWorld, a framework for generating flexible-view 3D scenes from single images. Our V2V model can generate novel views under large camera pose variations. Building upon it, FlexWorld progressively generates new 3D content and integrates it into the global scene.
arXiv Detail & Related papers (2025-03-17T15:18:38Z) - WonderVerse: Extendable 3D Scene Generation with Video Generative Models [28.002645364066005]
We introduce WonderVerse, a framework for generating extendable 3D scenes. WonderVerse leverages the powerful world-level priors embedded within video generative foundation models. It is compatible with various 3D reconstruction methods, allowing both efficient and high-quality generation.
arXiv Detail & Related papers (2025-03-12T08:44:51Z) - From an Image to a Scene: Learning to Imagine the World from a Million 360 Videos [71.22810401256234]
Three-dimensional (3D) understanding of objects and scenes plays a key role in humans' ability to interact with the world. Large-scale synthetic and object-centric 3D datasets have been shown to be effective in training models with 3D understanding of objects. We introduce 360-1M, a 360° video dataset, and a process for efficiently finding corresponding frames from diverse viewpoints at scale.
arXiv Detail & Related papers (2024-12-10T18:59:44Z) - ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis [63.169364481672915]
We propose ViewCrafter, a novel method for synthesizing high-fidelity novel views of generic scenes from single or sparse images.
Our method takes advantage of the powerful generation capabilities of video diffusion models and the coarse 3D clues offered by point-based representations to generate high-quality video frames.
arXiv Detail & Related papers (2024-09-03T16:53:19Z) - Guess The Unseen: Dynamic 3D Scene Reconstruction from Partial 2D Glimpses [9.529416246409355]
We present a method to reconstruct the world and multiple dynamic humans in 3D from a monocular video input.
As a key idea, we represent both the world and multiple humans via the recently emerging 3D Gaussian Splatting (3D-GS) representation.
arXiv Detail & Related papers (2024-04-22T17:59:50Z) - Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion [77.34078223594686]
We propose a novel architecture for direct 3D scene generation by introducing diffusion models into 3D sparse representations and combining them with neural rendering techniques.
Specifically, our approach first generates texture colors at the point level for a given geometry using a 3D diffusion model; these are then transformed into a scene representation in a feed-forward manner.
Experiments on two city-scale datasets show that our model demonstrates proficiency in generating photo-realistic street-view image sequences and cross-view urban scenes from satellite imagery.
arXiv Detail & Related papers (2024-01-19T16:15:37Z) - High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)