Related papers: Unrolling Virtual Worlds for Immersive Experiences

HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels [30.986527559921335]
HunyuanWorld 1.0 is a novel framework that combines the best of both worlds for generating immersive, explorable, and interactive 3D scenes from text and image conditions. Our approach features three key advantages: 1) 360° immersive experiences via panoramic world proxies; 2) mesh export capabilities for seamless compatibility with existing computer graphics pipelines; 3) disentangled object representations for augmented interactivity.
arXiv Detail & Related papers (2025-07-29T13:43:35Z)
WorldExplorer: Towards Generating Fully Navigable 3D Scenes [49.21733308718443]
WorldExplorer builds fully navigable 3D scenes with consistent visual quality across a wide range of viewpoints. We generate multiple videos along short, pre-defined trajectories that explore the scene in depth. Our novel scene memory conditions each video on the most relevant prior views, while a collision-detection mechanism prevents degenerate results.
arXiv Detail & Related papers (2025-06-02T15:41:31Z)
GenSpace: Benchmarking Spatially-Aware Image Generation [76.98817635685278]
Humans intuitively compose and arrange scenes in the 3D space for photography. Can advanced AI image generators plan scenes with similar 3D spatial awareness when creating images from text or image prompts? We present GenSpace, a novel benchmark and evaluation pipeline to assess the spatial awareness of current image generation models.
arXiv Detail & Related papers (2025-05-30T17:59:26Z)
In-Place Panoptic Radiance Field Segmentation with Perceptual Prior for 3D Scene Understanding [1.8130068086063336]
This paper introduces a novel perceptual-prior-guided 3D scene representation and panoptic understanding method. It reformulates panoptic understanding within neural radiance fields as a linear assignment problem involving 2D semantics and instance recognition. Experiments and ablation studies under challenging conditions, including synthetic and real-world scenes, demonstrate the proposed method's effectiveness.
arXiv Detail & Related papers (2024-10-06T15:49:58Z)
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting [56.101576795566324]
We present a text-to-3D 360° scene generation pipeline. Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement. Our method offers a globally consistent 3D scene within a 360° perspective.
arXiv Detail & Related papers (2024-04-10T10:46:59Z)
Recent Trends in 3D Reconstruction of General Non-Rigid Scenes [104.07781871008186]
Reconstructing models of the real world, including 3D geometry, appearance, and motion of real scenes, is essential for computer graphics and computer vision. It enables the synthesizing of photorealistic novel views, useful for the movie industry and AR/VR applications. This state-of-the-art report (STAR) offers the reader a comprehensive summary of state-of-the-art techniques with monocular and multi-view inputs.
arXiv Detail & Related papers (2024-03-22T09:46:11Z)
OmniSCV: An Omnidirectional Synthetic Image Generator for Computer Vision [5.2178708158547025]
We present a tool for generating datasets of omnidirectional images with semantic and depth information. These images are synthesized from a set of captures that are acquired in a realistic virtual environment for Unreal Engine 4. We include in our tool photorealistic non-central-projection systems as non-central panoramas and non-central catadioptric systems.
arXiv Detail & Related papers (2024-01-30T14:40:19Z)
Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion [77.34078223594686]
We propose a novel architecture for direct 3D scene generation by introducing diffusion models into 3D sparse representations and combining them with neural rendering techniques. Specifically, our approach generates texture colors at the point level for a given geometry using a 3D diffusion model first, which is then transformed into a scene representation in a feed-forward manner. Experiments in two city-scale datasets show that our model demonstrates proficiency in generating photo-realistic street-view image sequences and cross-view urban scenes from satellite imagery.
arXiv Detail & Related papers (2024-01-19T16:15:37Z)
PanoContext-Former: Panoramic Total Scene Understanding with a Transformer [37.51637352106841]
Panoramic images enable a deeper understanding and more holistic perception of the 360° surrounding environment. In this paper, we propose a novel method using depth prior for holistic indoor scene understanding. In addition, we introduce a real-world dataset for scene understanding, including photo-realistic panoramas, high-fidelity depth images, accurately annotated room layouts, and oriented object bounding boxes and shapes.
arXiv Detail & Related papers (2023-05-21T16:20:57Z)
Object Scene Representation Transformer [56.40544849442227]
We introduce Object Scene Representation Transformer (OSRT), a 3D-centric model in which individual object representations naturally emerge through novel view synthesis. OSRT scales to significantly more complex scenes with larger diversity of objects and backgrounds than existing methods. It is multiple orders of magnitude faster at compositional rendering thanks to its light field parametrization and the novel Slot Mixer decoder.
arXiv Detail & Related papers (2022-06-14T15:40:47Z)
Learning Indoor Inverse Rendering with 3D Spatially-Varying Lighting [149.1673041605155]
We address the problem of jointly estimating albedo, normals, depth and 3D spatially-varying lighting from a single image. Most existing methods formulate the task as image-to-image translation, ignoring the 3D properties of the scene. We propose a unified, learning-based inverse rendering framework that formulates 3D spatially-varying lighting.
arXiv Detail & Related papers (2021-09-13T15:29:03Z)
GaussiGAN: Controllable Image Synthesis with 3D Gaussians from Unposed Silhouettes [48.642181362172906]
We present an algorithm that learns a coarse 3D representation of objects from unposed multi-view 2D mask supervision. In contrast to existing voxel-based methods for unposed object reconstruction, our approach learns to represent the generated shape and pose. We show results on synthetic datasets with realistic lighting, and demonstrate object insertion with interactive posing.
arXiv Detail & Related papers (2021-06-24T17:47:58Z)
SAILenv: Learning in Virtual Visual Environments Made Simple [16.979621213790015]
We present a novel platform that allows researchers to experiment with visual recognition in virtual 3D scenes. A few lines of code are needed to interface every algorithm with the virtual world, and non-3D-graphics experts can easily customize the 3D environment itself. Our framework yields pixel-level semantic and instance labeling and depth, and, to the best of our knowledge, it is the only one that provides motion-related information directly inherited from the 3D engine.
arXiv Detail & Related papers (2020-07-16T09:50:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.