ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies
- URL: http://arxiv.org/abs/2506.14315v2
- Date: Wed, 18 Jun 2025 07:15:43 GMT
- Title: ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies
- Authors: Jinyan Yuan, Bangbang Yang, Keke Wang, Panwang Pan, Lin Ma, Xuehai Zhang, Xiao Liu, Zhaopeng Cui, Yuewen Ma
- Abstract summary: This paper presents ImmerseGen, a novel agent-guided framework for compact and photorealistic world modeling of immersive VR scenes. Scenes are represented as hierarchical compositions of lightweight geometric proxies with synthesized RGBA textures, bypassing complex geometry creation and decimation. Experiments on scene generation and live VR showcases demonstrate superior photorealism, spatial coherence, and rendering efficiency on mobile VR headsets.
- Score: 25.96895266979283
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic creation of 3D scenes for immersive VR presence has been a significant research focus for decades. However, existing methods often rely on either high-poly mesh modeling with post-hoc simplification or massive 3D Gaussians, resulting in a complex pipeline or limited visual realism. In this paper, we demonstrate that such exhaustive modeling is unnecessary for achieving compelling immersive experience. We introduce ImmerseGen, a novel agent-guided framework for compact and photorealistic world modeling. ImmerseGen represents scenes as hierarchical compositions of lightweight geometric proxies, i.e., simplified terrain and billboard meshes, and generates photorealistic appearance by synthesizing RGBA textures onto these proxies. Specifically, we propose terrain-conditioned texturing for user-centric base world synthesis, and RGBA asset texturing for midground and foreground scenery. This reformulation offers several advantages: (i) it simplifies modeling by enabling agents to guide generative models in producing coherent textures that integrate seamlessly with the scene; (ii) it bypasses complex geometry creation and decimation by directly synthesizing photorealistic textures on proxies, preserving visual quality without degradation; (iii) it enables compact representations suitable for real-time rendering on mobile VR headsets. To automate scene creation from text prompts, we introduce VLM-based modeling agents enhanced with semantic grid-based analysis for improved spatial reasoning and accurate asset placement. ImmerseGen further enriches scenes with dynamic effects and ambient audio to support multisensory immersion. Experiments on scene generation and live VR showcases demonstrate that ImmerseGen achieves superior photorealism, spatial coherence and rendering efficiency compared to prior methods. Project webpage: https://immersegen.github.io.
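To make the representation described in the abstract concrete, the following Python sketch mocks up a scene as a hierarchy of alpha-textured proxies plus a coarse semantic grid that a modeling agent could consult before placing an asset. This is an illustrative mock-up, not the authors' implementation: the class names, the 2D grid layout, and the placement check are assumptions made for clarity.

```python
from dataclasses import dataclass, field
import numpy as np


@dataclass
class AlphaTexturedProxy:
    """A lightweight geometric proxy (simplified terrain patch or billboard)
    whose appearance comes entirely from a synthesized RGBA texture."""
    kind: str                 # "terrain" or "billboard" (assumed labels)
    position: tuple           # (x, y, z) placement in the scene
    rgba_texture: np.ndarray  # H x W x 4; the alpha channel cuts out silhouettes


@dataclass
class ImmersiveScene:
    """Hierarchical composition: one base terrain proxy plus mid/foreground assets."""
    base_terrain: AlphaTexturedProxy
    assets: list = field(default_factory=list)
    semantic_grid: np.ndarray = None  # coarse 2D label grid used for spatial reasoning


def place_asset(scene, proxy, cell, allowed_labels):
    """Hypothetical placement check an agent might run against the semantic grid:
    only accept the asset if its target cell carries an allowed semantic label."""
    if scene.semantic_grid[cell] in allowed_labels:
        scene.assets.append(proxy)
        return True
    return False


# Toy usage: a 4x4 semantic grid where label 1 = "meadow" and label 2 = "lake".
grid = np.array([[1, 1, 2, 2],
                 [1, 1, 2, 2],
                 [1, 1, 1, 2],
                 [1, 1, 1, 1]])
terrain = AlphaTexturedProxy("terrain", (0, 0, 0), np.zeros((512, 512, 4)))
scene = ImmersiveScene(base_terrain=terrain, semantic_grid=grid)
tree = AlphaTexturedProxy("billboard", (3.0, 0.0, 1.0), np.zeros((256, 256, 4)))
place_asset(scene, tree, cell=(3, 0), allowed_labels={1})  # meadow cell -> placed
place_asset(scene, tree, cell=(0, 3), allowed_labels={1})  # lake cell -> rejected
```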
Related papers
- LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans [64.31686158593351]
LiteReality is a novel pipeline that converts RGB-D scans of indoor environments into compact, realistic, and interactive 3D virtual replicas. LiteReality supports key features essential for graphics pipelines -- such as object individuality, articulation, high-quality rendering materials, and physically based interaction. We demonstrate the effectiveness of LiteReality on both real-life scans and public datasets.
arXiv Detail & Related papers (2025-07-03T17:59:55Z) - EnvGS: Modeling View-Dependent Appearance with Environment Gaussian [78.74634059559891]
EnvGS is a novel approach that employs a set of Gaussian primitives as an explicit 3D representation for capturing reflections of environments. To efficiently render these environment Gaussian primitives, we developed a ray-tracing-based renderer that leverages the GPU's RT core for fast rendering. Results from multiple real-world and synthetic datasets demonstrate that our method produces significantly more detailed reflections.
arXiv Detail & Related papers (2024-12-19T18:59:57Z) - Skyeyes: Ground Roaming using Aerial View Images [9.159470619808127]
We introduce Skyeyes, a novel framework that can generate sequences of ground view images using only aerial view inputs.
More specifically, we combine a 3D representation with a view consistent generation model, which ensures coherence between generated images.
The images maintain improved spatial-temporal coherence and realism, enhancing scene comprehension and visualization from aerial perspectives.
arXiv Detail & Related papers (2024-09-25T07:21:43Z) - Real-Time Neural Rasterization for Large Scenes [39.198327570559684]
We propose a new method for realistic real-time novel-view synthesis of large scenes.
Existing neural rendering methods generate realistic results, but primarily work for small scale scenes.
Our work is the first to enable real-time rendering of large real-world scenes.
arXiv Detail & Related papers (2023-11-09T18:59:10Z) - FLARE: Fast Learning of Animatable and Relightable Mesh Avatars [64.48254296523977]
Our goal is to efficiently learn personalized animatable 3D head avatars from videos that are geometrically accurate, realistic, relightable, and compatible with current rendering systems.
We introduce FLARE, a technique that enables the creation of animatable and relightable avatars from a single monocular video.
arXiv Detail & Related papers (2023-10-26T16:13:00Z) - DreamSpace: Dreaming Your Room Space with Text-Driven Panoramic Texture
Propagation [31.353409149640605]
In this paper, we propose a novel framework to generate 3D textures for immersive VR experiences.
To survive cluttered geometries, we separate texture cues in confident regions and learn to propagate textures in real-world environments.
arXiv Detail & Related papers (2023-10-19T19:29:23Z) - NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion
Models [85.20004959780132]
We introduce NeuralField-LDM, a generative model capable of synthesizing complex 3D environments.
We show how NeuralField-LDM can be used for a variety of 3D content creation applications, including conditional scene generation, scene inpainting and scene style manipulation.
arXiv Detail & Related papers (2023-04-19T16:13:21Z) - Texture Generation Using Graph Generative Adversarial Network And
Differentiable Rendering [0.6439285904756329]
Novel texture synthesis for existing 3D mesh models is an important step towards photorealistic asset generation for simulators.
Existing methods inherently work in the 2D image space which is the projection of the 3D space from a given camera perspective.
We present a new system called a graph generative adversarial network (GGAN) that can generate textures which can be directly integrated into a given 3D mesh model with tools like Blender and Unreal Engine.
arXiv Detail & Related papers (2022-06-17T04:56:03Z) - Learning Indoor Inverse Rendering with 3D Spatially-Varying Lighting [149.1673041605155]
We address the problem of jointly estimating albedo, normals, depth and 3D spatially-varying lighting from a single image.
Most existing methods formulate the task as image-to-image translation, ignoring the 3D properties of the scene.
We propose a unified, learning-based inverse rendering framework that formulates 3D spatially-varying lighting.
arXiv Detail & Related papers (2021-09-13T15:29:03Z) - SMPLpix: Neural Avatars from 3D Human Models [56.85115800735619]
We bridge the gap between classic rendering and the latest generative networks operating in pixel space.
We train a network that directly converts a sparse set of 3D mesh vertices into photorealistic images.
We show the advantage over conventional differentiable renderers both in terms of the level of photorealism and rendering efficiency.
arXiv Detail & Related papers (2020-08-16T10:22:00Z) - Photorealism in Driving Simulations: Blending Generative Adversarial
Image Synthesis with Rendering [0.0]
We introduce a hybrid generative neural graphics pipeline for improving the visual fidelity of driving simulations.
We form 2D semantic images from 3D scenery consisting of simple object models without textures.
These semantic images are then converted into photorealistic RGB images with a state-of-the-art Generative Adversarial Network (GAN) trained on real-world driving scenes (see the sketch after this entry).
arXiv Detail & Related papers (2020-07-31T03:25:17Z)
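The last entry above describes a two-stage hybrid pipeline: render 2D semantic images from untextured 3D object models, then translate them into photorealistic RGB frames with a GAN trained on real driving scenes. The PyTorch mock-up below illustrates that flow under stated assumptions; the tiny generator and the random label map are placeholders standing in for the real rasterizer and the trained image-to-image model.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 8  # e.g. road, lane marking, car, vegetation, sky, ... (assumed)


class TinyImage2ImageGenerator(nn.Module):
    """Stand-in for the trained GAN generator: maps a one-hot semantic layout
    to a 3-channel RGB image. Real systems use far larger architectures."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),  # RGB in [-1, 1]
        )

    def forward(self, semantic_onehot):
        return self.net(semantic_onehot)


def render_semantic_image(height=128, width=256):
    """Placeholder for the rasterizer: a real pipeline would render class IDs
    from untextured 3D object models; here we fake a label map."""
    return torch.randint(0, NUM_CLASSES, (1, height, width))


# Stage 1: semantic layout rendered from the (untextured) 3D scene.
labels = render_semantic_image()
onehot = nn.functional.one_hot(labels, NUM_CLASSES).permute(0, 3, 1, 2).float()

# Stage 2: the GAN generator translates the layout into a photorealistic frame.
generator = TinyImage2ImageGenerator()  # would load trained weights in practice
with torch.no_grad():
    rgb = generator(onehot)             # shape: (1, 3, 128, 256)
print(rgb.shape)
```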