MetaDecorator: Generating Immersive Virtual Tours through Multimodality
- URL: http://arxiv.org/abs/2501.16164v1
- Date: Mon, 27 Jan 2025 15:59:58 GMT
- Title: MetaDecorator: Generating Immersive Virtual Tours through Multimodality
- Authors: Shuang Xie, Yang Liu, Jeannie S. A. Lee, Haiwei Dong
- Abstract summary: MetaDecorator is a framework that empowers users to personalize virtual spaces. By leveraging text-driven prompts and image synthesis techniques, MetaDecorator adorns static panoramas captured by 360° imaging devices, transforming them into uniquely styled and visually appealing environments.
- Score: 4.952351885592028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: MetaDecorator is a framework that empowers users to personalize virtual spaces. By leveraging text-driven prompts and image synthesis techniques, MetaDecorator adorns static panoramas captured by 360° imaging devices, transforming them into uniquely styled and visually appealing environments. This significantly enhances the realism and engagement of virtual tours compared to traditional offerings. Beyond the core framework, we also discuss the integration of Large Language Models (LLMs) and haptics in the VR application to provide a more immersive experience.
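The abstract above describes stylizing static panoramas captured by 360° imaging devices. As background only (this sketch is our own illustration, not code from the paper), such panoramas are typically stored as equirectangular images, where each pixel corresponds to a viewing direction on the unit sphere:

```python
import math

def equirect_to_direction(u, v, width, height):
    """Map an equirectangular panorama pixel (u, v) to a unit 3D
    viewing direction (x, y, z).

    u in [0, width) is the column index (longitude, full 360° span);
    v in [0, height) is the row index (latitude, 180° span).
    """
    lon = (u / width) * 2.0 * math.pi - math.pi    # yaw in [-pi, pi)
    lat = math.pi / 2.0 - (v / height) * math.pi   # pitch in (pi/2, -pi/2]
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)

# The center pixel of the panorama looks straight ahead along +z.
center = equirect_to_direction(512, 256, 1024, 512)
```

This mapping is what makes naive 2D image-synthesis methods distort panoramas near the poles, which is the motivation for panorama-aware generation in several of the related papers below.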
Related papers
- Video Perception Models for 3D Scene Synthesis [109.5543506037003]
VIPScene is a novel framework that exploits the encoded commonsense knowledge of the 3D physical world in video generation models. VIPScene seamlessly integrates video generation, feedforward 3D reconstruction, and open-vocabulary perception models to semantically and geometrically analyze each object in a scene.
arXiv Detail & Related papers (2025-06-25T16:40:17Z)
- Eye-See-You: Reverse Pass-Through VR and Head Avatars [6.514203459446675]
RevAvatar is an innovative framework that leverages AI methodologies to enable reverse pass-through technology. RevAvatar integrates state-of-the-art generative models and multimodal AI techniques to reconstruct high-fidelity 2D facial images. We present VR-Face, a novel dataset comprising 200,000 samples designed to emulate diverse VR-specific conditions.
arXiv Detail & Related papers (2025-05-24T21:08:19Z)
- 4D Gaussian Splatting: Modeling Dynamic Scenes with Native 4D Primitives [115.67081491747943]
Dynamic 3D scene representation and novel view synthesis are crucial for enabling AR/VR and metaverse applications. We reformulate the reconstruction of a time-varying 3D scene as approximating its underlying 4D volume. We derive several compact variants that effectively reduce the memory footprint to address its storage bottleneck.
arXiv Detail & Related papers (2024-12-30T05:30:26Z)
- DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes [46.91656616577897]
DynamicScaler enables spatially scalable and panoramic dynamic scene synthesis. We employ a Global Motion Guidance mechanism to ensure both local detail fidelity and global motion continuity. Our method achieves superior content and motion quality in panoramic scene-level video generation.
arXiv Detail & Related papers (2024-12-15T07:42:26Z)
- T-SVG: Text-Driven Stereoscopic Video Generation [87.62286959918566]
This paper introduces the Text-driven Stereoscopic Video Generation (T-SVG) system. It streamlines video generation by using text prompts to create reference videos. These videos are transformed into 3D point cloud sequences, which are rendered from two perspectives with subtle parallax differences.
arXiv Detail & Related papers (2024-12-12T14:48:46Z)
- MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis [65.78359025027457]
MetaDesigner revolutionizes artistic typography by leveraging the strengths of Large Language Models (LLMs) to drive a design paradigm centered around user engagement.
A comprehensive feedback mechanism harnesses insights from multimodal models and user evaluations to refine and enhance the design process iteratively.
Empirical validations highlight MetaDesigner's capability to effectively serve diverse WordArt applications, consistently producing aesthetically appealing and context-sensitive results.
arXiv Detail & Related papers (2024-06-28T11:58:26Z)
- VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality [39.53150683721031]
Our proposed VR-GS system represents a leap forward in human-centered 3D content interaction.
The components of our Virtual Reality system are designed for high efficiency and effectiveness.
arXiv Detail & Related papers (2024-01-30T01:28:36Z)
- Dream360: Diverse and Immersive Outdoor Virtual Scene Creation via Transformer-Based 360 Image Outpainting [33.95741744421632]
We propose a transformer-based 360 image outpainting framework called Dream360.
It can generate diverse, high-fidelity, and high-resolution panoramas from user-selected viewports.
Our Dream360 achieves significantly lower Fréchet Inception Distance (FID) scores and better visual fidelity than existing methods.
arXiv Detail & Related papers (2024-01-19T09:01:20Z)
- GPAvatar: Generalizable and Precise Head Avatar from Image(s) [71.555405205039]
GPAvatar is a framework that reconstructs 3D head avatars from one or several images in a single forward pass.
The proposed method achieves faithful identity reconstruction, precise expression control, and multi-view consistency.
arXiv Detail & Related papers (2024-01-18T18:56:34Z)
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations [107.88375243135579]
Given speech audio, we output multiple possibilities of gestural motion for an individual, including face, body, and hands.
We visualize the generated motion using highly photorealistic avatars that can express crucial nuances in gestures.
Experiments show our model generates appropriate and diverse gestures, outperforming both diffusion- and VQ-only methods.
arXiv Detail & Related papers (2024-01-03T18:55:16Z)
- Let's Give a Voice to Conversational Agents in Virtual Reality [2.7470819871568506]
We present an open-source architecture with the goal of simplifying the development of conversational agents in virtual environments.
We present two conversational prototypes in the digital health domain, developed in Unity for both non-immersive displays and VR headsets.
arXiv Detail & Related papers (2023-08-04T18:51:38Z)
- DynIBaR: Neural Dynamic Image-Based Rendering [79.44655794967741]
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene.
We adopt a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views.
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
arXiv Detail & Related papers (2022-11-20T20:57:02Z)
- HORIZON: High-Resolution Semantically Controlled Panorama Synthesis [105.55531244750019]
Panorama synthesis endeavors to craft captivating 360-degree visual landscapes, immersing users in the heart of virtual worlds.
Recent breakthroughs in visual synthesis have unlocked the potential for semantic control in 2D flat images, but a direct application of these methods to panorama synthesis yields distorted content.
We unveil an innovative framework for generating high-resolution panoramas, adeptly addressing the issues of spherical distortion and edge discontinuity through sophisticated spherical modeling.
arXiv Detail & Related papers (2022-10-10T09:43:26Z)
- Keep It Real: a Window to Real Reality in Virtual Reality [13.173307471333619]
We propose a new interaction paradigm in virtual reality (VR) environments, consisting of a virtual mirror or window projected onto a virtual surface.
This technique can be applied to various videos, live streaming apps, augmented and virtual reality settings to provide an interactive and immersive user experience.
arXiv Detail & Related papers (2020-04-21T21:33:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.