ViSA: 3D-Aware Video Shading for Real-Time Upper-Body Avatar Creation
- URL: http://arxiv.org/abs/2512.07720v2
- Date: Tue, 09 Dec 2025 06:20:21 GMT
- Title: ViSA: 3D-Aware Video Shading for Real-Time Upper-Body Avatar Creation
- Authors: Fan Yang, Heyuan Li, Peihao Li, Weihao Yuan, Lingteng Qiu, Chaoyue Song, Cheng Chen, Yisheng He, Shifeng Zhang, Xiaoguang Han, Steven Hoi, Guosheng Lin,
- Abstract summary: Current 3D avatar generation methods often suffer from artifacts such as blurry textures and stiff, unnatural motion. We propose a novel approach that combines the strengths of both paradigms. By uniting the geometric stability of 3D reconstruction with the generative capabilities of video models, our method produces high-fidelity digital avatars.
- Score: 62.86900540547787
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating high-fidelity upper-body 3D avatars from a single input image remains a significant challenge. Current 3D avatar generation methods, which rely on large reconstruction models, are fast and capable of producing stable body structures, but they often suffer from artifacts such as blurry textures and stiff, unnatural motion. In contrast, generative video models show promising performance by synthesizing photorealistic and dynamic results, but they frequently struggle with unstable behavior, including body structural errors and identity drift. To address these limitations, we propose a novel approach that combines the strengths of both paradigms. Our framework employs a 3D reconstruction model to provide robust structural and appearance priors, which in turn guide a real-time autoregressive video diffusion model for rendering. This enables the model to synthesize high-frequency, photorealistic details and fluid dynamics in real time, effectively reducing texture blur and motion stiffness while preventing the structural inconsistencies common in video generation methods. By uniting the geometric stability of 3D reconstruction with the generative capabilities of video models, our method produces high-fidelity digital avatars with realistic appearance and dynamic, temporally coherent motion. Experiments demonstrate that our approach significantly reduces artifacts and achieves substantial improvements in visual quality over leading methods, providing a robust and efficient solution for real-time applications such as gaming and virtual reality. Project page: https://lhyfst.github.io/visa
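The abstract describes a two-stage loop: a 3D reconstruction model produces a structurally stable prior for each frame, and a causal (autoregressive) video diffusion stage refines it into a photorealistic frame conditioned on the previous output. The sketch below is a hypothetical, heavily simplified illustration of that control flow only; the function names, shapes, and arithmetic are placeholders, not the authors' implementation.

```python
import numpy as np

def reconstruct_prior(identity: np.ndarray, pose: np.ndarray) -> np.ndarray:
    """Stand-in for the 3D reconstruction model: produces a coarse but
    structurally stable frame from identity features and a pose signal."""
    return 0.5 * identity + 0.5 * pose  # toy blend, purely illustrative

def shade_frame(prior: np.ndarray, prev_frame: np.ndarray) -> np.ndarray:
    """Stand-in for one autoregressive video-diffusion step: adds a
    'generated' high-frequency residual while staying anchored to the
    3D prior, which is what keeps structure and identity stable."""
    residual = np.tanh(prior - prev_frame)  # toy detail term
    return 0.8 * prior + 0.2 * (prev_frame + residual)

def render_sequence(identity: np.ndarray, poses: list[np.ndarray]) -> list[np.ndarray]:
    """Frame-by-frame (causal) rendering: each frame is conditioned on
    the 3D prior for the current pose and on the previous refined frame."""
    frames, prev = [], np.zeros_like(identity)
    for pose in poses:
        prior = reconstruct_prior(identity, pose)
        prev = shade_frame(prior, prev)
        frames.append(prev)
    return frames
```

The causal loop is the relevant design point: because each frame depends only on past frames, the pipeline can run in real time, while the per-frame 3D prior bounds how far the generative stage can drift from the subject's structure.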
Related papers
- Pixel-to-4D: Camera-Controlled Image-to-Video Generation with Dynamic 3D Gaussians [7.051077403685518]
Humans excel at forecasting the future dynamics of a scene given just a single image. Video generation models that can mimic this ability are an essential component for intelligent systems. Recent approaches have improved temporal coherence and 3D consistency in single-image-conditioned video generation.
arXiv Detail & Related papers (2026-01-02T13:04:47Z) - MPMAvatar: Learning 3D Gaussian Avatars with Accurate and Robust Physics-Based Dynamics [27.63650397876897]
MPMAvatar is a framework for creating 3D human avatars from multi-view videos. For accurate and robust dynamics modeling, our key idea is to use a Material Point Method-based simulator.
arXiv Detail & Related papers (2025-10-02T02:51:45Z) - Restage4D: Reanimating Deformable 3D Reconstruction from a Single Video [56.781766315691854]
We introduce Restage4D, a geometry-preserving pipeline for video-conditioned 4D restaging. We validate Restage4D on DAVIS and PointOdyssey, demonstrating improved geometry consistency, motion quality, and 3D tracking performance.
arXiv Detail & Related papers (2025-08-08T21:31:51Z) - From Gallery to Wrist: Realistic 3D Bracelet Insertion in Videos [8.444819892052958]
2D diffusion models have shown promise for producing photorealistic edits. Traditional 3D rendering methods excel in spatial and temporal consistency but fall short in achieving photorealistic lighting. This is the first approach to synergize 3D rendering and 2D diffusion for video object insertion.
arXiv Detail & Related papers (2025-07-27T15:49:07Z) - TeGA: Texture Space Gaussian Avatars for High-Resolution Dynamic Head Modeling [52.87836237427514]
Photoreal avatars are seen as a key component in emerging applications in telepresence, extended reality, and entertainment. We present a new high-detail 3D head avatar model that improves upon the state of the art.
arXiv Detail & Related papers (2025-05-08T22:10:27Z) - ReVision: High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling for Complex Motion and Interaction [22.420752010237052]
We introduce ReVision, a plug-and-play framework that explicitly integrates parameterized 3D physical knowledge into a conditional video generation model. We validate the effectiveness of our approach on Stable Video Diffusion, where ReVision significantly improves motion fidelity and coherence. Our results suggest that, by incorporating 3D physical knowledge, even a relatively small video diffusion model can generate complex motions and interactions with greater realism and controllability.
arXiv Detail & Related papers (2025-04-30T17:59:56Z) - Deblur-Avatar: Animatable Avatars from Motion-Blurred Monocular Videos [64.10307207290039]
We introduce a novel framework for modeling high-fidelity, animatable 3D human avatars from motion-blurred monocular video inputs. By explicitly modeling human motion trajectories during exposure time, we jointly optimize the trajectories and 3D Gaussians to reconstruct sharp, high-quality human avatars.
arXiv Detail & Related papers (2025-01-23T02:31:57Z) - DeTurb: Atmospheric Turbulence Mitigation with Deformable 3D Convolutions and 3D Swin Transformers [2.9695823613761316]
Atmospheric turbulence in long-range imaging significantly degrades the quality and fidelity of captured scenes due to random variations in both spatial and temporal dimensions.
This paper proposes a new framework that combines geometric restoration with an enhancement module.
The proposed framework demonstrates superior performance over the state of the art for both synthetic and real atmospheric turbulence effects, with reasonable speed and model size.
arXiv Detail & Related papers (2024-07-30T14:31:33Z) - UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures [80.047065473698]
We propose a novel 3D avatar generation approach termed UltrAvatar with enhanced fidelity of geometry, and superior quality of physically based rendering (PBR) textures without unwanted lighting.
We demonstrate the effectiveness and robustness of the proposed method, outperforming the state-of-the-art methods by a large margin in the experiments.
arXiv Detail & Related papers (2024-01-20T01:55:17Z) - SMPLpix: Neural Avatars from 3D Human Models [56.85115800735619]
We bridge the gap between classic rendering and the latest generative networks operating in pixel space.
We train a network that directly converts a sparse set of 3D mesh vertices into photorealistic images.
We show the advantage over conventional differentiable renderers both in terms of the level of photorealism and rendering efficiency.
arXiv Detail & Related papers (2020-08-16T10:22:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.