ESCA: Enabling Seamless Codec Avatar Execution through Algorithm and Hardware Co-Optimization for Virtual Reality
- URL: http://arxiv.org/abs/2510.24787v1
- Date: Mon, 27 Oct 2025 02:31:20 GMT
- Title: ESCA: Enabling Seamless Codec Avatar Execution through Algorithm and Hardware Co-Optimization for Virtual Reality
- Authors: Mingzhi Zhu, Ding Shang, Sai Qian Zhang
- Abstract summary: Photorealistic Codec Avatars (PCA) generate high-fidelity human face renderings for Virtual Reality (VR) environments. We propose an efficient post-training quantization (PTQ) method tailored for Codec Avatar models, enabling low-precision execution without compromising output quality. We introduce ESCA, a full-stack optimization framework that accelerates PCA inference on edge VR platforms.
- Score: 8.437724028285682
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Photorealistic Codec Avatars (PCA), which generate high-fidelity human face renderings, are increasingly being used in Virtual Reality (VR) environments to enable immersive communication and interaction through deep learning-based generative models. However, these models impose significant computational demands, making real-time inference challenging on resource-constrained VR devices such as head-mounted displays, where latency and power efficiency are critical. To address this challenge, we propose an efficient post-training quantization (PTQ) method tailored for Codec Avatar models, enabling low-precision execution without compromising output quality. In addition, we design a custom hardware accelerator that can be integrated into the system-on-chip of VR devices to further enhance processing efficiency. Building on these components, we introduce ESCA, a full-stack optimization framework that accelerates PCA inference on edge VR platforms. Experimental results demonstrate that ESCA boosts FovVideoVDP quality scores by up to $+0.39$ over the best 4-bit baseline, delivers up to $3.36\times$ latency reduction, and sustains a rendering rate of 100 frames per second in end-to-end tests, satisfying real-time VR requirements. These results demonstrate the feasibility of deploying high-fidelity codec avatars on resource-constrained devices, opening the door to more immersive and portable VR experiences.
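The abstract does not spell out the PTQ details, so as a generic illustration only: a minimal symmetric per-tensor post-training quantization of a weight matrix (the 4-bit setting matches the baseline precision mentioned above, but the function names and calibration scheme here are illustrative assumptions, not the ESCA method):

```python
import numpy as np

def quantize_weights(w: np.ndarray, n_bits: int = 4):
    """Symmetric per-tensor PTQ sketch: map the largest weight
    magnitude onto the signed integer range for n_bits."""
    qmax = 2 ** (n_bits - 1) - 1           # e.g. 7 for signed 4-bit
    scale = float(np.max(np.abs(w))) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from integers and scale."""
    return q.astype(np.float32) * scale

# Quantize a random layer and measure mean reconstruction error
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, s = quantize_weights(w, n_bits=4)
err = float(np.abs(w - dequantize(q, s)).mean())
```

Rounding to the nearest level bounds the per-weight error by half the scale; real PTQ pipelines add calibration data and per-channel scales on top of this basic scheme.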
Related papers
- PocketSR: The Super-Resolution Expert in Your Pocket Mobiles [69.26751136689533]
Real-world image super-resolution (RealSR) aims to enhance the visual quality of in-the-wild images, such as those captured by mobile phones. Existing methods leveraging large generative models demonstrate impressive results, but the high computational cost and latency make them impractical for edge deployment. We introduce PocketSR, an ultra-lightweight, single-step model that brings generative modeling capabilities to RealSR while maintaining high fidelity.
arXiv Detail & Related papers (2025-10-03T13:56:18Z) - VRScout: Towards Real-Time, Autonomous Testing of Virtual Reality Games [3.4662962824313865]
We present VRScout, a deep learning-based agent capable of autonomously navigating VR environments and interacting with virtual objects in a human-like and real-time manner. We show that VRScout achieves expert-level performance with only limited training data, while maintaining real-time inference at 60 FPS on consumer-grade hardware. These results position VRScout as a practical and scalable framework for automated VR game testing, with direct applications in both quality assurance and safety auditing.
arXiv Detail & Related papers (2025-09-18T09:16:05Z) - GazeProphet: Software-Only Gaze Prediction for VR Foveated Rendering [0.0]
Foveated rendering significantly reduces computational demands in virtual reality applications. Current approaches require expensive hardware-based eye tracking systems. This paper presents GazeProphet, a software-only approach for predicting gaze locations in VR environments.
arXiv Detail & Related papers (2025-08-19T06:09:23Z) - SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training [82.68200031146299]
We propose a one-step diffusion-based VR model, termed SeedVR2, which performs adversarial VR training against real data. To handle the challenging high-resolution VR within a single step, we introduce several enhancements to both model architecture and training procedures.
arXiv Detail & Related papers (2025-06-05T17:51:05Z) - VRSplat: Fast and Robust Gaussian Splatting for Virtual Reality [47.738522999465864]
We introduce VRSplat: we combine and extend several recent advancements in 3DGS to address challenges of VR holistically. VRSplat is the first, systematically evaluated 3DGS approach capable of supporting modern VR applications, achieving 72+ FPS while eliminating popping and stereo-disrupting floaters.
arXiv Detail & Related papers (2025-05-15T10:17:48Z) - VR-Splatting: Foveated Radiance Field Rendering via 3D Gaussian Splatting and Neural Points [4.962171160815189]
We propose a novel hybrid approach that combines the strengths of both point rendering directions regarding performance sweet spots. For the fovea only, we use neural points with a convolutional neural network for the small pixel footprint, which provides sharp, detailed output. Our evaluation confirms that our approach increases sharpness and details compared to a standard VR-ready 3DGS configuration.
arXiv Detail & Related papers (2024-10-23T14:54:48Z) - Coral Model Generation from Single Images for Virtual Reality Applications [22.18438294137604]
This paper introduces a deep-learning framework that generates high-precision 3D coral models from a single image.
The project incorporates Explainable AI (XAI) to transform AI-generated models into interactive "artworks"
arXiv Detail & Related papers (2024-09-04T01:54:20Z) - VR-GS: A Physical Dynamics-Aware Interactive Gaussian Splatting System in Virtual Reality [39.53150683721031]
Our proposed VR-GS system represents a leap forward in human-centered 3D content interaction.
The components of our Virtual Reality system are designed for high efficiency and effectiveness.
arXiv Detail & Related papers (2024-01-30T01:28:36Z) - VR-NeRF: High-Fidelity Virtualized Walkable Spaces [55.51127858816994]
We present an end-to-end system for the high-fidelity capture, model reconstruction, and real-time rendering of walkable spaces in virtual reality using neural radiance fields.
arXiv Detail & Related papers (2023-11-05T02:03:14Z) - Wireless Edge-Empowered Metaverse: A Learning-Based Incentive Mechanism for Virtual Reality [102.4151387131726]
We propose a learning-based Incentive Mechanism framework for VR services in the Metaverse.
First, we propose the quality of perception as the metric for VR users in the virtual world.
Second, for quick trading of VR services between VR users (i.e., buyers) and VR SPs (i.e., sellers), we design a double Dutch auction mechanism.
Third, for auction communication reduction, we design a deep reinforcement learning-based auctioneer to accelerate this auction process.
arXiv Detail & Related papers (2021-11-07T13:02:52Z) - Unmasking Communication Partners: A Low-Cost AI Solution for Digitally Removing Head-Mounted Displays in VR-Based Telepresence [62.997667081978825]
Face-to-face conversation in Virtual Reality (VR) is a challenge when participants wear head-mounted displays (HMDs).
Past research has shown that high-fidelity face reconstruction with personal avatars in VR is possible under laboratory conditions with high-cost hardware.
We propose one of the first low-cost systems for this task which uses only open source, free software and affordable hardware.
arXiv Detail & Related papers (2020-11-06T23:17:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.