High-Speed FHD Full-Color Video Computer-Generated Holography
- URL: http://arxiv.org/abs/2508.19579v1
- Date: Wed, 27 Aug 2025 05:24:37 GMT
- Title: High-Speed FHD Full-Color Video Computer-Generated Holography
- Authors: Haomiao Zhang, Miao Cao, Xuan Yu, Hui Luo, Yanling Piao, Mengjie Qin, Zhangyuan Li, Ping Wang, Xin Yuan,
- Abstract summary: Holography is a promising technology for next-generation displays. Generating high-speed, high-quality holographic video requires both a high frame rate display and efficient computation.
- Score: 13.302001362328134
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computer-generated holography (CGH) is a promising technology for next-generation displays. However, generating high-speed, high-quality holographic video requires both a high frame rate display and efficient computation, and is constrained by two key limitations: ($i$) learning-based models often produce over-smoothed phases with narrow angular spectra, causing severe color crosstalk in high frame rate full-color display schemes such as depth-division multiplexing and thus forcing a trade-off between frame rate and color fidelity; ($ii$) existing frame-by-frame optimization methods typically optimize frames independently, neglecting spatial-temporal correlations between consecutive frames and leading to computationally inefficient solutions. To overcome these challenges, in this paper, we propose a novel high-speed full-color video CGH generation scheme. First, we introduce Spectrum-Guided Depth Division Multiplexing (SGDDM), which optimizes phase distributions via frequency modulation, enabling high-fidelity full-color display at high frame rates. Second, we present HoloMamba, a lightweight asymmetric Mamba-Unet architecture that explicitly models spatial-temporal correlations across video sequences to enhance reconstruction quality and computational efficiency. Extensive simulated and real-world experiments demonstrate that SGDDM achieves high-fidelity full-color display without compromise in frame rate, while HoloMamba generates FHD (1080p) full-color holographic video at over 260 FPS, more than 2.6$\times$ faster than the prior state-of-the-art Divide-Conquer-and-Merge Strategy.
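The paper's implementation is not reproduced here; as a rough, hedged illustration of the frame-by-frame phase-optimization baseline that the abstract criticizes (and that SGDDM and HoloMamba improve upon), the sketch below optimizes a phase-only hologram for one target frame using angular spectrum propagation. All function names, wavelength, pixel pitch, and hyperparameters are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of frame-by-frame phase-only CGH optimization -- the baseline
# the paper improves on. Illustrative assumptions throughout; not the authors' code.
import torch

def asm_propagate(field, wavelength, pitch, distance):
    """Angular spectrum method: propagate a complex field over `distance` meters."""
    H, W = field.shape[-2:]
    fy = torch.fft.fftfreq(H, d=pitch)
    fx = torch.fft.fftfreq(W, d=pitch)
    FY, FX = torch.meshgrid(fy, fx, indexing="ij")
    arg = 1.0 / wavelength**2 - FX**2 - FY**2       # squared axial spatial frequency
    kz = 2 * torch.pi * torch.sqrt(torch.clamp(arg, min=0.0))
    transfer = torch.exp(1j * kz * distance) * (arg > 0)  # drop evanescent waves
    return torch.fft.ifft2(torch.fft.fft2(field) * transfer)

def optimize_phase(target_amp, wavelength=532e-9, pitch=8e-6,
                   distance=0.1, steps=200, lr=0.1):
    """Gradient-descend a phase-only SLM pattern until the propagated
    amplitude matches the target frame (per-frame, no temporal reuse)."""
    phase = torch.zeros_like(target_amp, requires_grad=True)
    opt = torch.optim.Adam([phase], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        recon = asm_propagate(torch.exp(1j * phase), wavelength, pitch, distance)
        loss = torch.nn.functional.mse_loss(recon.abs(), target_amp)
        loss.backward()
        opt.step()
    return phase.detach()

# One FHD green-channel frame (placeholder data). Real use repeats this loop per
# frame and per color channel -- exactly the per-frame cost HoloMamba amortizes
# with a single feed-forward pass over the whole sequence.
slm_phase = optimize_phase(torch.rand(1080, 1920), steps=50)
```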
Related papers
- Towards Holistic Modeling for Video Frame Interpolation with Auto-regressive Diffusion Transformers [95.68243351895107]
We propose a holistic, video-centric paradigm named Local Diffusion Forcing for Video Frame Interpolation (LDF-VFI). Our framework is built upon an auto-regressive diffusion transformer that models the entire video sequence to ensure long-range temporal coherence. LDF-VFI achieves state-of-the-art performance on challenging long-sequence benchmarks.
arXiv Detail & Related papers (2026-01-21T12:58:52Z)
- StableDPT: Temporal Stable Monocular Video Depth Estimation [14.453483279783908]
We propose a novel approach that adapts any state-of-the-art image-based depth estimation model for video processing. Our architecture builds upon an off-the-shelf Vision Transformer (ViT) encoder and enhances the Dense Prediction Transformer (DPT) head. Evaluations on multiple benchmark datasets demonstrate improved temporal consistency, competitive state-of-the-art performance, and, on top of that, 2x faster processing in real-world scenarios.
arXiv Detail & Related papers (2026-01-06T08:02:14Z)
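StableDPT's actual contribution is inside the DPT head; as a far simpler, hypothetical stand-in for the temporal-consistency goal its summary describes, the sketch below post-processes any per-frame depth model with an exponential moving average. The model interface and smoothing factor are assumptions.

```python
# Hypothetical sketch: stabilize any per-frame depth estimator over a video with
# an exponential moving average. Much simpler than StableDPT's architecture;
# shown only to make the temporal-consistency goal concrete.
import torch

@torch.no_grad()
def stabilize_depth(frames, depth_model, alpha=0.8):
    """frames: (T, 3, H, W); depth_model: any callable mapping a (1, 3, H, W)
    frame to a depth map. alpha is an assumed smoothing factor."""
    smoothed, ema = [], None
    for frame in frames:
        d = depth_model(frame.unsqueeze(0)).squeeze()
        ema = d if ema is None else alpha * ema + (1 - alpha) * d
        smoothed.append(ema.clone())
    return torch.stack(smoothed)                    # (T, H, W), temporally smoothed
```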
- STORM: Token-Efficient Long Video Understanding for Multimodal LLMs [116.4479155699528]
STORM is a novel architecture that incorporates a dedicated temporal encoder between the image encoder and the Video-LLM. We show that STORM achieves state-of-the-art results across various long video understanding benchmarks.
arXiv Detail & Related papers (2025-03-06T06:17:38Z)
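A hedged sketch of STORM's token-efficiency idea: mix information across time, then pool frame tokens so the language model sees far fewer of them. The module, head count, and pooling factor below are illustrative assumptions, not STORM's published design.

```python
import torch
import torch.nn as nn

class TemporalTokenCompressor(nn.Module):
    """Hypothetical stand-in for a temporal encoder between a frozen image
    encoder and a Video-LLM: mixes information across time, then average-pools
    every `stride` frames so the LLM sees T/stride frames' worth of tokens."""
    def __init__(self, dim=1024, stride=4):
        super().__init__()
        self.temporal_mix = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.stride = stride

    def forward(self, tokens):                      # tokens: (T, N, D)
        T, N, D = tokens.shape
        x = tokens.permute(1, 0, 2)                 # (N, T, D): attend over time
        x, _ = self.temporal_mix(x, x, x)
        x = x.permute(1, 0, 2)                      # back to (T, N, D)
        x = x.reshape(T // self.stride, self.stride, N, D).mean(dim=1)
        return x                                    # (T/stride, N, D)

# 32 frames of 256 tokens each -> only 8 frames' worth of tokens reach the LLM.
out = TemporalTokenCompressor()(torch.randn(32, 256, 1024))
```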
- VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models [58.464465016269614]
We propose a novel framework for solving high-definition video inverse problems using latent image diffusion models. Our approach delivers HD-resolution reconstructions in under 6 seconds per frame on a single NVIDIA 4090 GPU.
arXiv Detail & Related papers (2024-11-29T08:10:49Z)
- ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler [53.98558445900626]
Current image-to-video diffusion models, while powerful in generating videos from a single frame, need adaptation for two-frame conditioned generation. We introduce a novel bidirectional sampling strategy to address the off-manifold issues that arise in this setting, without requiring extensive re-noising or fine-tuning. Our method employs sequential sampling along both forward and backward paths, conditioned on the start and end frames, respectively, ensuring more coherent and on-manifold generation of intermediate frames.
arXiv Detail & Related papers (2024-10-08T03:01:54Z)
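Very loosely, ViBiDSampler's bidirectional strategy can be pictured as two conditioned denoising passes per step, one anchored on the start frame and one on the end frame, whose estimates are fused before re-noising. The denoiser interface, fusion rule, and noise schedule below are all assumptions for illustration, not the actual sampler.

```python
# Loose sketch of bidirectional sampling. `denoiser(x_t, t, cond)` is a
# placeholder interface returning predicted clean frames; the 50/50 fusion and
# the linear re-noising schedule are assumptions.
import torch

def bidirectional_sample(denoiser, start, end, num_frames=16, steps=50):
    x = torch.randn(num_frames, *start.shape)           # noisy intermediate frames
    for step in reversed(range(1, steps + 1)):
        t = torch.full((num_frames,), step)
        fwd = denoiser(x, t, cond=start)                # forward path, start-frame conditioned
        bwd = denoiser(x.flip(0), t, cond=end).flip(0)  # backward path, end-frame conditioned
        x0 = 0.5 * (fwd + bwd)                          # fuse the two clean-frame estimates
        x = x0 + (step - 1) / steps * torch.randn_like(x)  # crude re-noise to step t-1
    return x
```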
- Ada-VE: Training-Free Consistent Video Editing Using Adaptive Motion Prior [13.595032265551184]
Video-to-video synthesis poses significant challenges in maintaining character consistency, smooth temporal transitions, and preserving visual quality during fast motion.
We propose an adaptive motion-guided cross-frame attention mechanism that selectively reduces redundant computations.
This enables a greater number of cross-frame attentions over more frames within the same computational budget.
arXiv Detail & Related papers (2024-06-07T12:12:25Z)
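One way to picture Ada-VE's "adaptive motion-guided" selection: rank frames by how much they move relative to their predecessor and spend a fixed cross-frame attention budget on the fastest ones. The selection rule below is an assumed simplification, not the paper's exact mechanism.

```python
import torch

def select_key_frames(frames, budget=4):
    """frames: (T, C, H, W). Keep the frames with the largest motion relative
    to their predecessor, so a fixed attention budget covers the fast parts."""
    diffs = (frames[1:] - frames[:-1]).abs().mean(dim=(1, 2, 3))  # (T-1,) motion proxy
    motion = torch.cat([diffs.new_zeros(1), diffs])               # frame 0 has no predecessor
    keep = motion.topk(budget).indices.sort().values
    return keep  # indices of frames to serve as cross-frame attention keys/values
```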
- RAVEN: Rethinking Adversarial Video Generation with Efficient Tri-plane Networks [93.18404922542702]
We present a novel video generative model designed to address long-term spatial and temporal dependencies.
Our approach incorporates a hybrid explicit-implicit tri-plane representation inspired by 3D-aware generative frameworks.
Our model synthesizes high-fidelity video clips at a resolution of $256\times256$ pixels, with durations extending to more than $5$ seconds at a frame rate of 30 fps.
arXiv Detail & Related papers (2024-01-11T16:48:44Z)
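The tri-plane representation referenced above factorizes a 3D volume into three learnable 2D feature grids; a point's feature is the sum of its bilinear samples from the xy, xz, and yz planes. A minimal version (grid resolution and feature width are arbitrary choices, and RAVEN's hybrid explicit-implicit details are omitted):

```python
import torch
import torch.nn.functional as F

class TriPlane(torch.nn.Module):
    """Three learnable 2D feature grids (xy, xz, yz); a 3D point's feature is
    the sum of its bilinear samples from each plane."""
    def __init__(self, dim=32, res=128):
        super().__init__()
        self.planes = torch.nn.Parameter(torch.randn(3, dim, res, res) * 0.01)

    def forward(self, pts):                       # pts: (N, 3) in [-1, 1]
        x, y, z = pts.unbind(-1)
        coords = torch.stack([                    # one (u, v) pair per plane
            torch.stack([x, y], -1),              # xy plane
            torch.stack([x, z], -1),              # xz plane
            torch.stack([y, z], -1),              # yz plane
        ])                                        # (3, N, 2)
        feats = F.grid_sample(self.planes, coords.unsqueeze(2),
                              align_corners=True)  # (3, dim, N, 1)
        return feats.squeeze(-1).sum(dim=0).T      # (N, dim)

features = TriPlane()(torch.rand(1000, 3) * 2 - 1)
```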
- Context-Aware Video Reconstruction for Rolling Shutter Cameras [52.28710992548282]
In this paper, we propose a context-aware global-shutter (GS) video reconstruction architecture.
We first estimate a bilateral motion field so that the pixels of two consecutive rolling-shutter (RS) frames can be warped to a common GS frame.
Then, a refinement scheme is proposed to guide the GS frame synthesis along with bilateral occlusion masks to produce high-fidelity GS video frames.
arXiv Detail & Related papers (2022-05-25T17:05:47Z)
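The geometric core of the RS-to-GS step can be sketched as follows: each row of a rolling-shutter frame is read out at a different time, so warping a pixel to a common global-shutter timestamp uses the inter-frame motion scaled by that row's time offset. The toy warp below assumes a precomputed dense flow and a simplified sign convention, and omits the paper's bilateral fields and occlusion masks.

```python
import torch
import torch.nn.functional as F

def rs_to_gs_warp(rs_frame, flow, gs_time=0.5):
    """rs_frame: (1, C, H, W); flow: (1, 2, H, W) dense (x, y) flow over one
    frame interval. Row r was read out at time r/H, so moving it to the common
    global-shutter time gs_time uses (gs_time - r/H) of the flow."""
    _, _, H, W = rs_frame.shape
    row_time = torch.linspace(0, 1, H).view(1, 1, H, 1)
    scaled = flow * (gs_time - row_time)          # per-row motion toward gs_time
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    grid = torch.stack([(xs + scaled[0, 0]) / (W - 1) * 2 - 1,   # normalized x
                        (ys + scaled[0, 1]) / (H - 1) * 2 - 1],  # normalized y
                       dim=-1)
    return F.grid_sample(rs_frame, grid.unsqueeze(0), align_corners=True)
```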
- Time-multiplexed Neural Holography: A flexible framework for holographic near-eye displays with fast heavily-quantized spatial light modulators [44.73608798155336]
Holographic near-eye displays offer unprecedented capabilities for virtual and augmented reality systems.
We report advances in camera-calibrated wave propagation models for these types of holographic near-eye displays.
Our framework is flexible in supporting runtime supervision with different types of content, including 2D and 2.5D RGBD images, 3D focal stacks, and 4D light fields.
arXiv Detail & Related papers (2022-05-05T00:03:50Z)
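Time multiplexing with a heavily quantized SLM can be pictured as showing several quantized phase patterns within one perceived frame and letting the eye average the reconstructed intensities. The toy below uses a Fraunhofer (single-FFT) propagation model and an assumed 4-level quantizer; neither is the paper's camera-calibrated wave propagation model.

```python
import torch

def quantize_phase(phase, levels=4):
    """Snap continuous phase to a small number of SLM levels (assumed quantizer)."""
    step = 2 * torch.pi / levels
    return torch.round(phase / step) * step

def time_multiplexed_intensity(phases, levels=4):
    """phases: (k, H, W) patterns shown within one perceived frame. The eye
    integrates the k far-field intensities (toy Fraunhofer model via FFT)."""
    frames = []
    for p in phases:
        field = torch.fft.fft2(torch.exp(1j * quantize_phase(p, levels)))
        frames.append(field.abs() ** 2)
    return torch.stack(frames).mean(dim=0)   # temporal average the eye perceives

# Averaging k=8 quantized patterns smooths the quantization noise of any one.
recon = time_multiplexed_intensity(torch.rand(8, 256, 256) * 2 * torch.pi)
```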
- VCGAN: Video Colorization with Hybrid Generative Adversarial Network [22.45196398040388]
Video Colorization with Hybrid Generative Adversarial Network (VCGAN) is an improved end-to-end learning approach to video colorization.
Experimental results demonstrate that VCGAN produces higher-quality and temporally more consistent colorized videos than existing approaches.
arXiv Detail & Related papers (2021-04-26T05:50:53Z)