Wavelet-Domain Masked Image Modeling for Color-Consistent HDR Video Reconstruction
- URL: http://arxiv.org/abs/2602.07393v1
- Date: Sat, 07 Feb 2026 06:19:23 GMT
- Title: Wavelet-Domain Masked Image Modeling for Color-Consistent HDR Video Reconstruction
- Authors: Yang Zhang, Zhangkai Ni, Wenhan Yang, Hanli Wang
- Abstract summary: High Dynamic Range (HDR) video reconstruction aims to recover fine brightness, color, and details from LDR videos. Existing methods often suffer from color inaccuracies and temporal inconsistencies. We propose WMNet, a novel HDR video reconstruction network that leverages Wavelet domain Masked Image Modeling.
- Score: 69.35623794013152
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High Dynamic Range (HDR) video reconstruction aims to recover fine brightness, color, and details from Low Dynamic Range (LDR) videos. However, existing methods often suffer from color inaccuracies and temporal inconsistencies. To address these challenges, we propose WMNet, a novel HDR video reconstruction network that leverages Wavelet domain Masked Image Modeling (W-MIM). WMNet adopts a two-phase training strategy: In Phase I, W-MIM performs self-reconstruction pre-training by selectively masking color and detail information in the wavelet domain, enabling the network to develop robust color restoration capabilities. A curriculum learning scheme further refines the reconstruction process. Phase II fine-tunes the model using the pre-trained weights to improve the final reconstruction quality. To improve temporal consistency, we introduce the Temporal Mixture of Experts (T-MoE) module and the Dynamic Memory Module (DMM). T-MoE adaptively fuses adjacent frames to reduce flickering artifacts, while DMM captures long-range dependencies, ensuring smooth motion and preservation of fine details. Additionally, since existing HDR video datasets lack scene-based segmentation, we reorganize HDRTV4K into HDRTV4K-Scene, establishing a new benchmark for HDR video reconstruction. Extensive experiments demonstrate that WMNet achieves state-of-the-art performance across multiple evaluation metrics, significantly improving color fidelity, temporal coherence, and perceptual quality. The code is available at: https://github.com/eezkni/WMNet
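The abstract describes W-MIM as masking color and detail information in the wavelet domain before self-reconstruction. A minimal sketch of that general idea follows. It is not the authors' implementation (see the linked repository for that). It assumes a single-level Haar transform on one channel, treats the low-frequency band (LL) as a stand-in for color/brightness content and the three high-frequency bands as "detail", and uses random block masking. The names `haar_dwt2`, `haar_idwt2`, `mask_subbands` and the parameters `low_ratio`, `high_ratio`, `block` are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform of an (H, W) array (H, W even)."""
    a = (x[0::2, :] + x[1::2, :]) / SQRT2   # sums of vertically adjacent rows
    d = (x[0::2, :] - x[1::2, :]) / SQRT2   # differences of vertically adjacent rows
    ll = (a[:, 0::2] + a[:, 1::2]) / SQRT2  # low-frequency band
    lh = (a[:, 0::2] - a[:, 1::2]) / SQRT2  # high-frequency bands
    hl = (d[:, 0::2] + d[:, 1::2]) / SQRT2
    hh = (d[:, 0::2] - d[:, 1::2]) / SQRT2
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape
    a = np.empty((h, 2 * w))
    d = np.empty((h, 2 * w))
    a[:, 0::2], a[:, 1::2] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    d[:, 0::2], d[:, 1::2] = (hl + hh) / SQRT2, (hl - hh) / SQRT2
    x = np.empty((2 * h, 2 * w))
    x[0::2, :], x[1::2, :] = (a + d) / SQRT2, (a - d) / SQRT2
    return x

def mask_subbands(frame, low_ratio=0.5, high_ratio=0.5, block=4, seed=0):
    """Zero out random blocks of wavelet coefficients (assumed masking scheme).

    Returns the masked frame (network input) and the keep-masks for the
    low- and high-frequency bands. Subband sizes must be divisible by `block`.
    """
    rng = np.random.default_rng(seed)
    ll, lh, hl, hh = haar_dwt2(frame)

    def keep_mask(shape, ratio):
        grid = rng.random((shape[0] // block, shape[1] // block)) >= ratio
        return np.repeat(np.repeat(grid, block, axis=0), block, axis=1)

    keep_low = keep_mask(ll.shape, low_ratio)
    keep_high = keep_mask(lh.shape, high_ratio)
    masked = haar_idwt2(ll * keep_low, lh * keep_high, hl * keep_high, hh * keep_high)
    return masked, keep_low, keep_high

# Usage: mask a random 16x16 "frame"; the clean frame is the reconstruction target.
frame = np.random.default_rng(1).random((16, 16))
masked_input, keep_low, keep_high = mask_subbands(frame, low_ratio=0.5, high_ratio=0.75)
```

In a pre-training loop of this kind, `masked_input` would be fed to the network and the unmasked `frame` would serve as the target. A curriculum could, for instance, raise the masking ratios over time, though the abstract does not specify how the paper's curriculum is scheduled.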
Related papers
- Reconstructing 3D Scenes in Native High Dynamic Range [82.90064638813185]
We present the first method for 3D scene reconstruction that directly models native HDR observations. We propose Native High dynamic range 3D Gaussian Splatting (NH-3DGS), which preserves the full dynamic range throughout the reconstruction pipeline. We demonstrate on both synthetic and real multi-view HDR datasets that NH-3DGS significantly outperforms existing methods in reconstruction quality and dynamic range preservation.
arXiv Detail & Related papers (2025-11-17T02:33:31Z) - Modulo Video Recovery via Selective Spatiotemporal Vision Transformer [33.84336417728034]
We present the first deep learning framework for modulo video reconstruction. SSViT employs a token selection strategy to improve efficiency and concentrate on the most critical regions. Experiments confirm that SSViT produces high-quality reconstructions from 8-bit folded videos.
arXiv Detail & Related papers (2025-11-09T12:54:32Z) - Generating Content for HDR Deghosting from Frequency View [56.103761824603644]
Recent Diffusion Models (DMs) have been introduced in the HDR imaging field.
DMs require extensive iterations with large models to estimate entire images.
We propose the Low-Frequency aware Diffusion (LF-Diff) model for ghost-free HDR imaging.
arXiv Detail & Related papers (2024-04-01T01:32:11Z) - Self-Supervised High Dynamic Range Imaging with Multi-Exposure Images in Dynamic Scenes [58.66427721308464]
SelfHDR is a self-supervised reconstruction method that only requires dynamic multi-exposure images during training.
SelfHDR achieves superior results against state-of-the-art self-supervised methods, and comparable performance to supervised ones.
arXiv Detail & Related papers (2023-10-03T07:10:49Z) - LAN-HDR: Luminance-based Alignment Network for High Dynamic Range Video Reconstruction [20.911738532410766]
We propose an end-to-end HDR video composition framework, which aligns LDR frames in feature space and then merges aligned features into an HDR frame.
In training, we adopt a temporal loss, in addition to frame reconstruction losses, to enhance temporal consistency and thus reduce flickering.
arXiv Detail & Related papers (2023-08-22T01:43:00Z) - SMAE: Few-shot Learning for HDR Deghosting with Saturation-Aware Masked Autoencoders [97.64072440883392]
We propose a novel semi-supervised approach to realize few-shot HDR imaging via two stages of training, called SSHDR.
Previous methods directly recover content and remove ghosts simultaneously, which makes an optimum hard to achieve.
Experiments demonstrate that SSHDR outperforms state-of-the-art methods quantitatively and qualitatively within and across different datasets.
arXiv Detail & Related papers (2023-04-14T03:42:51Z) - Deep Progressive Feature Aggregation Network for High Dynamic Range Imaging [24.94466716276423]
We propose a deep progressive feature aggregation network for improving HDR imaging quality in dynamic scenes.
Our method implicitly samples high-correspondence features and aggregates them in a coarse-to-fine manner for alignment.
Experiments show that our proposed method can achieve state-of-the-art performance under different scenes.
arXiv Detail & Related papers (2022-08-04T04:37:35Z) - Learning Dynamic View Synthesis With Few RGBD Cameras [60.36357774688289]
We propose to utilize RGBD cameras to synthesize free-viewpoint videos of dynamic indoor scenes.
We generate point clouds from RGBD frames and then render them into free-viewpoint videos via a neural feature.
We introduce a simple Regional Depth-Inpainting module that adaptively inpaints missing depth values to render complete novel views.
arXiv Detail & Related papers (2022-04-22T03:17:35Z) - HDR Reconstruction from Bracketed Exposures and Events [12.565039752529797]
Reconstruction of high-quality HDR images is at the core of modern computational photography.
We present a multi-modal end-to-end learning-based HDR imaging system that fuses bracketed images and events in the feature domain.
Our framework exploits the higher temporal resolution of events by sub-sampling the input event streams using a sliding window.
arXiv Detail & Related papers (2022-03-28T15:04:41Z) - HDRUNet: Single Image HDR Reconstruction with Denoising and Dequantization [39.82945546614887]
We propose a novel learning-based approach using a spatially dynamic encoder-decoder network, HDRUNet, to learn an end-to-end mapping for single image HDR reconstruction.
Our method achieves state-of-the-art performance in quantitative comparisons and visual quality.
arXiv Detail & Related papers (2021-05-27T12:12:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.