Modulo Video Recovery via Selective Spatiotemporal Vision Transformer
- URL: http://arxiv.org/abs/2511.07479v1
- Date: Wed, 12 Nov 2025 01:01:16 GMT
- Title: Modulo Video Recovery via Selective Spatiotemporal Vision Transformer
- Authors: Tianyu Geng, Feng Ji, Wee Peng Tay
- Abstract summary: We present the first deep learning framework for modulo video reconstruction. SSViT employs a token selection strategy to improve efficiency and concentrate on the most critical regions. Experiments confirm that SSViT produces high-quality reconstructions from 8-bit folded videos.
- Score: 33.84336417728034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional image sensors have limited dynamic range, causing saturation in high-dynamic-range (HDR) scenes. Modulo cameras address this by folding incident irradiance into a bounded range, yet require specialized unwrapping algorithms to reconstruct the underlying signal. Unlike HDR recovery, which extends dynamic range from conventional sampling, modulo recovery restores actual values from folded samples. Despite being introduced over a decade ago, progress in modulo image recovery has been slow, especially in the use of modern deep learning techniques. In this work, we demonstrate that standard HDR methods are unsuitable for modulo recovery. Transformers, however, can capture global dependencies and spatial-temporal relationships crucial for resolving folded video frames. Still, adapting existing Transformer architectures for modulo recovery demands novel techniques. To this end, we present Selective Spatiotemporal Vision Transformer (SSViT), the first deep learning framework for modulo video reconstruction. SSViT employs a token selection strategy to improve efficiency and concentrate on the most critical regions. Experiments confirm that SSViT produces high-quality reconstructions from 8-bit folded videos and achieves state-of-the-art performance in modulo video recovery.
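The abstract's distinction between HDR recovery and modulo recovery can be made concrete with a small numerical sketch. The following is an illustrative example only, not the paper's SSViT method: it assumes an idealized, noiseless modulo sensor and recovers a 1-D signal by integrating wrapped differences (analogous to phase unwrapping), which works only when successive samples differ by less than half the modulo range.

```python
import numpy as np

LEVELS = 256  # 8-bit modulo range, as in the paper's folded videos

def fold(signal):
    """Fold irradiance values into [0, LEVELS), as an idealized modulo sensor would."""
    return np.mod(signal, LEVELS)

def unwrap_1d(folded):
    """Naively recover the signal from folded samples.

    Assumes successive true samples differ by less than LEVELS / 2,
    so each wrapped difference identifies the true difference uniquely.
    """
    diffs = np.diff(folded)
    # Map each difference into the interval (-LEVELS/2, LEVELS/2]
    wrapped = (diffs + LEVELS / 2) % LEVELS - LEVELS / 2
    return folded[0] + np.concatenate(([0.0], np.cumsum(wrapped)))

# A ramp signal whose range exceeds the 8-bit sensor limit
hdr = np.array([10.0, 120.0, 230.0, 340.0, 450.0, 560.0])
folded = fold(hdr)               # all values lie in [0, 256)
recovered = unwrap_1d(folded)    # matches hdr exactly here
```

Real folded video violates this smoothness assumption at edges and under noise, which is why learned spatiotemporal priors such as SSViT's are needed rather than simple integration.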
Related papers
- Wavelet-Domain Masked Image Modeling for Color-Consistent HDR Video Reconstruction [69.35623794013152]
High Dynamic Range (HDR) video reconstruction aims to recover fine brightness, color, and details from Low Dynamic Range (LDR) videos. Existing methods often suffer from color inaccuracies and temporal inconsistencies. We propose WMNet, a novel HDR video reconstruction network that leverages wavelet-domain Masked Image Modeling.
arXiv Detail & Related papers (2026-02-07T06:19:23Z) - Deep Lightweight Unrolled Network for High Dynamic Range Modulo Imaging [19.49437461280304]
Modulo Imaging (MI) offers a promising alternative for extending the dynamic range of images by resetting the signal intensity whenever it reaches the saturation level. We introduce a scaling-equivariance term that facilitates self-tuning, thereby enabling the model to adapt to new images outside the original distribution.
arXiv Detail & Related papers (2026-01-18T18:22:38Z) - All-in-One Video Restoration under Smoothly Evolving Unknown Weather Degradations [102.94052335735326]
All-in-one image restoration aims to recover clean images from diverse unknown degradations using a single model. Existing approaches primarily focus on frame-wise degradation variation, overlooking the temporal continuity that naturally exists in real-world degradation processes. We introduce the Smoothly Evolving Unknown Degradations (SEUD) scenario, where both the active degradation set and degradation intensity change continuously over time.
arXiv Detail & Related papers (2026-01-02T02:20:57Z) - Progressive Image Restoration via Text-Conditioned Video Generation [6.1671530509662205]
Text-to-video models have demonstrated strong temporal generation capabilities, yet their potential for image restoration remains underexplored. In this work, we repurpose CogVideo for progressive visual restoration tasks by fine-tuning it to generate restoration trajectories rather than natural video motion. We construct synthetic datasets for super-resolution, deblurring, and low-light enhancement, where each sample depicts a gradual transition from degraded to clean frames. Our model learns to associate temporal progression with restoration quality, producing sequences that improve perceptual metrics such as PSNR, SSIM, and LPIPS across frames.
arXiv Detail & Related papers (2025-12-01T23:37:51Z) - From Attention to Frequency: Integration of Vision Transformer and FFT-ReLU for Enhanced Image Deblurring [0.9728664856449597]
We propose a new dual-domain architecture that unifies Vision Transformers with a frequency-domain FFT-ReLU module. In this structure, the ViT backbone captures local and global dependencies, while the FFT-ReLU component enforces frequency-domain sparsity to suppress blur-related artifacts. Experiments on benchmark datasets demonstrate that this architecture achieves superior PSNR, SSIM, and perceptual quality compared to state-of-the-art models.
arXiv Detail & Related papers (2025-11-13T21:19:57Z) - LVTINO: LAtent Video consisTency INverse sOlver for High Definition Video Restoration [3.2944592608677614]
We propose LVTINO, the first zero-shot, plug-and-play inverse solver for high-definition video restoration with priors encoded by VCMs. Our conditioning mechanism bypasses the need for automatic differentiation and achieves state-of-the-art video reconstruction quality with only a few neural function evaluations.
arXiv Detail & Related papers (2025-10-01T18:10:08Z) - FCA2: Frame Compression-Aware Autoencoder for Modular and Fast Compressed Video Super-Resolution [68.77813885751308]
State-of-the-art (SOTA) compressed video super-resolution (CVSR) models face persistent challenges, including prolonged inference time, complex training pipelines, and reliance on auxiliary information. We propose an efficient and scalable solution inspired by the structural and statistical similarities between hyperspectral images (HSI) and video data. Our approach introduces a compression-driven dimensionality reduction strategy that reduces computational complexity, accelerates inference, and enhances the extraction of temporal information across frames.
arXiv Detail & Related papers (2025-06-13T07:59:52Z) - UltraVSR: Achieving Ultra-Realistic Video Super-Resolution with Efficient One-Step Diffusion Space [46.43409853027655]
Diffusion models have shown great potential in generating realistic image detail. Adapting these models to video super-resolution (VSR) remains challenging due to their inherent stochasticity and lack of temporal modeling. We propose UltraVSR, a novel framework that enables ultra-realistic and temporally coherent VSR through an efficient one-step diffusion space.
arXiv Detail & Related papers (2025-05-26T13:19:27Z) - Reversing the Damage: A QP-Aware Transformer-Diffusion Approach for 8K Video Restoration under Codec Compression [0.0]
DiQP is a novel Transformer-Diffusion model for restoring 8K video quality degraded by compression. Our architecture combines the power of Transformers to capture long-range dependencies with an enhanced windowed mechanism. Our model outperforms state-of-the-art methods, particularly for high-resolution videos such as 4K and 8K.
arXiv Detail & Related papers (2024-12-12T03:49:22Z) - LAN-HDR: Luminance-based Alignment Network for High Dynamic Range Video Reconstruction [20.911738532410766]
We propose an end-to-end HDR video composition framework, which aligns LDR frames in feature space and then merges aligned features into an HDR frame.
In training, we adopt a temporal loss, in addition to frame reconstruction losses, to enhance temporal consistency and thus reduce flickering.
arXiv Detail & Related papers (2023-08-22T01:43:00Z) - SceNeRFlow: Time-Consistent Reconstruction of General Dynamic Scenes [75.9110646062442]
We propose SceNeRFlow to reconstruct a general, non-rigid scene in a time-consistent manner.
Our method takes multi-view RGB videos and background images from static cameras with known camera parameters as input.
We show experimentally that, unlike prior work that only handles small motion, our method enables the reconstruction of studio-scale motions.
arXiv Detail & Related papers (2023-08-16T09:50:35Z) - Learning Trajectory-Aware Transformer for Video Super-Resolution [50.49396123016185]
Video super-resolution aims to restore a sequence of high-resolution (HR) frames from their low-resolution (LR) counterparts.
Existing approaches usually align and aggregate video frames from limited adjacent frames.
We propose a novel Trajectory-aware Transformer for Video Super-Resolution (TTVSR).
arXiv Detail & Related papers (2022-04-08T03:37:39Z) - BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment [90.81396836308085]
We show that by empowering recurrent framework with enhanced propagation and alignment, one can exploit video information more effectively.
Our model BasicVSR++ surpasses BasicVSR by 0.82 dB in PSNR with a similar number of parameters.
BasicVSR++ generalizes well to other video restoration tasks such as compressed video enhancement.
arXiv Detail & Related papers (2021-04-27T17:58:31Z) - Zooming SlowMo: An Efficient One-Stage Framework for Space-Time Video Super-Resolution [100.11355888909102]
Space-time video super-resolution aims at generating a high-resolution (HR) slow-motion video from a low-resolution (LR) and low frame rate (LFR) video sequence.
We present a one-stage space-time video super-resolution framework, which can directly reconstruct an HR slow-motion video sequence from an input LR and LFR video.
arXiv Detail & Related papers (2021-04-15T17:59:23Z) - Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution [95.26202278535543]
A simple solution is to split it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR). However, temporal interpolation and spatial super-resolution are intra-related in this task.
We propose a one-stage space-time video super-resolution framework, which directly synthesizes an HR slow-motion video from an LFR, LR video.
arXiv Detail & Related papers (2020-02-26T16:59:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.