Spatio-Temporal Distortion Aware Omnidirectional Video Super-Resolution
- URL: http://arxiv.org/abs/2410.11506v2
- Date: Mon, 17 Mar 2025 16:22:15 GMT
- Title: Spatio-Temporal Distortion Aware Omnidirectional Video Super-Resolution
- Authors: Hongyu An, Xinfeng Zhang, Shijie Zhao, Li Zhang, Ruiqin Xiong,
- Abstract summary: Video super-resolution (SR) is proposed to enhance resolution, but practical ODV spatial projection distortions and temporal flickering are not well addressed directly applying existing methods.<n>We propose a Spatio-Temporal Distortion Aware Network (STDAN) oriented to ODV characteristics to achieve better ODV-SR reconstruction.
- Score: 25.615935776826596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Omnidirectional video (ODV) provides an immersive visual experience and is widely utilized in virtual reality and augmented reality. However, restricted capturing devices and transmission bandwidth lead to low-resolution ODVs. Video super-resolution (SR) is proposed to enhance resolution, but practical ODV spatial projection distortions and temporal flickering are not well addressed directly applying existing methods. To achieve better ODV-SR reconstruction, we propose a Spatio-Temporal Distortion Aware Network (STDAN) oriented to ODV characteristics. Specifically, a spatially continuous distortion modulation module is introduced to improve discrete projection distortions. Next, we design an interlaced multi-frame reconstruction mechanism to refine temporal consistency across frames. Furthermore, we incorporate latitude-saliency adaptive weights during training to concentrate on regions with higher texture complexity and human-watching interest. In general, we explore inference-free and real-world viewing matched strategies to provide an application-friendly method on a novel ODV-SR dataset with practical scenarios. Extensive experimental results demonstrate the superior performance of the proposed STDAN over state-of-the-art methods.
Related papers
- Semantic and Temporal Integration in Latent Diffusion Space for High-Fidelity Video Super-Resolution [20.151571582095468]
We propose Semantic and Temporal Guided Video Super-Resolution (SeTe-VSR)<n>Our approach achieves a seamless balance between recovering intricate details and ensuring temporal coherence.<n>Our method not only preserves high-reality visual content but also significantly enhances fidelity.
arXiv Detail & Related papers (2025-08-01T09:47:35Z) - UltraVSR: Achieving Ultra-Realistic Video Super-Resolution with Efficient One-Step Diffusion Space [46.43409853027655]
Diffusion models have shown great potential in generating realistic image detail.<n>Adapting these models to video super-resolution (VSR) remains challenging due to their inherentity and lack of temporal modeling.<n>We propose UltraVSR, a novel framework that enables ultra-realistic and temporally-coherent VSR through an efficient one-step diffusion space.
arXiv Detail & Related papers (2025-05-26T13:19:27Z) - DiVE: Efficient Multi-View Driving Scenes Generation Based on Video Diffusion Transformer [56.98400572837792]
DiVE produces high-fidelity, temporally coherent, and cross-view consistent multi-view videos.<n>These innovations collectively achieve a 2.62x speedup with minimal quality degradation.
arXiv Detail & Related papers (2025-04-28T09:20:50Z) - Event-Enhanced Blurry Video Super-Resolution [52.894824081586776]
We tackle the task of blurry video super-resolution (BVSR), aiming to generate high-resolution (HR) videos from low-resolution (LR) and blurry inputs.
Current BVSR methods often fail to restore sharp details at high resolutions, resulting in noticeable artifacts and jitter.
We introduce event signals into BVSR and propose a novel event-enhanced network, Ev-DeVSR.
arXiv Detail & Related papers (2025-04-17T15:55:41Z) - Temporal-Consistent Video Restoration with Pre-trained Diffusion Models [51.47188802535954]
Video restoration (VR) aims to recover high-quality videos from degraded ones.<n>Recent zero-shot VR methods using pre-trained diffusion models (DMs) suffer from approximation errors during reverse diffusion and insufficient temporal consistency.<n>We present a novel a Posterior Maximum (MAP) framework that directly parameterizes video frames in the seed space of DMs, eliminating approximation errors.
arXiv Detail & Related papers (2025-03-19T03:41:56Z) - SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration [73.70209718408641]
SeedVR is a diffusion transformer designed to handle real-world video restoration with arbitrary length and resolution.<n>It achieves highly-competitive performance on both synthetic and real-world benchmarks, as well as AI-generated videos.
arXiv Detail & Related papers (2025-01-02T16:19:48Z) - Collaborative Feedback Discriminative Propagation for Video Super-Resolution [66.61201445650323]
Key success of video super-resolution (VSR) methods stems mainly from exploring spatial and temporal information.
Inaccurate alignment usually leads to aligned features with significant artifacts.
propagation modules only propagate the same timestep features forward or backward.
arXiv Detail & Related papers (2024-04-06T22:08:20Z) - Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution [151.1255837803585]
We propose a novel approach, pursuing Spatial Adaptation and Temporal Coherence (SATeCo) for video super-resolution.
SATeCo pivots on learning spatial-temporal guidance from low-resolution videos to calibrate both latent-space high-resolution video denoising and pixel-space video reconstruction.
Experiments conducted on the REDS4 and Vid4 datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-03-25T17:59:26Z) - Video Dynamics Prior: An Internal Learning Approach for Robust Video
Enhancements [83.5820690348833]
We present a framework for low-level vision tasks that does not require any external training data corpus.
Our approach learns neural modules by optimizing over a corrupted sequence, leveraging the weights of the coherence-temporal test and statistics internal statistics.
arXiv Detail & Related papers (2023-12-13T01:57:11Z) - Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution [15.197746480157651]
We propose an effective real-world VSR algorithm by leveraging the strength of pre-trained latent diffusion models.
We exploit the temporal dynamics in LR videos to guide the diffusion process by optimizing the latent sampling path with a motion-guided loss.
The proposed motion-guided latent diffusion based VSR algorithm achieves significantly better perceptual quality than state-of-the-arts on real-world VSR benchmark datasets.
arXiv Detail & Related papers (2023-12-01T14:40:07Z) - Benchmark Dataset and Effective Inter-Frame Alignment for Real-World
Video Super-Resolution [65.20905703823965]
Video super-resolution (VSR) aiming to reconstruct a high-resolution (HR) video from its low-resolution (LR) counterpart has made tremendous progress in recent years.
It remains challenging to deploy existing VSR methods to real-world data with complex degradations.
EAVSR takes the proposed multi-layer adaptive spatial transform network (MultiAdaSTN) to refine the offsets provided by the pre-trained optical flow estimation network.
arXiv Detail & Related papers (2022-12-10T17:41:46Z) - H2-Stereo: High-Speed, High-Resolution Stereoscopic Video System [39.95458608416292]
High-resolution stereoscopic (H2-Stereo) video allows us to perceive dynamic 3D content fine.
Existing methods provide compromised solutions that lack temporal or spatial details.
We propose a dual camera system, in which one captures high-spatial-resolution low-frame-rate (HSR-LFR) videos with rich spatial details.
We then devise a Learned Information Fusion network (LIFnet) that exploits the cross-camera redundancies to reconstruct the H2-Stereo video effectively.
arXiv Detail & Related papers (2022-08-04T04:06:01Z) - Learned Video Compression via Heterogeneous Deformable Compensation
Network [78.72508633457392]
We propose a learned video compression framework via heterogeneous deformable compensation strategy (HDCVC) to tackle the problems of unstable compression performance.
More specifically, the proposed algorithm extracts features from the two adjacent frames to estimate content-Neighborhood heterogeneous deformable (HetDeform) kernel offsets.
Experimental results indicate that HDCVC achieves superior performance than the recent state-of-the-art learned video compression approaches.
arXiv Detail & Related papers (2022-07-11T02:31:31Z) - STRPM: A Spatiotemporal Residual Predictive Model for High-Resolution
Video Prediction [78.129039340528]
We propose a StemporalResidual Predictive Model (STRPM) for high-resolution video prediction.
STRPM can generate more satisfactory results compared with various existing methods.
Experimental results show that STRPM can generate more satisfactory results compared with various existing methods.
arXiv Detail & Related papers (2022-03-30T06:24:00Z) - Fast Online Video Super-Resolution with Deformable Attention Pyramid [172.16491820970646]
Video super-resolution (VSR) has many applications that pose strict causal, real-time, and latency constraints, including video streaming and TV.
We propose a recurrent VSR architecture based on a deformable attention pyramid (DAP)
arXiv Detail & Related papers (2022-02-03T17:49:04Z) - Video Face Super-Resolution with Motion-Adaptive Feedback Cell [90.73821618795512]
Video super-resolution (VSR) methods have recently achieved a remarkable success due to the development of deep convolutional neural networks (CNN)
In this paper, we propose a Motion-Adaptive Feedback Cell (MAFC), a simple but effective block, which can efficiently capture the motion compensation and feed it back to the network in an adaptive way.
arXiv Detail & Related papers (2020-02-15T13:14:10Z) - End-To-End Trainable Video Super-Resolution Based on a New Mechanism for
Implicit Motion Estimation and Compensation [19.67999205691758]
Video super-resolution aims at generating a high-resolution video from its low-resolution counterpart.
We propose a novel dynamic local filter network to perform implicit motion estimation and compensation.
We also propose a global refinement network based on ResBlock and autoencoder structures.
arXiv Detail & Related papers (2020-01-05T03:47:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.