A New Dataset and Transformer for Stereoscopic Video Super-Resolution
- URL: http://arxiv.org/abs/2204.10039v1
- Date: Thu, 21 Apr 2022 11:49:29 GMT
- Title: A New Dataset and Transformer for Stereoscopic Video Super-Resolution
- Authors: Hassan Imani, Md Baharul Islam, Lai-Kuan Wong
- Abstract summary: Stereo video super-resolution aims to enhance the resolution of the low-resolution video by reconstructing the high-resolution video.
Key challenges in SVSR are preserving the stereo-consistency and temporal-consistency, without which viewers may experience 3D fatigue.
In this paper, we propose a novel Transformer-based model for SVSR, namely Trans-SVSR.
- Score: 4.332879001008757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stereo video super-resolution (SVSR) aims to enhance the spatial resolution
of the low-resolution video by reconstructing the high-resolution video. The
key challenges in SVSR are preserving the stereo-consistency and
temporal-consistency, without which viewers may experience 3D fatigue. There
are several notable works on stereoscopic image super-resolution, but there is
little research on stereo video super-resolution. In this paper, we propose a
novel Transformer-based model for SVSR, namely Trans-SVSR. Trans-SVSR comprises
two key novel components: a spatio-temporal convolutional self-attention layer
and an optical flow-based feed-forward layer that discovers the correlation
across different video frames and aligns the features. The parallax attention
mechanism (PAM) that uses the cross-view information to consider the
significant disparities is used to fuse the stereo views. Due to the lack of a
benchmark dataset suitable for the SVSR task, we collected a new stereoscopic
video dataset, SVSR-Set, containing 71 full high-definition (HD) stereo videos
captured using a professional stereo camera. Extensive experiments on the
collected dataset, along with two other datasets, demonstrate that the
Trans-SVSR can achieve competitive performance compared to the state-of-the-art
methods. Project code and additional results are available at
https://github.com/H-deep/Trans-SVSR/
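The abstract says the parallax attention mechanism (PAM) fuses the stereo views using cross-view information. As a rough illustration of that general idea only, the sketch below applies dot-product attention along one image row of a rectified stereo pair, so each left-view position gathers features from every horizontal position in the right view. It is not the authors' implementation: the function names are hypothetical, the learned projections of a real model are omitted, and the features are plain nested lists.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def parallax_attention_row(left_row, right_row):
    """Attend from each left-view position to all right-view positions in one row.

    left_row, right_row: lists of W feature vectors (each a list of C floats)
    taken from the same row of a rectified stereo pair.
    Returns (warped, attn), where attn[i] holds the weights over right-view
    positions for left position i, and warped[i] is the weighted mix of
    right-view features.
    """
    channels = len(right_row[0])
    warped, attn = [], []
    for query in left_row:
        # Similarity between this left feature and every right feature.
        scores = [sum(q * k for q, k in zip(query, key)) for key in right_row]
        weights = softmax(scores)
        attn.append(weights)
        warped.append([
            sum(w * key[c] for w, key in zip(weights, right_row))
            for c in range(channels)
        ])
    return warped, attn


def parallax_attention(left, right):
    """Apply row-wise attention to H x W x C nested lists; returns warped right features."""
    return [parallax_attention_row(l, r)[0] for l, r in zip(left, right)]


# Toy row: the left view is the right view shifted by one position (with wrap-around).
right_row = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 10.0]]
left_row = [[0.0, 10.0, 0.0], [0.0, 0.0, 10.0], [10.0, 0.0, 0.0]]
_, attn = parallax_attention_row(left_row, right_row)
best_match = [max(range(len(w)), key=lambda j: w[j]) for w in attn]
print(best_match)  # [1, 2, 0]
```

Restricting attention to a single row reflects the fact that, in a rectified stereo pair, corresponding points lie on the same horizontal line, so the attention weights act as a soft disparity estimate.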
Related papers
- Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets [62.280729345770936]
We introduce the task of Alignable Video Retrieval (AVR).
Given a query video, our approach can identify well-alignable videos from a large collection of clips and temporally synchronize them to the query.
Our experiments on 3 datasets, including large-scale Kinetics700, demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-02T20:00:49Z)
- Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors [80.92195378575671]
We describe a strong baseline for arbitrary-scale video super-resolution (AVSR).
We then introduce ST-AVSR by equipping our baseline with a multi-scale structural and textural prior computed from the pre-trained VGG network.
Comprehensive experiments show that ST-AVSR significantly improves super-resolution quality, generalization ability, and inference speed over the state-of-the-art.
arXiv Detail & Related papers (2024-07-13T15:27:39Z)
- Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution [151.1255837803585]
We propose a novel approach, pursuing Spatial Adaptation and Temporal Coherence (SATeCo) for video super-resolution.
SATeCo pivots on learning spatial-temporal guidance from low-resolution videos to calibrate both latent-space high-resolution video denoising and pixel-space video reconstruction.
Experiments conducted on the REDS4 and Vid4 datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-03-25T17:59:26Z)
- Video Frame Interpolation with Stereo Event and Intensity Camera [40.07341828127157]
We propose a novel Stereo Event-based VFI network (SE-VFI-Net) to generate high-quality intermediate frames.
We exploit the fused features to achieve accurate optical flow and disparity estimation.
Our proposed SE-VFI-Net outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-07-17T04:02:00Z)
- DynamicStereo: Consistent Dynamic Depth from Stereo Videos [91.1804971397608]
We propose DynamicStereo to estimate disparity for stereo videos.
The network learns to pool information from neighboring frames to improve the temporal consistency of its predictions.
We also introduce Dynamic Replica, a new benchmark dataset containing synthetic videos of people and animals in scanned environments.
arXiv Detail & Related papers (2023-05-03T17:40:49Z)
- Cross-View Hierarchy Network for Stereo Image Super-Resolution [14.574538513341277]
Stereo image super-resolution aims to improve the quality of high-resolution stereo image pairs by exploiting complementary information across views.
We propose a novel method, named Cross-View-Hierarchy Network for Stereo Image Super-Resolution (CVHSSR).
CVHSSR achieves better stereo image super-resolution performance than other state-of-the-art methods while using fewer parameters.
arXiv Detail & Related papers (2023-04-13T03:11:30Z)
- H2-Stereo: High-Speed, High-Resolution Stereoscopic Video System [39.95458608416292]
High-resolution stereoscopic (H2-Stereo) video allows us to perceive dynamic 3D content in fine detail.
Existing methods provide compromised solutions that lack temporal or spatial details.
We propose a dual camera system, in which one captures high-spatial-resolution low-frame-rate (HSR-LFR) videos with rich spatial details.
We then devise a Learned Information Fusion network (LIFnet) that exploits the cross-camera redundancies to reconstruct the H2-Stereo video effectively.
arXiv Detail & Related papers (2022-08-04T04:06:01Z)
- Towards Interpretable Video Super-Resolution via Alternating Optimization [115.85296325037565]
We study a practical space-time video super-resolution (STVSR) problem which aims at generating a high-framerate high-resolution sharp video from a low-framerate blurry video.
We propose an interpretable STVSR framework by leveraging both model-based and learning-based methods.
arXiv Detail & Related papers (2022-07-21T21:34:05Z)
- Multi-View Stereo with Transformer [31.83069394719813]
This paper proposes a network, referred to as MVSTR, for Multi-View Stereo (MVS).
It is built upon Transformer and is capable of extracting dense features with global context and 3D consistency.
Experimental results show that the proposed MVSTR achieves the best overall performance on the DTU dataset and strong generalization on the Tanks & Temples benchmark dataset.
arXiv Detail & Related papers (2021-12-01T08:06:59Z)
- Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution [95.26202278535543]
A simple solution is to split it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR).
Temporal synthesis and spatial super-resolution are intra-related in this task.
We propose a one-stage space-time video super-resolution framework, which directly synthesizes an HR slow-motion video from an LFR, LR video.
arXiv Detail & Related papers (2020-02-26T16:59:48Z)