A New Dataset and Transformer for Stereoscopic Video Super-Resolution
- URL: http://arxiv.org/abs/2204.10039v1
- Date: Thu, 21 Apr 2022 11:49:29 GMT
- Title: A New Dataset and Transformer for Stereoscopic Video Super-Resolution
- Authors: Hassan Imani, Md Baharul Islam, Lai-Kuan Wong
- Abstract summary: Stereo video super-resolution aims to enhance the spatial resolution of low-resolution stereo video by reconstructing its high-resolution counterpart.
Key challenges in SVSR are preserving the stereo-consistency and temporal-consistency, without which viewers may experience 3D fatigue.
In this paper, we propose a novel Transformer-based model for SVSR, namely Trans-SVSR.
- Score: 4.332879001008757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stereo video super-resolution (SVSR) aims to enhance the spatial resolution
of the low-resolution video by reconstructing the high-resolution video. The
key challenges in SVSR are preserving the stereo-consistency and
temporal-consistency, without which viewers may experience 3D fatigue. There
are several notable works on stereoscopic image super-resolution, but there is
little research on stereo video super-resolution. In this paper, we propose a
novel Transformer-based model for SVSR, namely Trans-SVSR. Trans-SVSR comprises
two key novel components: a spatio-temporal convolutional self-attention layer
and an optical flow-based feed-forward layer that discovers the correlation
across different video frames and aligns the features. The parallax attention
mechanism (PAM) that uses the cross-view information to consider the
significant disparities is used to fuse the stereo views. Due to the lack of a
benchmark dataset suitable for the SVSR task, we collected a new stereoscopic
video dataset, SVSR-Set, containing 71 full high-definition (HD) stereo videos
captured using a professional stereo camera. Extensive experiments on the
collected dataset, along with two other datasets, demonstrate that the
Trans-SVSR can achieve competitive performance compared to the state-of-the-art
methods. Project code and additional results are available at
https://github.com/H-deep/Trans-SVSR/
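The abstract's fusion step, a parallax attention mechanism (PAM) that relates the two rectified views along the disparity (width) axis, can be sketched as follows. This is a minimal illustrative sketch in PyTorch, not the authors' implementation: the class name, projection layers, and residual fusion are assumptions for exposition.

```python
# Hedged sketch of parallax-attention fusion of stereo features.
# For rectified stereo pairs, corresponding pixels lie on the same row,
# so attention is computed along the width dimension only.
import torch
import torch.nn as nn

class ParallaxAttentionFusion(nn.Module):
    """Illustrative PAM-style fusion: left features attend to right features
    along the epipolar (width) axis. Names and shapes are assumptions."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, kernel_size=1)
        self.key = nn.Conv2d(channels, channels, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feat_left: torch.Tensor, feat_right: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat_left.shape
        # Reshape so each image row becomes an independent attention problem.
        q = self.query(feat_left).permute(0, 2, 3, 1).reshape(b * h, w, c)
        k = self.key(feat_right).permute(0, 2, 1, 3).reshape(b * h, c, w)
        v = self.value(feat_right).permute(0, 2, 3, 1).reshape(b * h, w, c)
        # (b*h, w, w): soft correspondence over horizontal disparities.
        attn = torch.softmax((q @ k) / c ** 0.5, dim=-1)
        # Warp right-view features to the left view via the attention map.
        warped = (attn @ v).reshape(b, h, w, c).permute(0, 3, 1, 2)
        # Residual fusion: left features enriched with cross-view information.
        return feat_left + warped
```

The key property is that the (w, w) attention map acts as a soft disparity map per row, so large disparities are handled without an explicit fixed-range cost volume.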
Related papers
- VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval [8.908777234657046]
Large-language and vision-language models (LLM/LVLMs) have gained prominence across various domains.
Here we propose VideoLights, a novel HD/MR framework addressing limitations of prior methods through (i) Convolutional Projection and Feature Refinement modules.
Comprehensive experiments on QVHighlights, TVSum, and Charades-STA benchmarks demonstrate state-of-the-art performance.
arXiv Detail & Related papers (2024-12-02T14:45:53Z) - Video Set Distillation: Information Diversification and Temporal Densification [68.85010825225528]
Video sets have two dimensions of redundancy: within-sample and inter-sample redundancies.
We are the first to study Video Set Distillation, which synthesizes optimized video data by addressing within-sample and inter-sample redundancies.
arXiv Detail & Related papers (2024-11-28T05:37:54Z) - Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets [62.280729345770936]
We introduce the task of Alignable Video Retrieval (AVR).
Given a query video, our approach can identify well-alignable videos from a large collection of clips and temporally synchronize them to the query.
Our experiments on 3 datasets, including large-scale Kinetics700, demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-02T20:00:49Z) - Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors [80.92195378575671]
We describe a strong baseline for arbitrary-scale video super-resolution (AVSR).
We then introduce ST-AVSR by equipping our baseline with a multi-scale structural and textural prior computed from the pre-trained VGG network.
Comprehensive experiments show that ST-AVSR significantly improves super-resolution quality, generalization ability, and inference speed over the state-of-the-art.
arXiv Detail & Related papers (2024-07-13T15:27:39Z) - Video Frame Interpolation with Stereo Event and Intensity Camera [40.07341828127157]
We propose a novel Stereo Event-based VFI network (SE-VFI-Net) to generate high-quality intermediate frames.
We exploit the fused features accomplishing accurate optical flow and disparity estimation.
Our proposed SE-VFI-Net outperforms state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-07-17T04:02:00Z) - DynamicStereo: Consistent Dynamic Depth from Stereo Videos [91.1804971397608]
We propose DynamicStereo to estimate disparity for stereo videos.
The network learns to pool information from neighboring frames to improve the temporal consistency of its predictions.
We also introduce Dynamic Replica, a new benchmark dataset containing synthetic videos of people and animals in scanned environments.
arXiv Detail & Related papers (2023-05-03T17:40:49Z) - Cross-View Hierarchy Network for Stereo Image Super-Resolution [14.574538513341277]
Stereo image super-resolution aims to improve the quality of high-resolution stereo image pairs by exploiting complementary information across views.
We propose a novel method, named Cross-View-Hierarchy Network for Stereo Image Super-Resolution (CVHSSR).
CVHSSR achieves better stereo image super-resolution performance than other state-of-the-art methods while using fewer parameters.
arXiv Detail & Related papers (2023-04-13T03:11:30Z) - Towards Interpretable Video Super-Resolution via Alternating Optimization [115.85296325037565]
We study a practical space-time video super-resolution (STVSR) problem which aims at generating a high-framerate high-resolution sharp video from a low-framerate blurry video.
We propose an interpretable STVSR framework by leveraging both model-based and learning-based methods.
arXiv Detail & Related papers (2022-07-21T21:34:05Z) - Multi-View Stereo with Transformer [31.83069394719813]
This paper proposes a network, referred to as MVSTR, for Multi-View Stereo (MVS).
It is built upon Transformer and is capable of extracting dense features with global context and 3D consistency.
Experimental results show that the proposed MVSTR achieves the best overall performance on the DTU dataset and strong generalization on the Tanks & Temples benchmark dataset.
arXiv Detail & Related papers (2021-12-01T08:06:59Z) - Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution [95.26202278535543]
A simple solution is to split it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR). However, temporal interpolation and spatial super-resolution are intra-related in this task.
We propose a one-stage space-time video super-resolution framework, which directly synthesizes an HR slow-motion video from an LFR, LR video.
arXiv Detail & Related papers (2020-02-26T16:59:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.