Reference-based Restoration of Digitized Analog Videotapes
- URL: http://arxiv.org/abs/2310.14926v2
- Date: Fri, 3 Nov 2023 09:20:02 GMT
- Title: Reference-based Restoration of Digitized Analog Videotapes
- Authors: Lorenzo Agnolucci, Leonardo Galteri, Marco Bertini, Alberto Del Bimbo
- Abstract summary: We present a reference-based approach for the resToration of digitized Analog videotaPEs (TAPE)
We leverage CLIP for zero-shot artifact detection to identify the cleanest frames of each video through textual prompts describing different artifacts.
To address the absence of ground truth in real-world videos, we create a synthetic dataset of videos exhibiting artifacts that closely resemble those commonly found in analog videotapes.
- Score: 28.773037051085318
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Analog magnetic tapes have been the main video data storage device for
several decades. Videos stored on analog videotapes exhibit unique degradation
patterns caused by tape aging and reader device malfunctioning that are
different from those observed in film and digital video restoration tasks. In
this work, we present a reference-based approach for the resToration of
digitized Analog videotaPEs (TAPE). We leverage CLIP for zero-shot artifact
detection to identify the cleanest frames of each video through textual prompts
describing different artifacts. Then, we select the clean frames most similar
to the input ones and employ them as references. We design a transformer-based
Swin-UNet network that exploits both neighboring and reference frames via our
Multi-Reference Spatial Feature Fusion (MRSFF) blocks. MRSFF blocks rely on
cross-attention and attention pooling to take advantage of the most useful
parts of each reference frame. To address the absence of ground truth in
real-world videos, we create a synthetic dataset of videos exhibiting artifacts
that closely resemble those commonly found in analog videotapes. Both
quantitative and qualitative experiments show the effectiveness of our approach
compared to other state-of-the-art methods. The code, the model, and the
synthetic dataset are publicly available at https://github.com/miccunifi/TAPE.
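To make the pipeline described in the abstract concrete, below is a minimal, illustrative Python sketch of the zero-shot artifact detection and reference-frame selection steps. It assumes Hugging Face's CLIP implementation; the prompt wording, the "clean"-score heuristic, and the use of CLIP image features for similarity are assumptions for illustration, not the paper's exact choices (the official repository linked above contains the real implementation).

```python
# Minimal sketch (not the paper's implementation) of zero-shot artifact
# detection with CLIP and reference-frame selection, assuming Hugging Face's
# CLIP. Prompts and the "clean" heuristic are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical textual prompts: one describing a clean frame, the rest
# describing typical analog-tape artifacts.
prompts = [
    "a clean frame",                         # index 0
    "a frame with horizontal displacement",  # e.g. tape mistracking
    "a frame with color bleeding",
    "a frame with noise",
]

@torch.no_grad()
def clean_scores(frames: list[Image.Image]) -> torch.Tensor:
    """Softmax probability of the 'clean' prompt for each frame."""
    inputs = processor(text=prompts, images=frames,
                       return_tensors="pt", padding=True)
    logits = model(**inputs).logits_per_image   # (num_frames, num_prompts)
    return logits.softmax(dim=-1)[:, 0]

@torch.no_grad()
def select_references(target: Image.Image, clean: list[Image.Image], n: int = 3):
    """Pick the n clean frames most similar to the target in CLIP space."""
    inputs = processor(images=[target] + clean, return_tensors="pt")
    feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    sims = feats[1:] @ feats[0]                  # cosine similarity to target
    top = sims.topk(min(n, len(clean))).indices.tolist()
    return [clean[i] for i in top]

# Usage: keep the frames most confidently rated clean, then pick references.
# frames = [Image.open(p) for p in frame_paths]   # frame_paths: your list
# scores = clean_scores(frames)
# clean = [f for f, s in zip(frames, scores.tolist()) if s > 0.5]
```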
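Likewise, here is a simplified PyTorch sketch in the spirit of the MRSFF blocks: features of the frame being restored attend to each reference frame via cross-attention, and the per-reference outputs are merged with attention pooling. The structure, dimensions, and residual fusion are assumptions; the actual blocks sit inside a Swin-UNet and differ in detail.

```python
# Illustrative multi-reference fusion block (an assumption-laden sketch, not
# the paper's MRSFF definition): cross-attention to each reference frame,
# then attention pooling over the reference axis.
import torch
import torch.nn as nn

class MultiReferenceFusion(nn.Module):
    def __init__(self, dim: int = 96, heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.pool_score = nn.Linear(dim, 1)  # scores each reference's output

    def forward(self, x: torch.Tensor, refs: torch.Tensor) -> torch.Tensor:
        # x:    (B, N, C)    token features of the frame being restored
        # refs: (B, R, N, C) token features of R reference frames
        B, R, N, C = refs.shape
        outs = []
        for r in range(R):
            # Queries from the input frame; keys/values from one reference.
            attn_out, _ = self.cross_attn(x, refs[:, r], refs[:, r])
            outs.append(attn_out)
        outs = torch.stack(outs, dim=1)                 # (B, R, N, C)
        # Attention pooling: weight each reference's contribution per token,
        # so the most useful parts of each reference dominate the fusion.
        weights = self.pool_score(outs).softmax(dim=1)  # (B, R, N, 1)
        return x + (weights * outs).sum(dim=1)          # residual fusion

# Usage: fuse one frame's features with three reference frames.
block = MultiReferenceFusion(dim=96, heads=4)
x = torch.randn(2, 64, 96)        # batch of 2, 64 tokens, 96 channels
refs = torch.randn(2, 3, 64, 96)  # 3 references per input frame
y = block(x, refs)                # (2, 64, 96)
```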
Related papers
- Data Collection-free Masked Video Modeling [6.641717260925999]
We introduce an effective self-supervised learning framework for videos that leverages less costly static images to generate pseudo-motion videos.
These pseudo-motion videos are then leveraged in masked video modeling.
Our approach is applicable to synthetic images as well, thus entirely freeing video training from data collection costs and other concerns with real data.
arXiv Detail & Related papers (2024-09-10T17:34:07Z) - A Low-Computational Video Synopsis Framework with a Standard Dataset [0.0]
Video synopsis is an efficient method for condensing surveillance videos.
The lack of a standard dataset for the video synopsis task hinders the comparison of different video synopsis models.
This paper introduces a video synopsis model, called FGS, with low computational cost.
arXiv Detail & Related papers (2024-09-08T22:08:36Z) - Restoration of Analog Videos Using Swin-UNet [28.773037051085318]
We present a system to restore analog videos of historical archives.
The proposed system uses a multi-frame approach and is able to deal with severe tape mistracking.
arXiv Detail & Related papers (2023-11-07T16:00:31Z) - Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation [92.55296042611886]
We propose a framework called "Reuse and Diffuse", dubbed VidRD, to produce more frames following the frames already generated by an LDM.
We also propose a set of strategies for composing video-text data that involve diverse content from multiple existing datasets.
arXiv Detail & Related papers (2023-09-07T08:12:58Z) - Video Event Restoration Based on Keyframes for Video Anomaly Detection [9.18057851239942]
Existing deep neural network based video anomaly detection (VAD) methods mostly follow the route of frame reconstruction or frame prediction.
We introduce a brand-new VAD paradigm to break through these limitations.
We propose a novel U-shaped Swin Transformer Network with Dual Skip Connections (USTN-DSC) for video event restoration.
arXiv Detail & Related papers (2023-04-11T10:13:19Z) - Weakly-Supervised Action Detection Guided by Audio Narration [50.4318060593995]
We propose a model that learns from narration supervision and utilizes multimodal features, including RGB, motion flow, and ambient sound.
Our experiments show that noisy audio narration suffices to learn a good action detection model, thus reducing annotation expenses.
arXiv Detail & Related papers (2022-05-12T06:33:24Z) - Video Demoireing with Relation-Based Temporal Consistency [68.20281109859998]
Moire patterns, appearing as color distortions, severely degrade image and video qualities when filming a screen with digital cameras.
We study how to remove such undesirable moire patterns in videos, namely video demoireing.
arXiv Detail & Related papers (2022-04-06T17:45:38Z) - Flow-Guided Sparse Transformer for Video Deblurring [124.11022871999423]
Flow-Guided Sparse Transformer (FGST) is a framework for video deblurring.
FGSW-MSA enjoys the guidance of the estimated optical flow to globally sample spatially sparse elements corresponding to the same scene patch in neighboring frames.
Our proposed FGST outperforms state-of-the-art methods on both the DVD and GOPRO datasets and even yields visually more pleasing results in real video deblurring.
arXiv Detail & Related papers (2022-01-06T02:05:32Z) - HODOR: High-level Object Descriptors for Object Re-segmentation in Video
Learned from Static Images [123.65233334380251]
We propose HODOR: a novel method that effectively leverages annotated static images for understanding object appearance and scene context.
As a result, HODOR achieves state-of-the-art performance on the DAVIS and YouTube-VOS benchmarks.
Without any architectural modification, HODOR can also learn from video context around single annotated video frames.
arXiv Detail & Related papers (2021-12-16T18:59:53Z) - Self-supervised Video Representation Learning Using Inter-intra
Contrastive Framework [43.002621928500425]
We propose a self-supervised method to learn feature representations from videos.
We extend the set of negative samples by introducing intra-negative samples, generated by breaking the temporal relations within a video.
We conduct experiments on video retrieval and video recognition tasks using the learned video representation.
arXiv Detail & Related papers (2020-08-06T09:08:14Z) - BBAND Index: A No-Reference Banding Artifact Predictor [55.42929350861115]
Banding artifact, or false contouring, is a common video compression impairment.
We propose a new distortion-specific no-reference video quality model for predicting banding artifacts, called the Blind BANding Detector (BBAND index).
arXiv Detail & Related papers (2020-02-27T03:05:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.