Self-Conditioned Probabilistic Learning of Video Rescaling
- URL: http://arxiv.org/abs/2107.11639v1
- Date: Sat, 24 Jul 2021 15:57:15 GMT
- Title: Self-Conditioned Probabilistic Learning of Video Rescaling
- Authors: Yuan Tian, Guo Lu, Xiongkuo Min, Zhaohui Che, Guangtao Zhai, Guodong
Guo, Zhiyong Gao
- Abstract summary: We propose a self-conditioned probabilistic framework for video rescaling to learn the paired downscaling and upscaling procedures simultaneously.
We decrease the entropy of the information lost in the downscaling by maximizing its conditioned probability on the strong spatial-temporal prior information.
We extend the framework to a lossy video compression system, in which a gradient estimator for non-differential industrial lossy codecs is proposed.
- Score: 70.10092286301997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bicubic downscaling is a prevalent technique used to reduce the video storage
burden or to accelerate the downstream processing speed. However, the inverse
upscaling step is non-trivial, and the downscaled video may also deteriorate
the performance of downstream tasks. In this paper, we propose a
self-conditioned probabilistic framework for video rescaling to learn the
paired downscaling and upscaling procedures simultaneously. During the
training, we decrease the entropy of the information lost in the downscaling by
maximizing its probability conditioned on the strong spatial-temporal prior
information within the downscaled video. After optimization, the downscaled
video by our framework preserves more meaningful information, which is
beneficial for both the upscaling step and the downstream tasks, e.g., video
action recognition task. We further extend the framework to a lossy video
compression system, in which a gradient estimator for non-differential
industrial lossy codecs is proposed for the end-to-end training of the whole
system. Extensive experimental results demonstrate the superiority of our
approach on video rescaling, video compression, and efficient action
recognition tasks.
Related papers
- Uncertainty-Aware Deep Video Compression with Ensembles [24.245365441718654]
We propose an uncertainty-aware video compression model that can effectively capture predictive uncertainty with deep ensembles.
Our model can effectively save bits by more than 20% compared to 1080p sequences.
arXiv Detail & Related papers (2024-03-28T05:44:48Z) - Blurry Video Compression: A Trade-off between Visual Enhancement and
Data Compression [65.8148169700705]
Existing video compression (VC) methods primarily aim to reduce the spatial and temporal redundancies between consecutive frames in a video.
Previous works have achieved remarkable results on videos acquired under specific settings such as instant (known) exposure time and shutter speed.
In this work, we tackle the VC problem in a general scenario where a given video can be blurry due to predefined camera settings or dynamics in the scene.
arXiv Detail & Related papers (2023-11-08T02:17:54Z) - Differentiable Resolution Compression and Alignment for Efficient Video
Classification and Retrieval [16.497758750494537]
We propose an efficient video representation network with Differentiable Resolution Compression and Alignment mechanism.
We leverage a Differentiable Context-aware Compression Module to encode the saliency and non-saliency frame features.
We introduce a new Resolution-Align Transformer Layer to capture global temporal correlations among frame features with different resolutions.
arXiv Detail & Related papers (2023-09-15T05:31:53Z) - Speeding Up Action Recognition Using Dynamic Accumulation of Residuals
in Compressed Domain [2.062593640149623]
Temporal redundancy and the sheer size of raw videos are the two most common problematic issues related to video processing algorithms.
This paper presents an approach for using residual data, available in compressed videos directly, which can be obtained by a light partially decoding procedure.
Applying neural networks exclusively for accumulated residuals in the compressed domain accelerates performance, while the classification results are highly competitive with raw video approaches.
arXiv Detail & Related papers (2022-09-29T13:08:49Z) - Exploring Long- and Short-Range Temporal Information for Learned Video
Compression [54.91301930491466]
We focus on exploiting the unique characteristics of video content and exploring temporal information to enhance compression performance.
For long-range temporal information exploitation, we propose temporal prior that can update continuously within the group of pictures (GOP) during inference.
In that case temporal prior contains valuable temporal information of all decoded images within the current GOP.
In detail, we design a hierarchical structure to achieve multi-scale compensation.
arXiv Detail & Related papers (2022-08-07T15:57:18Z) - Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action
Localization [96.73647162960842]
TAL is a fundamental yet challenging task in video understanding.
Existing TAL methods rely on pre-training a video encoder through action classification supervision.
We introduce a novel low-fidelity end-to-end (LoFi) video encoder pre-training method.
arXiv Detail & Related papers (2021-03-28T22:18:14Z) - An Efficient Recurrent Adversarial Framework for Unsupervised Real-Time
Video Enhancement [132.60976158877608]
We propose an efficient adversarial video enhancement framework that learns directly from unpaired video examples.
In particular, our framework introduces new recurrent cells that consist of interleaved local and global modules for implicit integration of spatial and temporal information.
The proposed design allows our recurrent cells to efficiently propagate-temporal-information across frames and reduces the need for high complexity networks.
arXiv Detail & Related papers (2020-12-24T00:03:29Z) - Content Adaptive and Error Propagation Aware Deep Video Compression [110.31693187153084]
We propose a content adaptive and error propagation aware video compression system.
Our method employs a joint training strategy by considering the compression performance of multiple consecutive frames instead of a single frame.
Instead of using the hand-crafted coding modes in the traditional compression systems, we design an online encoder updating scheme in our system.
arXiv Detail & Related papers (2020-03-25T09:04:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.