Perceptual Coding for Compressed Video Understanding: A New Framework and Benchmark
- URL: http://arxiv.org/abs/2202.02813v1
- Date: Sun, 6 Feb 2022 16:29:15 GMT
- Title: Perceptual Coding for Compressed Video Understanding: A New Framework and Benchmark
- Authors: Yuan Tian, Guo Lu, Yichao Yan, Guangtao Zhai, Li Chen, Zhiyong Gao
- Abstract summary: We propose the first coding framework for compressed video understanding, where another learnable perceptual bitstream is introduced and simultaneously transported with the video bitstream.
Our framework enjoys the best of both worlds: (1) the highly efficient content coding of an industrial video codec and (2) the flexible perceptual coding of neural networks (NNs).
- Score: 57.23523738351178
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most video understanding methods are learned on high-quality videos. However,
in most real-world scenarios, videos are compressed before transmission and then
decompressed for understanding. The decompressed videos are degraded in
perceptual quality, which can hurt the
downstream tasks. To address this issue, we propose the first coding framework
for compressed video understanding, where another learnable perceptual
bitstream is introduced and simultaneously transported with the video
bitstream. With a carefully designed optimization target and network
architecture, this new stream substantially boosts the perceptual quality of the
decoded videos at a small bit cost. Our framework enjoys the best of
both worlds: (1) the highly efficient content coding of an industrial video codec
and (2) the flexible perceptual coding of neural networks (NNs). Finally, we build
a rigorous benchmark for compressed video understanding over four different
compression levels, six large-scale datasets, and two popular tasks. The
proposed Dual-bitstream Perceptual Video Coding framework (Dual-PVC) consistently
and significantly outperforms the baseline codec at the same bitrate level.
Related papers
- BiVM: Accurate Binarized Neural Network for Efficient Video Matting [56.000594826508504]
Deep neural networks for real-time video matting suffer significant computational limitations on edge devices.
We present BiVM, an accurate and resource-efficient Binarized neural network for Video Matting.
BiVM surpasses alternative binarized video matting networks, including state-of-the-art (SOTA) binarization methods, by a substantial margin.
arXiv Detail & Related papers (2025-07-06T16:32:37Z)
- Coding-Prior Guided Diffusion Network for Video Deblurring [47.77918791133459]
We present a novel framework that effectively leverages both coding priors and generative diffusion priors for high-quality deblurring.
Experiments demonstrate our method achieves state-of-the-art perceptual quality with up to 30% improvement in IQA metrics.
arXiv Detail & Related papers (2024-12-24T18:59:56Z)
- Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models [26.866184981409607]
We present an efficient encoder-free approach for video-language understanding that achieves competitive performance while significantly reducing computational overhead.
Our method introduces a novel Spatio-Temporal Alignment Block (STAB) that directly processes video inputs without requiring pre-trained encoders.
Our model achieves comparable or superior performance to encoder-based approaches for open-ended video question answering on standard benchmarks.
arXiv Detail & Related papers (2024-11-26T07:03:11Z)
- Motion Free B-frame Coding for Neural Video Compression [0.0]
In this paper, we propose a novel motion-free approach that addresses the drawbacks of the two typical B-frame coding architectures.
The advantages of the motion-free approach are twofold: it improves the coding efficiency of the network and significantly reduces computational complexity.
Experimental results show the proposed framework outperforms SOTA deep neural video compression networks on the HEVC Class B dataset.
arXiv Detail & Related papers (2024-11-26T07:03:11Z)
- High-Efficiency Neural Video Compression via Hierarchical Predictive Learning [27.41398149573729]
Enhanced Deep Hierarchical Video Compression (DHVC 2.0) delivers superior compression performance and impressive complexity efficiency.
It uses hierarchical predictive coding to transform each video frame into multiscale representations.
It supports transmission-friendly progressive decoding, which is particularly advantageous for networked video applications in the presence of packet loss.
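To make the multiscale idea concrete, here is a toy pyramid decomposition in which each finer level is predicted from the coarser one and only the residuals would be coded. Plain average-pool downsampling stands in for DHVC's learned transforms; this is an assumption for illustration only.

```python
# Toy multiscale (pyramid) decomposition: each finer level is predicted
# from the coarser one, so only prediction residuals need to be coded.
# Average pooling stands in for the paper's learned transforms.
import numpy as np

def downsample(x: np.ndarray) -> np.ndarray:
    """2x average pooling; assumes even height and width."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x: np.ndarray) -> np.ndarray:
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def pyramid_residuals(frame: np.ndarray, levels: int = 3):
    """Return the coarsest level plus coarse-to-fine residuals."""
    scales = [frame]
    for _ in range(levels - 1):
        scales.append(downsample(scales[-1]))
    residuals = [fine - upsample(coarse)  # only this would be coded
                 for fine, coarse in zip(scales[:-1], scales[1:])]
    return scales[-1], residuals[::-1]

base, res = pyramid_residuals(np.random.rand(64, 64))
print(base.shape, [r.shape for r in res])  # (16, 16) [(32, 32), (64, 64)]
```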
arXiv Detail & Related papers (2024-10-03T15:40:58Z)
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that the TT2V mode achieves effective semantic reconstruction, while the IT2V mode exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z)
- Learned Scalable Video Coding For Humans and Machines [4.14360329494344]
We introduce an end-to-end learnable video codec that supports a machine vision task in its base layer, while its enhancement layer, together with the base layer, supports input reconstruction for human viewing.
Our framework outperforms both state-of-the-art learned and conventional video codecs in its base layer, while maintaining comparable performance on the human vision task in its enhancement layer.
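A minimal sketch of the base/enhancement split described above: the base-layer latent alone drives a machine-vision head, while base and enhancement latents together reconstruct the frame for human viewing. The modules and dimensions are illustrative assumptions, not the paper's networks.

```python
# Hypothetical scalable human/machine codec: the base latent serves the
# machine task by itself; base + enhancement latents reconstruct pixels.
import torch
import torch.nn as nn

class ScalableCodec(nn.Module):
    def __init__(self):
        super().__init__()
        self.base_enc = nn.Conv2d(3, 16, 4, stride=4)     # base-layer latent
        self.enh_enc = nn.Conv2d(3, 16, 4, stride=4)      # enhancement latent
        self.task_head = nn.Linear(16, 10)                # e.g. classification
        self.recon_dec = nn.ConvTranspose2d(32, 3, 4, stride=4)

    def forward(self, x):
        b, e = self.base_enc(x), self.enh_enc(x)
        logits = self.task_head(b.mean(dim=(2, 3)))       # machine: base only
        frame = self.recon_dec(torch.cat([b, e], dim=1))  # human: both layers
        return logits, frame

logits, frame = ScalableCodec()(torch.rand(1, 3, 64, 64))
print(logits.shape, frame.shape)  # (1, 10) and (1, 3, 64, 64)
```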
arXiv Detail & Related papers (2023-07-18T05:22:25Z)
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
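The key point, sketched below with assumed stand-in modules, is that an analysis head can consume the quantized latent directly, so no pixel-space decoding is needed before understanding.

```python
# Sketch of analysis on coded representations: a classifier reads the
# quantized latent directly instead of decoded pixels. The encoder and
# head are stand-ins, not VNVC's learned transforms.
import torch
import torch.nn as nn

encoder = nn.Conv2d(3, 8, 8, stride=8)         # pixels -> compact latent
latent_head = nn.Linear(8, 10)                 # analysis head on latents

frame = torch.rand(1, 3, 64, 64)
latent = torch.round(encoder(frame))           # quantized coded representation
logits = latent_head(latent.mean(dim=(2, 3)))  # no pixel decoding required
print(logits.shape)  # torch.Size([1, 10])
```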
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
- Contrastive Masked Autoencoders for Self-Supervised Video Hashing [54.636976693527636]
Self-Supervised Video Hashing (SSVH) models learn to generate short binary representations for videos without ground-truth supervision.
We propose a simple yet effective one-stage SSVH method called ConMH, which incorporates video semantic information and video similarity relationship understanding.
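To illustrate what video hashing buys at retrieval time: real-valued embeddings are binarized into short codes, and lookup reduces to Hamming-distance comparison. The random projection below is a stand-in for ConMH's learned encoder.

```python
# Toy video hashing: binarize an embedding with a sign test and retrieve
# the nearest video by Hamming distance. A random projection stands in
# for the learned (masked-autoencoder + contrastive) encoder of ConMH.
import numpy as np

rng = np.random.default_rng(0)
proj = rng.standard_normal((512, 64))          # 512-d features -> 64-bit code

def hash_code(feat: np.ndarray) -> np.ndarray:
    return (feat @ proj > 0).astype(np.uint8)  # sign binarization

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    return int(np.count_nonzero(a != b))

database = [hash_code(rng.standard_normal(512)) for _ in range(5)]
query = hash_code(rng.standard_normal(512))
best = min(range(len(database)), key=lambda i: hamming(query, database[i]))
print("nearest video:", best)
```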
arXiv Detail & Related papers (2022-11-21T06:48:14Z)
- Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior art, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds their encoding quality, improving from 34.07 to 34.57 dB (measured with the PSNR metric).
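For reference, PSNR is 10·log10(MAX²/MSE), so the quoted 0.5 dB gain corresponds to roughly an 11% reduction in mean squared error:

```python
# PSNR (peak signal-to-noise ratio) for signals in [0, 1], and the MSE
# reduction implied by the 34.07 -> 34.57 dB improvement quoted above.
import numpy as np

def psnr(ref: np.ndarray, rec: np.ndarray, peak: float = 1.0) -> float:
    mse = np.mean((ref - rec) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))

print(round(psnr(np.zeros((4, 4)), np.full((4, 4), 0.1)), 1))  # 20.0 dB

mse_a = 10 ** (-34.07 / 10)  # MSE implied by 34.07 dB (peak = 1)
mse_b = 10 ** (-34.57 / 10)  # MSE implied by 34.57 dB
print(f"MSE reduction: {100 * (1 - mse_b / mse_a):.1f}%")  # ~10.9%
```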
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
- Content Adaptive and Error Propagation Aware Deep Video Compression [110.31693187153084]
We propose a content adaptive and error propagation aware video compression system.
Our method employs a joint training strategy by considering the compression performance of multiple consecutive frames instead of a single frame.
Instead of using the hand-crafted coding modes in the traditional compression systems, we design an online encoder updating scheme in our system.
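A minimal sketch of what online encoder updating can look like: only the encoder is fine-tuned on the current video against a rate-distortion loss while the decoder stays frozen, so the bitstream remains decodable by the unchanged receiver. The modules and the rate proxy are illustrative assumptions.

```python
# Hypothetical online encoder update: adapt the encoder per video while
# keeping the decoder fixed so existing decoders can still decode.
import torch
import torch.nn as nn

encoder = nn.Conv2d(3, 8, 4, stride=4)
decoder = nn.ConvTranspose2d(8, 3, 4, stride=4)
for p in decoder.parameters():
    p.requires_grad_(False)                      # receiver side stays fixed

opt = torch.optim.Adam(encoder.parameters(), lr=1e-4)
video = torch.rand(4, 3, 64, 64)                 # frames of one sequence

for _ in range(10):                              # per-video adaptation steps
    latent = encoder(video)
    distortion = nn.functional.mse_loss(decoder(latent), video)
    rate_proxy = latent.abs().mean()             # crude stand-in for bitrate
    loss = distortion + 0.01 * rate_proxy
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final rate-distortion loss: {loss.item():.4f}")
```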
arXiv Detail & Related papers (2020-03-25T09:04:24Z)