Related papers: InternVQA: Advancing Compressed Video Quality Assessment with Distilling Large Foundation Model

URL: http://arxiv.org/abs/2502.19026v1
Date: Wed, 26 Feb 2025 10:34:14 GMT
Title: InternVQA: Advancing Compressed Video Quality Assessment with Distilling Large Foundation Model
Authors: Fengbin Guan, Zihao Yu, Yiting Lu, Xin Li, Zhibo Chen
Abstract summary: InternVideo2 has demonstrated strong potential in video understanding tasks due to its large parameter size and large-scale multimodal data pretraining. To design a lightweight model suitable for this task, we proposed a distillation method to equip the model with rich compression quality priors. The results showed that, compared to other methods, our lightweight model distilled from InternVideo2 achieved excellent performance in compressed video quality assessment.
Score: 15.320011514412437
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video quality assessment tasks rely heavily on the rich features required for video understanding, such as semantic information, texture, and temporal motion. The existing video foundation model, InternVideo2, has demonstrated strong potential in video understanding tasks due to its large parameter size and large-scale multimodal data pretraining. Building on this, we explored the transferability of InternVideo2 to video quality assessment under compression scenarios. To design a lightweight model suitable for this task, we proposed a distillation method to equip the smaller model with rich compression quality priors. Additionally, we examined the performance of different backbones during the distillation process. The results showed that, compared to other methods, our lightweight model distilled from InternVideo2 achieved excellent performance in compressed video quality assessment.

Related papers

Conditional Video Generation for High-Efficiency Video Compression [47.011087624381524]
We propose a video compression framework that leverages conditional diffusion models for perceptually optimized reconstruction. Specifically, we reframe video compression as a conditional generation task, where a generative model synthesizes video from sparse, yet informative signals.
arXiv Detail & Related papers (2025-07-21T06:16:27Z)
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation [134.22372190926362]
Image diffusion distillation achieves high-fidelity generation with very few sampling steps. Applying these techniques directly to video diffusion often results in unsatisfactory frame quality due to limited visual quality in public video datasets. Our study aims to improve video diffusion distillation while improving frame appearance using abundant high-quality image data.
arXiv Detail & Related papers (2024-06-11T02:09:46Z)
Modular Blind Video Quality Assessment [33.657933680973194]
Blind video quality assessment (BVQA) plays a pivotal role in evaluating and improving the viewing experience of end-users across a wide range of video-based platforms and services. In this paper, we propose a modular BVQA model and a method of training it to improve its modularity.
arXiv Detail & Related papers (2024-02-29T15:44:00Z)
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models [76.85329896854189]
We investigate the feasibility of leveraging low-quality videos and synthesized high-quality images to obtain a high-quality video model. We shift the distribution to higher quality without motion degradation by finetuning spatial modules with high-quality images, resulting in a generic high-quality video model.
arXiv Detail & Related papers (2024-01-17T08:30:32Z)
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets [36.95521842177614]
We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation. We identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning.
arXiv Detail & Related papers (2023-11-25T22:28:38Z)
Conditional Modeling Based Automatic Video Summarization [70.96973928590958]
The aim of video summarization is to shorten videos automatically while retaining the key information necessary to convey the overall story. Video summarization methods rely on visual factors, such as visual consecutiveness and diversity, which may not be sufficient to fully understand the content of the video. A new approach to video summarization is proposed based on insights gained from how humans create ground truth video summaries.
arXiv Detail & Related papers (2023-11-20T20:24:45Z)
LAVIE: High-Quality Video Generation with Cascaded Latent Diffusion Models [133.088893990272]
We learn a high-quality text-to-video (T2V) generative model by leveraging a pre-trained text-to-image (T2I) model as a basis. We propose LaVie, an integrated video generation framework that operates on cascaded video latent diffusion models.
arXiv Detail & Related papers (2023-09-26T17:52:03Z)
VIDM: Video Implicit Diffusion Models [75.90225524502759]
Diffusion models have emerged as a powerful generative method for synthesizing a high-quality and diverse set of images. We propose a video generation method based on diffusion models, where the effects of motion are modeled in an implicit condition. We improve the quality of the generated videos by proposing multiple strategies such as sampling space truncation, robustness penalty, and positional group normalization.
arXiv Detail & Related papers (2022-12-01T02:58:46Z)
Deep Quality Assessment of Compressed Videos: A Subjective and Objective Study [23.3509109592315]
In the video coding process, the perceived quality of a compressed video is evaluated by full-reference quality evaluation metrics. To solve this problem, it is critical to design no-reference compressed video quality assessment algorithms. In this work, a semi-automatic labeling method is adopted to build a large-scale compressed video quality database.
arXiv Detail & Related papers (2022-05-07T10:50:06Z)
Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement [74.1052624663082]
We develop a deep learning architecture capable of restoring detail to compressed videos. We show that this improves restoration accuracy compared to prior compression correction methods. We condition our model on quantization data which is readily available in the bitstream.
arXiv Detail & Related papers (2022-01-31T18:56:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.