Overfitting the Data: Compact Neural Video Delivery via Content-aware
Feature Modulation
- URL: http://arxiv.org/abs/2108.08202v1
- Date: Wed, 18 Aug 2021 15:34:11 GMT
- Title: Overfitting the Data: Compact Neural Video Delivery via Content-aware
Feature Modulation
- Authors: Jiaming Liu, Ming Lu, Kaixin Chen, Xiaoqi Li, Shizun Wang, Zhaoqing
Wang, Enhua Wu, Yurong Chen, Chuang Zhang, Ming Wu
- Abstract summary: Existing methods divide a video into chunks and stream low-resolution (LR) video chunks and corresponding content-aware models to the client.
With our method, each video chunk requires less than $1\%$ of the original parameters to be streamed, while achieving even better SR performance.
- Score: 38.889823516049056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Internet video delivery has undergone a tremendous explosion of growth over
the past few years. However, the quality of a video delivery system depends
heavily on the available Internet bandwidth. Recently, Deep Neural Networks
(DNNs) have been utilized to improve the quality of video delivery. These
methods divide a video into chunks and stream low-resolution (LR) video chunks
together with corresponding content-aware models to the client. The client
then runs model inference to super-resolve the LR chunks. Consequently, a
large number of models must be streamed to deliver a single video. In this
paper, we first carefully study the relation between the models of different
chunks, and then design a joint training framework, together with the
Content-aware Feature Modulation (CaFM) layer, to compress these models for
neural video delivery. With our method, each video chunk requires less than
$1\%$ of the original parameters to be streamed, while achieving even better
SR performance. We conduct extensive experiments across various SR backbones,
video durations, and scaling factors to demonstrate the advantages of our
method. Moreover, our method can also be viewed as a new approach to video
coding. Our preliminary experiments achieve better video quality than the
commercial H.264 and H.265 standards under the same storage cost, showing the
great potential of the proposed method. Code is available at:
https://github.com/Neural-video-delivery/CaFM-Pytorch-ICCV2021
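To make the compression scheme concrete, here is a minimal PyTorch sketch of
per-chunk feature modulation in the spirit of CaFM: a shared SR backbone is
trained jointly over all chunks, and each chunk owns only a tiny channel-wise
scale and shift that modulate the shared features. The affine form and all
names are simplifying assumptions for illustration, not the official
CaFM-PyTorch implementation.

```python
# Hedged sketch: per-chunk channel-wise modulation of a shared backbone.
import torch
import torch.nn as nn

class ChunkModulation(nn.Module):
    """Per-chunk channel-wise affine modulation: y = x * scale + shift."""
    def __init__(self, channels: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.shift = nn.Parameter(torch.zeros(1, channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.scale + self.shift

class ModulatedSRBlock(nn.Module):
    """A shared conv block followed by one tiny modulation branch per chunk."""
    def __init__(self, channels: int, num_chunks: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)  # shared weights
        self.mods = nn.ModuleList(
            ChunkModulation(channels) for _ in range(num_chunks)
        )

    def forward(self, x: torch.Tensor, chunk_id: int) -> torch.Tensor:
        # Shared feature extraction, then chunk-specific modulation.
        return self.mods[chunk_id](torch.relu(self.conv(x)))
```

Under a scheme like this, the shared backbone is delivered once, and each new
chunk streams only its scale/shift values (2·C parameters per modulated
block), which is how the per-chunk payload can fall below $1\%$ of the full
model.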
Related papers
- Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design [18.57172631588624]
We propose a dynamic deep neural network assisted by a content-aware data processing pipeline to reduce the number of models down to one.
Our method achieves better PSNR and real-time performance (33 FPS) on an off-the-shelf mobile phone.
arXiv Detail & Related papers (2024-07-03T05:17:26Z)
- SF-V: Single Forward Video Generation Model [57.292575082410785]
We propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune pre-trained models.
Experiments demonstrate that our method achieves competitive generation quality of synthesized videos with significantly reduced computational overhead.
arXiv Detail & Related papers (2024-06-06T17:58:27Z)
- Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition [124.41196697408627]
We propose content-motion latent diffusion model (CMD), a novel efficient extension of pretrained image diffusion models for video generation.
CMD encodes a video as a combination of a content frame (like an image) and a low-dimensional motion latent representation.
We generate the content frame by fine-tuning a pretrained image diffusion model, and we generate the motion latent representation by training a new lightweight diffusion model.
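The decomposition can be pictured with a toy example. The sketch below is a
rough illustration and not CMD's architecture: it summarizes a clip as a
single content frame (a learned weighted average over time) plus a
low-dimensional motion latent from a small encoder. All module names and
shapes here are assumptions.

```python
# Toy content/motion decomposition: one content frame + one motion latent.
import torch
import torch.nn as nn

class ToyDecomposer(nn.Module):
    def __init__(self, frames: int, latent_dim: int = 64):
        super().__init__()
        # Learnable temporal weights for forming the content frame.
        self.time_weights = nn.Parameter(torch.ones(frames) / frames)
        self.motion_enc = nn.Sequential(nn.Flatten(), nn.LazyLinear(latent_dim))

    def forward(self, video: torch.Tensor):
        # video: (B, T, C, H, W)
        w = torch.softmax(self.time_weights, dim=0).view(1, -1, 1, 1, 1)
        content = (video * w).sum(dim=1)   # (B, C, H, W) content frame
        motion = self.motion_enc(video)    # (B, latent_dim) motion latent
        return content, motion
```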
arXiv Detail & Related papers (2024-03-21T05:48:48Z)
- Reinforcement Learning-based Adaptation and Scheduling Methods for Multi-source DASH [1.1971219484941955]
Dynamic adaptive streaming over HTTP (DASH) has recently been widely used in video streaming.
In multi-source streaming, video chunks may arrive out of order due to different conditions of the network paths.
This paper proposes two algorithms for streaming from multiple sources: RL-based adaptation with greedy scheduling (RLAGS) and RL-based adaptation and scheduling (RLAS).
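To see why multi-source chunks arrive out of order, consider a toy greedy
scheduler: whenever a path becomes free it grabs the earliest unscheduled
chunk, so a fast path can deliver later chunks before a slow path finishes an
earlier one. This is only an illustration of the scheduling problem, not the
RLAGS/RLAS algorithms; the function and variable names are made up.

```python
# Toy greedy chunk-to-path scheduler for multi-source streaming.
import heapq

def greedy_schedule(chunk_bits, path_rates_bps):
    """Return {chunk_index: path_index} assignments."""
    # Priority queue of (time the path becomes free, path index).
    free_at = [(0.0, p) for p in range(len(path_rates_bps))]
    heapq.heapify(free_at)
    assignment = {}
    for i, bits in enumerate(chunk_bits):
        t, p = heapq.heappop(free_at)          # earliest-free path
        finish = t + bits / path_rates_bps[p]  # estimated download time
        assignment[i] = p
        heapq.heappush(free_at, (finish, p))
    return assignment

# Two paths with different rates: chunks 1 and 2 finish on the fast path
# before chunk 0 finishes on the slow one -> out-of-order arrival.
print(greedy_schedule([4e6, 4e6, 4e6], [1e6, 4e6]))  # {0: 0, 1: 1, 2: 1}
```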
arXiv Detail & Related papers (2023-07-25T06:47:12Z)
- HNeRV: A Hybrid Neural Representation for Videos [56.492309149698606]
Implicit neural representations store videos as neural networks.
We propose a Hybrid Neural Representation for Videos (HNeRV).
With content-adaptive embeddings and re-designed architecture, HNeRV outperforms implicit methods in video regression tasks.
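The "video as network" idea is easy to sketch. Below is a toy PyTorch
illustration, not the HNeRV architecture: a small decoder maps a learnable
per-frame embedding to pixels, so storing the video amounts to storing the
weights (plus, in the hybrid setting, a tiny content-adaptive embedding per
frame). Sizes and layer choices are arbitrary assumptions.

```python
# Toy implicit/hybrid neural representation of a video.
import torch
import torch.nn as nn

class TinyVideoINR(nn.Module):
    def __init__(self, num_frames: int, embed_dim: int = 16,
                 h: int = 32, w: int = 32):
        super().__init__()
        # One small learnable embedding per frame (content-adaptive in
        # HNeRV; a plain lookup table here for simplicity).
        self.embed = nn.Embedding(num_frames, embed_dim)
        self.decoder = nn.Sequential(
            nn.Linear(embed_dim, 256), nn.GELU(),
            nn.Linear(256, 3 * h * w),
        )
        self.h, self.w = h, w

    def forward(self, frame_idx: torch.Tensor) -> torch.Tensor:
        out = self.decoder(self.embed(frame_idx))
        return out.view(-1, 3, self.h, self.w).sigmoid()

# Training regresses decoded frames against the real video, e.g.
# loss = F.mse_loss(model(idx), frames[idx]).
```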
arXiv Detail & Related papers (2023-04-05T17:55:04Z)
- Towards Scalable Neural Representation for Diverse Videos [68.73612099741956]
Implicit neural representations (INR) have gained increasing attention in representing 3D scenes and images.
Existing INR-based methods are limited to encoding a handful of short videos with redundant visual content.
This paper focuses on developing neural representations for encoding long videos and/or a large number of videos with diverse visual content.
arXiv Detail & Related papers (2023-03-24T16:32:19Z)
- Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting [27.302681897961588]
Deep convolutional neural networks (DNNs) are widely used in various fields of computer vision.
We propose a novel method for high-quality and efficient video resolution upscaling tasks.
We deploy our models on an off-the-shelf mobile phone, and experimental results show that our method achieves real-time video super-resolution with high video quality.
arXiv Detail & Related papers (2023-03-15T02:40:02Z)
- MagicVideo: Efficient Video Generation With Latent Diffusion Models [76.95903791630624]
We present an efficient text-to-video generation framework based on latent diffusion models, termed MagicVideo.
Due to a novel and efficient 3D U-Net design and modeling video distributions in a low-dimensional space, MagicVideo can synthesize video clips with 256x256 spatial resolution on a single GPU card.
We conduct extensive experiments and demonstrate that MagicVideo can generate high-quality video clips with either realistic or imaginary content.
arXiv Detail & Related papers (2022-11-20T16:40:31Z)
- Efficient Meta-Tuning for Content-aware Neural Video Delivery [40.3731358963689]
We present Efficient Meta-Tuning (EMT) to reduce the computational cost.
EMT adapts a meta-learned model to the first chunk of the input video.
We propose a novel sampling strategy to extract the most challenging patches from video frames.
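The hardest-patch idea can be sketched directly. The helper below is an
assumption-laden illustration rather than EMT's actual strategy: it scores
non-overlapping patches by how poorly a cheap bicubic upscale reconstructs
them and keeps the top-k for fine-tuning. The function name, scoring rule,
and patch size are all made up.

```python
# Hedged sketch: select the k patches a cheap upscaler handles worst.
import torch
import torch.nn.functional as F

def hardest_patches(lr, hr, scale=2, patch=32, k=64):
    """lr: (1,3,h,w), hr: (1,3,h*scale,w*scale) -> (k,3,patch,patch)."""
    up = F.interpolate(lr, scale_factor=scale, mode="bicubic",
                       align_corners=False)
    err = (up - hr).pow(2).mean(dim=1, keepdim=True)      # per-pixel MSE
    # Average error over non-overlapping patch x patch windows.
    score = F.avg_pool2d(err, kernel_size=patch, stride=patch)
    flat = score.flatten()
    top = flat.topk(min(k, flat.numel())).indices
    n_cols = score.shape[-1]
    rows, cols = top // n_cols, top % n_cols
    return torch.stack([
        hr[0, :, r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
        for r, c in zip(rows.tolist(), cols.tolist())
    ])
```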
arXiv Detail & Related papers (2022-07-20T06:47:10Z)