Efficient Meta-Tuning for Content-aware Neural Video Delivery
- URL: http://arxiv.org/abs/2207.09691v1
- Date: Wed, 20 Jul 2022 06:47:10 GMT
- Title: Efficient Meta-Tuning for Content-aware Neural Video Delivery
- Authors: Xiaoqi Li, Jiaming Liu, Shizun Wang, Cheng Lyu, Ming Lu, Yurong Chen,
Anbang Yao, Yandong Guo, Shanghang Zhang
- Abstract summary: We present Efficient Meta-Tuning (EMT) to reduce the computational cost.
EMT adapts a meta-learned model to the first chunk of the input video.
We propose a novel sampling strategy to extract the most challenging patches from video frames.
- Score: 40.3731358963689
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Deep Neural Networks (DNNs) have been utilized to reduce the bandwidth
and improve the quality of Internet video delivery. Existing methods train a
corresponding content-aware super-resolution (SR) model for each video chunk on
the server, and stream low-resolution (LR) video chunks along with the SR models to
the client. Although they achieve promising results, the huge computational
cost of network training limits their practical application. In this paper, we
present a method named Efficient Meta-Tuning (EMT) to reduce the computational
cost. Instead of training from scratch, EMT adapts a meta-learned model to the
first chunk of the input video. For the following chunks, it fine-tunes only a
subset of parameters selected by gradient masking of the previously adapted model. To
achieve further speedup for EMT, we propose a novel sampling strategy
to extract the most challenging patches from video frames. The proposed
strategy is highly efficient and brings negligible additional cost. Our method
significantly reduces the computational cost and achieves even better
performance, paving the way for applying neural video delivery techniques to
practical applications. We conduct extensive experiments based on various
efficient SR architectures, including ESPCN, SRCNN, FSRCNN and EDSR-1,
demonstrating the generalization ability of our work. The code is released at
\url{https://github.com/Neural-video-delivery/EMT-Pytorch-ECCV2022}.
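To make the abstract's pipeline concrete, here is a minimal PyTorch sketch of its two components: gradient masking to select a small subset of parameters for per-chunk fine-tuning, and a cheap hard-patch mining step. The masking criterion (per-entry gradient magnitude) and the difficulty proxy (bicubic residual) are illustrative assumptions, not the authors' exact choices; see the repository above for the official implementation.

```python
import torch
import torch.nn.functional as F

def gradient_mask(model, lr_batch, hr_batch, keep_ratio=0.1):
    """Rank parameter entries of the previously adapted model by |gradient| on
    one batch and keep the top fraction for fine-tuning on the next chunk."""
    model.zero_grad()
    F.l1_loss(model(lr_batch), hr_batch).backward()
    all_grads = torch.cat([p.grad.abs().flatten()
                           for p in model.parameters() if p.grad is not None])
    thresh = torch.quantile(all_grads, 1.0 - keep_ratio)
    return {name: (p.grad.abs() >= thresh)
            for name, p in model.named_parameters() if p.grad is not None}

def finetune_chunk(model, mask, batches, lr=1e-4):
    """Fine-tune only the masked entries: take full gradients, then zero the
    entries outside the mask before each optimizer step."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for lr_patch, hr_patch in batches:
        opt.zero_grad()
        F.l1_loss(model(lr_patch), hr_patch).backward()
        with torch.no_grad():
            for name, p in model.named_parameters():
                if p.grad is not None:
                    p.grad.mul_(mask[name])
        opt.step()

def hardest_patch_indices(lr_frame, hr_frame, scale=4, lr_patch=48, keep=16):
    """Cheap difficulty proxy (an assumption, not necessarily the paper's
    criterion): patches that bicubic upsampling already reconstructs well are
    easy, so keep the patches with the largest bicubic residual."""
    up = F.interpolate(lr_frame, scale_factor=scale,
                       mode="bicubic", align_corners=False)
    err = (up - hr_frame).abs().mean(dim=1, keepdim=True)         # (N, 1, H, W)
    patch_err = F.avg_pool2d(err, kernel_size=lr_patch * scale)   # error per patch
    flat = patch_err.flatten(1)                                   # (N, num_patches)
    return flat.topk(min(keep, flat.size(1)), dim=1).indices      # hardest patches
```

Zeroing gradients outside the mask keeps the sketch simple; a real implementation would rather freeze whole tensors or index only the masked entries so that the backward pass and optimizer state also shrink.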
Related papers
- Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs [56.040198387038025]
We present a novel prompt-guided visual perception framework (abbreviated as Free Video-LLM) for efficient inference of training-free video LLMs.
Our method effectively reduces the number of visual tokens while maintaining high performance across multiple video question-answering benchmarks.
arXiv Detail & Related papers (2024-10-14T12:35:12Z)
- Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design [18.57172631588624]
We propose a Dynamic Deep neural network assisted by a Content-Aware data processing pipeline to reduce the number of models to one.
Our method achieves better PSNR and real-time performance (33 FPS) on an off-the-shelf mobile phone.
arXiv Detail & Related papers (2024-07-03T05:17:26Z)
- Convolutional Neural Network Compression via Dynamic Parameter Rank Pruning [4.7027290803102675]
We propose an efficient training method for CNN compression via dynamic parameter rank pruning.
Our experiments show that the proposed method can yield substantial storage savings while maintaining or even enhancing classification performance.
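As a rough illustration of rank pruning in general (a plain truncated SVD, not this paper's dynamic rank selection), a convolution weight can be factorized and truncated as follows; the function name and factor layout are our own:

```python
import torch

def truncate_conv_rank(weight: torch.Tensor, rank: int):
    """Generic low-rank compression of a conv weight (out_c, in_c, kH, kW):
    flatten to a matrix, keep the top `rank` singular values, and return two
    factors. Storage drops from out_c*in_c*kH*kW to rank*(out_c + in_c*kH*kW)."""
    out_c, in_c, kh, kw = weight.shape
    mat = weight.reshape(out_c, in_c * kh * kw)
    U, S, Vh = torch.linalg.svd(mat, full_matrices=False)
    A = U[:, :rank] * S[:rank]                  # (out_c, rank) combination matrix
    B = Vh[:rank].reshape(rank, in_c, kh, kw)   # rank-channel spatial conv
    return A, B  # apply as conv2d(x, B) followed by a 1x1 conv with A
```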
arXiv Detail & Related papers (2024-01-15T23:52:35Z)
- A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs).
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z)
- Capturing Temporal Information in a Single Frame: Channel Sampling Strategies for Action Recognition [19.220288614585147]
We address the problem of capturing temporal information for video classification in 2D networks, without increasing computational cost.
We propose a novel sampling strategy, where we re-order the channels of the input video, to capture short-term frame-to-frame changes.
Our sampling strategies do not require training from scratch and do not increase the computational cost of training and testing.
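One simple instance of such re-ordering (an assumed variant for illustration, not necessarily the paper's exact strategy) stacks grayscale copies of three consecutive frames into the three input channels, so an unmodified 2D network sees short-term motion:

```python
import torch

def stack_gray_frames(frames: torch.Tensor) -> torch.Tensor:
    """frames: (T, 3, H, W) RGB clip -> (T-2, 3, H, W), where each output
    'image' carries grayscale frames t-1, t, t+1 in its three channels."""
    # ITU-R BT.601 luma weights for the RGB-to-gray conversion.
    weights = torch.tensor([0.299, 0.587, 0.114], device=frames.device)
    gray = (frames * weights.view(1, 3, 1, 1)).sum(dim=1)          # (T, H, W)
    return torch.stack([gray[:-2], gray[1:-1], gray[2:]], dim=1)   # (T-2, 3, H, W)
```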
arXiv Detail & Related papers (2022-01-25T15:24:37Z)
- Overfitting the Data: Compact Neural Video Delivery via Content-aware Feature Modulation [38.889823516049056]
Existing methods divide a video into chunks, and stream LR video chunks and corresponding content-aware models to the client.
With our method, each video chunk requires less than 1% of the original parameters to be streamed, achieving even better SR performance.
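The streamed payload can be that small because only modulation parameters differ across chunks while the backbone is shared. A hedged sketch of channel-wise affine modulation (the module name and the affine form are our assumptions):

```python
import torch
import torch.nn as nn

class ChunkModulation(nn.Module):
    """Channel-wise affine modulation after a shared conv layer. Only `scale`
    and `shift` (2*C numbers per layer) would be streamed per chunk; the
    backbone weights are shared across the whole video."""
    def __init__(self, channels: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(channels))
        self.shift = nn.Parameter(torch.zeros(channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (N, C, H, W)
        return x * self.scale.view(1, -1, 1, 1) + self.shift.view(1, -1, 1, 1)
```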
arXiv Detail & Related papers (2021-08-18T15:34:11Z)
- Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization [96.73647162960842]
Temporal action localization (TAL) is a fundamental yet challenging task in video understanding.
Existing TAL methods rely on pre-training a video encoder through action classification supervision.
We introduce a novel low-fidelity end-to-end (LoFi) video encoder pre-training method.
arXiv Detail & Related papers (2021-03-28T22:18:14Z)
- Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings [89.63764845984076]
We present Stored Embeddings for Efficient Reinforcement Learning (SEER).
SEER is a simple modification of existing off-policy deep reinforcement learning methods.
We show that SEER does not degrade the performance of RL agents while significantly saving computation and memory.
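The savings come from freezing the encoder early in training and caching its outputs, so later updates skip the convolutional stack. A minimal sketch of the caching idea (the buffer layout and API are assumptions):

```python
import torch

class LatentReplayBuffer:
    """Once the encoder is frozen, store low-dimensional latents instead of raw
    frames; downstream updates then reuse cached encodings and never re-run the
    frozen CNN, saving both compute and memory."""
    def __init__(self, capacity: int, latent_dim: int):
        self.latents = torch.empty(capacity, latent_dim)
        self.idx, self.full, self.capacity = 0, False, capacity

    @torch.no_grad()
    def add(self, frozen_encoder, obs: torch.Tensor):
        # obs: (1, C, H, W); encoded exactly once, at insertion time.
        self.latents[self.idx] = frozen_encoder(obs).squeeze(0)
        self.idx = (self.idx + 1) % self.capacity
        self.full = self.full or self.idx == 0

    def sample(self, batch_size: int) -> torch.Tensor:
        hi = self.capacity if self.full else self.idx
        return self.latents[torch.randint(hi, (batch_size,))]
```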
arXiv Detail & Related papers (2021-03-04T08:14:10Z)
- Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs [6.035819238203187]
We show that a reduction in the number of convolutional channels, pruning 12% of the initial size, is in some cases detrimental to performance.
We also find examples where performance-aware pruning achieves the intended results, with performance speedups of 3x with cuDNN and above 10x with Arm Compute Library and TVM.
arXiv Detail & Related papers (2020-02-20T12:07:44Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
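For intuition, layer-wise fusion aligns neurons across models before averaging. A hedged sketch that uses hard assignment (a special, degenerate case of the optimal transport coupling the paper computes) for a single linear layer:

```python
import torch
from scipy.optimize import linear_sum_assignment

def fuse_linear(w_a: torch.Tensor, w_b: torch.Tensor) -> torch.Tensor:
    """Align the rows (output neurons) of w_b to w_a by solving an assignment
    problem on pairwise L2 costs, then average the aligned weights."""
    cost = torch.cdist(w_a, w_b)                  # (out, out) neuron-to-neuron costs
    _, col = linear_sum_assignment(cost.numpy())  # hard matching of neurons
    w_b_aligned = w_b[torch.as_tensor(col)]       # permute b's neurons to match a
    return 0.5 * (w_a + w_b_aligned)
```

In a full network, the same permutation must also be applied to the input dimension of the following layer before that layer is fused.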
arXiv Detail & Related papers (2019-10-12T22:07:15Z)