Palantir: Towards Efficient Super Resolution for Ultra-high-definition Live Streaming
- URL: http://arxiv.org/abs/2408.06152v2
- Date: Sat, 31 Aug 2024 12:32:50 GMT
- Title: Palantir: Towards Efficient Super Resolution for Ultra-high-definition Live Streaming
- Authors: Xinqi Jin, Zhui Zhu, Xikai Sun, Fan Dang, Jiangchuan Liu, Jingao Xu, Kebin Liu, Xinlei Chen, Yunhao Liu
- Abstract summary: Palantir is the first neural-enhanced UHD live streaming system with fine-grained patch-level scheduling.
Palantir incurs a negligible scheduling latency accounting for less than 5.7% of the end-to-end latency requirement.
- Score: 29.567573296006515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural enhancement through super-resolution (SR) deep neural networks (DNNs) opens up new possibilities for ultra-high-definition (UHD) live streaming over existing encoding and networking infrastructure. Yet, the heavy SR DNN inference overhead leads to severe deployment challenges. To reduce the overhead, existing systems propose to apply DNN-based SR only on carefully selected anchor frames while upscaling non-anchor frames via the lightweight reusing-based SR approach. However, frame-level scheduling is coarse-grained and fails to deliver optimal efficiency. In this work, we propose Palantir, the first neural-enhanced UHD live streaming system with fine-grained patch-level scheduling. Two novel techniques are incorporated into Palantir to select the most beneficial anchor patches and support latency-sensitive UHD live streaming applications. Firstly, under the guidance of our pioneering and theoretical analysis, Palantir constructs a directed acyclic graph (DAG) for lightweight yet accurate SR quality estimation under any possible anchor patch set. Secondly, to further optimize the scheduling latency, Palantir improves parallelizability by refactoring the computation subprocedure of the estimation process into a sparse matrix-matrix multiplication operation. The evaluation results suggest that Palantir incurs a negligible scheduling latency accounting for less than 5.7% of the end-to-end latency requirement. When compared to the naive method of applying DNN-based SR on all the frames, Palantir can reduce the SR DNN inference overhead by 20 times (or 60 times) while preserving 54.0-82.6% (or 32.8-64.0%) of the quality gain. When compared to the state-of-the-art real-time frame-level scheduling strategy, Palantir can reduce the SR DNN inference overhead by 80.1% at most (and 38.4% on average) without sacrificing the video quality.
Related papers
- Efficient Event-based Delay Learning in Spiking Neural Networks [0.1350479308585481]
Spiking Neural Networks (SNNs) compute using sparse communication and are attracting increased attention.
We propose a novel event-based training method for SNNs with delays, grounded in the EventProp formalism.
Our method supports multiple spikes per neuron and, to the best of our knowledge, is the first delay learning algorithm to be applied to recurrent SNNs.
arXiv Detail & Related papers (2025-01-13T13:44:34Z)
- Direct Training Needs Regularisation: Anytime Optimal Inference Spiking Neural Network [23.434563009813218]
Spiking Neural Network (SNN) is acknowledged as the next generation of Artificial Neural Network (ANN).
We introduce a novel regularisation technique, namely Spatial-Temporal Regulariser (STR).
STR regulates the ratio between the strength of spikes and membrane potential at each timestep.
This effectively balances spatial and temporal performance during training, ultimately resulting in an Anytime Optimal Inference (AOI) SNN.
arXiv Detail & Related papers (2024-04-15T15:57:01Z)
- Spiker+: a framework for the generation of efficient Spiking Neural Networks FPGA accelerators for inference at the edge [49.42371633618761]
Spiker+ is a framework for generating efficient, low-power, and low-area customized Spiking Neural Networks (SNN) accelerators on FPGA for inference at the edge.
Spiker+ is tested on two benchmark datasets, the MNIST and the Spiking Heidelberg Digits (SHD).
arXiv Detail & Related papers (2024-01-02T10:42:42Z)
- Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
By using locality-sensitive hashing (LSH), we are able to drastically compress latent feature maps without sacrificing much accuracy.
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
arXiv Detail & Related papers (2023-09-29T13:09:40Z)
- AccDecoder: Accelerated Decoding for Neural-enhanced Video Analytics [26.012783785622073]
Low-quality video is collected by existing surveillance systems because of poor quality cameras or over-compressed/pruned video streaming protocols.
We present AccDecoder, a novel accelerated decoder for real-time and neural network-based video analytics.
arXiv Detail & Related papers (2023-01-20T16:30:44Z)
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which could achieve high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z)
- An Adaptive Device-Edge Co-Inference Framework Based on Soft Actor-Critic [72.35307086274912]
High-dimension parameter model and large-scale mathematical calculation restrict execution efficiency, especially for Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL)-Soft Actor Critic for discrete (SAC-d), which generates the exit point and compressing bits by soft policy iterations.
Based on the latency and accuracy aware reward design, such a computation can adapt well to complex environments like dynamic wireless channel and arbitrary processing, and is capable of supporting the 5G URL
arXiv Detail & Related papers (2022-01-09T09:31:50Z)
- One Timestep is All You Need: Training Spiking Neural Networks with Ultra Low Latency [8.590196535871343]
Spiking Neural Networks (SNNs) are energy efficient alternatives to commonly used deep neural networks (DNNs).
High inference latency is a significant hindrance to the edge deployment of deep SNNs.
We propose an Iterative Initialization and Retraining method for SNNs (IIR-SNN) to perform single shot inference in the temporal axis.
arXiv Detail & Related papers (2021-10-01T22:54:59Z)
- Spatio-Temporal Pruning and Quantization for Low-latency Spiking Neural Networks [6.011954485684313]
Spiking Neural Networks (SNNs) are a promising alternative to traditional deep learning methods.
However, a major drawback of SNNs is high inference latency.
In this paper, we propose spatial and temporal pruning of SNNs.
arXiv Detail & Related papers (2021-04-26T12:50:58Z)
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate the models on resource-constrained environments.
In this paper, we are the first to study training from scratch an N:M fine-grained structured sparse network.
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
- FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization [78.46088089185156]
Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible.
Existing approaches penalize emission delay by manipulating per-token or per-frame probability prediction in sequence transducer models.
We propose a sequence-level emission regularization method, named FastEmit, that applies latency regularization directly on per-sequence probability in training transducer models.
arXiv Detail & Related papers (2020-10-21T17:05:01Z)
- DIET-SNN: Direct Input Encoding With Leakage and Threshold Optimization in Deep Spiking Neural Networks [8.746046482977434]
DIET-SNN is a low-latency deep spiking network that is trained with gradient descent to optimize the membrane leak and the firing threshold.
We evaluate DIET-SNN on image classification tasks from CIFAR and ImageNet datasets on VGG and ResNet architectures.
We achieve top-1 accuracy of 69% with 5 timesteps (inference latency) on the ImageNet dataset with 12x less compute energy than an equivalent standard ANN.
arXiv Detail & Related papers (2020-08-09T05:07:17Z)
- PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)