Efficient Bitrate Ladder Construction using Transfer Learning and Spatio-Temporal Features
- URL: http://arxiv.org/abs/2401.03195v2
- Date: Thu, 14 Mar 2024 03:59:19 GMT
- Title: Efficient Bitrate Ladder Construction using Transfer Learning and Spatio-Temporal Features
- Authors: Ali Falahati, Mohammad Karim Safavi, Ardavan Elahi, Farhad Pakdaman, Moncef Gabbouj
- Abstract summary: We propose an efficient bitrate ladder prediction method using transfer learning and spatio-temporal features.
Tested on 102 video scenes, the method demonstrates a 94.1% reduction in complexity versus brute-force encoding at a 1.71% BD-Rate expense.
- Score: 12.631821085716853
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Providing high-quality video at an efficient bitrate is a main challenge in the video industry. The traditional one-size-fits-all scheme for bitrate ladders is inefficient, and reaching the best content-aware decision is computationally impractical due to the extensive encodings required. To mitigate this, we propose a bitrate- and complexity-efficient bitrate ladder prediction method using transfer learning and spatio-temporal features. We propose: (1) using feature maps from well-known pre-trained DNNs to predict rate-quality behavior with limited training data; and (2) improving highest-quality-rung efficiency by predicting the minimum bitrate for top quality and using it for the top rung. Tested on 102 video scenes, the method demonstrates a 94.1% reduction in complexity versus brute-force encoding at a 1.71% BD-Rate expense. Additionally, transfer learning was thoroughly studied through four networks and ablation studies.
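The core idea in (1) is a regression from pre-trained-DNN features to each scene's rate-quality behavior. A minimal numpy sketch of that regression step, not the paper's actual pipeline: the feature vectors below are random placeholders standing in for pooled spatio-temporal feature maps from a pretrained backbone, and the assumed per-scene model q = a*log(rate) + b is a common simplification of a rate-quality curve.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder for pooled spatio-temporal feature maps extracted from a
# well-known pretrained backbone (random vectors here, purely so the
# sketch is runnable without an ML framework).
n_scenes, n_feats = 40, 16
X = rng.normal(size=(n_scenes, n_feats))

# Assumed per-scene rate-quality model: q = a*log(rate) + b.
# The regression targets are the per-scene parameters (a, b), synthetic here.
true_W = rng.normal(size=(n_feats, 2))
Y = X @ true_W + 0.01 * rng.normal(size=(n_scenes, 2))

# Ridge regression in closed form: W = (X^T X + lam*I)^-1 X^T Y,
# keeping the fit stable with limited training data.
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(n_feats), X.T @ Y)

# Predict the quality of an unseen scene at a candidate rung bitrate
# (2000 kbps) without encoding it.
x_new = rng.normal(size=n_feats)
a, b = x_new @ W
quality_at_2mbps = a * np.log(2000.0) + b
print(round(float(quality_at_2mbps), 3))
```

With predicted (a, b) per scene, candidate rung bitrates can be scored without the exhaustive per-rung encodings a brute-force ladder search would need.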
Related papers
- A Parametric Rate-Distortion Model for Video Transcoding [7.1741986121107235]
We introduce a parametric rate-distortion (R-D) transcoder model.
Our model excels at predicting distortion at various rates without the need for encoding the video.
It can be used to achieve visual quality improvement (in terms of PSNR) via trans-sizing.
arXiv Detail & Related papers (2024-04-13T15:37:57Z) - Optimal Transcoding Resolution Prediction for Efficient Per-Title Bitrate Ladder Estimation [9.332104035349932]
We demonstrate that content-optimized features and ladders can be efficiently determined without any pre-encoding.
Our method closely approximates the ground-truth resolution pairs with a slight Bjontegaard Delta rate loss of 1.21%.
arXiv Detail & Related papers (2024-01-09T08:01:47Z) - Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting [27.302681897961588]
Deep convolutional neural networks (DNNs) are widely used in various fields of computer vision.
We propose a novel method for high-quality and efficient video resolution upscaling tasks.
We deploy our models on an off-the-shelf mobile phone, and experimental results show that our method achieves real-time video super-resolution with high video quality.
arXiv Detail & Related papers (2023-03-15T02:40:02Z) - AlphaVC: High-Performance and Efficient Learned Video Compression [4.807439168741098]
We introduce conditional-I-frame as the first frame in the GoP, which stabilizes the reconstructed quality and saves the bit-rate.
Second, to efficiently improve the accuracy of inter prediction without increasing the complexity of decoder, we propose a pixel-to-feature motion prediction method at encoder side.
Third, we propose a probability-based entropy skipping method, which not only brings performance gain, but also greatly reduces the runtime of entropy coding.
arXiv Detail & Related papers (2022-07-29T13:52:44Z) - NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition [89.84188594758588]
A novel Non-saliency Suppression Network (NSNet) is proposed to suppress the responses of non-salient frames.
NSNet achieves the state-of-the-art accuracy-efficiency trade-off and delivers a significantly faster (2.4-4.3x) practical inference speed than state-of-the-art methods.
arXiv Detail & Related papers (2022-07-21T09:41:22Z) - Efficient Per-Shot Convex Hull Prediction By Recurrent Learning [50.94452824380868]
We propose a deep learning based method of content aware convex hull prediction.
We employ a recurrent convolutional network (RCN) to implicitly analyze the complexity of video shots in order to predict their convex hulls.
Our experimental results reveal that our proposed model yields better approximations of the optimal convex hulls and offers competitive time savings compared to existing approaches.
arXiv Detail & Related papers (2022-06-10T05:11:02Z) - ETAD: A Unified Framework for Efficient Temporal Action Detection [70.21104995731085]
Untrimmed video understanding such as temporal action detection (TAD) often suffers from the pain of huge demand for computing resources.
We build a unified framework for efficient end-to-end temporal action detection (ETAD).
ETAD achieves state-of-the-art performance on both THUMOS-14 and ActivityNet-1.3.
arXiv Detail & Related papers (2022-05-14T21:16:21Z) - Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z) - Investigating Tradeoffs in Real-World Video Super-Resolution [90.81396836308085]
Real-world video super-resolution (VSR) models are often trained with diverse degradations to improve generalizability.
To alleviate the first tradeoff, we propose a degradation scheme that reduces up to 40% of training time without sacrificing performance.
To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences.
arXiv Detail & Related papers (2021-11-24T18:58:21Z) - Low-Fidelity End-to-End Video Encoder Pre-training for Temporal Action Localization [96.73647162960842]
TAL is a fundamental yet challenging task in video understanding.
Existing TAL methods rely on pre-training a video encoder through action classification supervision.
We introduce a novel low-fidelity end-to-end (LoFi) video encoder pre-training method.
arXiv Detail & Related papers (2021-03-28T22:18:14Z)
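Several of the papers above report results as Bjontegaard Delta (BD) rate, the average bitrate difference between two rate-distortion curves at equal quality. A self-contained numpy sketch of the classic cubic-fit variant (log-bitrate fitted as a cubic polynomial of PSNR, integrated over the shared quality range); the sample rate and PSNR points are illustrative, not from any paper above.

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Bjontegaard Delta rate: average bitrate difference (%) of the test
    R-D curve versus the reference, over their overlapping quality range."""
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    # Fit log-rate as a cubic polynomial of quality (PSNR).
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)
    p_test = np.polyfit(psnr_test, lr_test, 3)
    # Integrate both fits over the shared quality interval.
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    P_ref, P_test = np.polyint(p_ref), np.polyint(p_test)
    avg_diff = (np.polyval(P_test, hi) - np.polyval(P_test, lo)
                - np.polyval(P_ref, hi) + np.polyval(P_ref, lo)) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0

# Illustrative R-D points: identical curves give 0%; a curve that needs
# 2% more bitrate at every quality level gives ~2%.
r = np.array([1000.0, 2000.0, 4000.0, 8000.0])  # kbps
q = np.array([34.0, 37.0, 40.0, 43.0])          # PSNR (dB)
print(bd_rate(r, q, r, q))          # 0.0
print(bd_rate(r, q, r * 1.02, q))   # ~2.0
```

So the main paper's "1.71% BD-Rate expense" means its predicted ladder needs on average 1.71% more bitrate than the brute-force ladder for the same quality.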
This list is automatically generated from the titles and abstracts of the papers in this site.