Efficient Per-Shot Convex Hull Prediction By Recurrent Learning
- URL: http://arxiv.org/abs/2206.04877v1
- Date: Fri, 10 Jun 2022 05:11:02 GMT
- Title: Efficient Per-Shot Convex Hull Prediction By Recurrent Learning
- Authors: Somdyuti Paul, Andrey Norkin and Alan C. Bovik
- Abstract summary: We propose a deep learning based method of content aware convex hull prediction.
We employ a recurrent convolutional network (RCN) to implicitly analyze the complexity of video shots in order to predict their convex hulls.
Our experimental results reveal that our proposed model yields better approximations of the optimal convex hulls and offers competitive time savings compared to existing approaches.
- Score: 50.94452824380868
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Adaptive video streaming relies on the construction of efficient bitrate
ladders to deliver the best possible visual quality to viewers under bandwidth
constraints. The traditional method of content dependent bitrate ladder
selection requires a video shot to be pre-encoded with multiple encoding
parameters to find the optimal operating points given by the convex hull of the
resulting rate-quality curves. However, this pre-encoding step is equivalent to
an exhaustive search process over the space of possible encoding parameters,
which causes significant overhead in terms of both computation and time
expenditure. To reduce this overhead, we propose a deep learning based method
of content aware convex hull prediction. We employ a recurrent convolutional
network (RCN) to implicitly analyze the spatiotemporal complexity of video
shots in order to predict their convex hulls. A two-step transfer learning
scheme is adopted to train our proposed RCN-Hull model, which ensures
sufficient content diversity to analyze scene complexity, while also making it
possible to capture the scene statistics of pristine source videos. Our
experimental results reveal that our proposed model yields better
approximations of the optimal convex hulls, and offers competitive time savings
as compared to existing approaches. On average, our method reduced the
pre-encoding time by 58.0%, while the average Bjontegaard delta bitrate
(BD-rate) of the predicted convex hulls against the ground truth was 0.08%,
and the mean absolute deviation of the BD-rate distribution was 0.44%.
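The optimal operating points described above are the points of a rate-quality curve that survive an upper convex hull scan. A minimal Python sketch of that scan using the standard monotone-chain method (the function name and sample points are illustrative, not from the paper):

```python
def upper_convex_hull(points):
    """Keep the (bitrate, quality) points on the upper convex hull,
    i.e. the optimal operating points of a rate-quality curve."""
    pts = sorted(points)  # sort by bitrate, then by quality
    hull = []
    for p in pts:
        # drop the last hull point while it lies on or below the
        # segment joining its predecessor to the new point p
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            cross = (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox)
            if cross >= 0:  # counter-clockwise turn: hull[-1] is not on the upper hull
                hull.pop()
            else:
                break
        hull.append(p)
    return hull
```

Encodings that are dominated in both rate and quality are pruned automatically, which is exactly what the exhaustive pre-encoding search computes and what RCN-Hull learns to predict without the pre-encodes.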
Related papers
- Learning Temporally Consistent Video Depth from Video Diffusion Priors [57.929828486615605]
This work addresses the challenge of video depth estimation.
We reformulate the prediction task into a conditional generation problem.
This allows us to leverage the prior knowledge embedded in existing video generation models.
arXiv Detail & Related papers (2024-06-03T16:20:24Z)
- Optimal Transcoding Resolution Prediction for Efficient Per-Title Bitrate Ladder Estimation [9.332104035349932]
We demonstrate that content-optimized features and ladders can be efficiently determined without any pre-encoding.
Our method closely approximates the ground-truth resolution pairs with a slight Bjontegaard Delta rate loss of 1.21%.
arXiv Detail & Related papers (2024-01-09T08:01:47Z)
- ConvNeXt-ChARM: ConvNeXt-based Transform for Efficient Neural Image Compression [18.05997169440533]
We propose ConvNeXt-ChARM, an efficient ConvNeXt-based transform coding framework, paired with a compute-efficient channel-wise auto-regressive prior.
We show that ConvNeXt-ChARM brings consistent and significant BD-rate (PSNR) reductions, averaging 5.24% and 1.22% over the versatile video coding (VVC) reference encoder (VTM-18.0) and the state-of-the-art learned image compression method SwinT-ChARM, respectively.
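The Bjontegaard delta rate (BD-rate) quoted here and throughout this listing averages the bitrate gap between two rate-quality curves at equal quality. A minimal NumPy sketch of the standard calculation (the function name and the test curves are illustrative, not taken from any of these papers):

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average bitrate difference (%) of the test curve vs. the
    reference curve at equal quality (Bjontegaard's method)."""
    # fit log10(bitrate) as a cubic polynomial of quality for each curve
    p_ref = np.polyfit(psnr_ref, np.log10(rates_ref), 3)
    p_test = np.polyfit(psnr_test, np.log10(rates_test), 3)
    # integrate both fits over the overlapping quality interval
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    # average log-rate gap, converted back to a percentage
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (10.0 ** avg_diff - 1.0) * 100.0
```

Identical curves give 0%, and a codec that spends twice the bitrate at every quality level comes out at +100%; negative values mean the test curve saves bitrate.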
arXiv Detail & Related papers (2023-07-12T11:45:54Z)
- Differentiable bit-rate estimation for neural-based video codec enhancement [2.592974861902384]
Neural networks (NN) can improve standard video compression by pre- and post-processing the encoded video.
For optimal NN training, the standard proxy needs to be replaced with a proxy that can provide derivatives of estimated bit-rate and distortion.
This paper presents a new approach for bit-rate estimation that is similar to the type employed in training end-to-end neural codecs.
arXiv Detail & Related papers (2023-01-24T01:36:07Z)
- Learning Quantization in LDPC Decoders [14.37550972719183]
We propose a floating-point surrogate model that imitates quantization effects as additions of uniform noise.
A deep learning-based method is then applied to optimize the message bitwidths.
We report an error-rate performance within 0.2 dB of floating-point decoding at an average message quantization bitwidth of 3.1 bits.
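The surrogate trick named in this summary, replacing hard quantization with additive uniform noise so that gradients can flow during training, can be sketched in a few lines (the function names are illustrative and the paper's actual decoder model is far more elaborate):

```python
import random

def quantize(x, step):
    # inference-time quantizer: round to the nearest multiple of step
    # (piecewise constant, so its gradient is zero almost everywhere)
    return step * round(x / step)

def noise_surrogate(x, step):
    # training-time stand-in: the rounding error is imitated by uniform
    # noise on [-step/2, step/2], which is smooth in x and matches the
    # mean (0) and variance (step**2 / 12) of the true quantization error
    return x + random.uniform(-step / 2.0, step / 2.0)
```

Training against the smooth surrogate lets gradient-based methods optimize the message bitwidths, while the hard quantizer is restored at inference time.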
arXiv Detail & Related papers (2022-08-10T07:07:54Z)
- Coarse-to-fine Deep Video Coding with Hyperprior-guided Mode Prediction [50.361427832256524]
We propose a coarse-to-fine (C2F) deep video compression framework for better motion compensation.
Our C2F framework can achieve better motion compensation results without significantly increasing bit costs.
arXiv Detail & Related papers (2022-06-15T11:38:53Z)
- Efficient VVC Intra Prediction Based on Deep Feature Fusion and Probability Estimation [57.66773945887832]
We propose to optimize Versatile Video Coding (VVC) complexity at intra-frame prediction, with a two-stage framework of deep feature fusion and probability estimation.
Experimental results on standard database demonstrate the superiority of proposed method, especially for High Definition (HD) and Ultra-HD (UHD) video sequences.
arXiv Detail & Related papers (2022-05-07T08:01:32Z)
- AuxAdapt: Stable and Efficient Test-Time Adaptation for Temporally Consistent Video Semantic Segmentation [81.87943324048756]
In video segmentation, generating temporally consistent results across frames is as important as achieving frame-wise accuracy.
Existing methods rely on optical flow regularization or fine-tuning with test data to attain temporal consistency.
This paper presents an efficient, intuitive, and unsupervised online adaptation method, AuxAdapt, for improving the temporal consistency of most neural network models.
arXiv Detail & Related papers (2021-10-24T07:07:41Z)
- Learning from Images: Proactive Caching with Parallel Convolutional Neural Networks [94.85780721466816]
A novel framework for proactive caching is proposed in this paper.
It combines model-based optimization with data-driven techniques by transforming an optimization problem into a grayscale image.
Numerical results show that the proposed scheme can reduce 71.6% computation time with only 0.8% additional performance cost.
arXiv Detail & Related papers (2021-08-15T21:32:47Z)
- End-to-end Neural Video Coding Using a Compound Spatiotemporal Representation [33.54844063875569]
We propose a hybrid motion compensation (HMC) method that adaptively combines the predictions generated by two approaches.
Specifically, we generate a compound spatiotemporal representation (CSTR) through a recurrent information aggregation (RIA) module.
We further design a one-to-many decoder pipeline to generate multiple predictions from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation mode selection maps and texture enhancements.
arXiv Detail & Related papers (2021-08-05T19:43:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.