Self-Supervised Learning of Perceptually Optimized Block Motion
Estimates for Video Compression
- URL: http://arxiv.org/abs/2110.01805v2
- Date: Wed, 6 Oct 2021 02:19:24 GMT
- Title: Self-Supervised Learning of Perceptually Optimized Block Motion
Estimates for Video Compression
- Authors: Somdyuti Paul, Andrey Norkin, Alan C. Bovik
- Abstract summary: We propose a search-free block motion estimation framework using a multi-stage convolutional neural network.
We deploy the multi-scale structural similarity (MS-SSIM) loss function to optimize the perceptual quality of the motion compensated predicted frames.
- Score: 50.48504867843605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Block-based motion estimation is integral to inter prediction processes
performed in hybrid video codecs. Prevalent block matching based methods that
are used to compute block motion vectors (MVs) rely on computationally
intensive search procedures. They also suffer from the aperture problem, which
can worsen as the block size is reduced. Moreover, the block matching criteria
used in typical codecs do not account for the resulting levels of perceptual
quality of the motion compensated pictures that are created upon decoding.
Towards achieving the elusive goal of perceptually optimized motion estimation,
we propose a search-free block motion estimation framework using a multi-stage
convolutional neural network, which is able to conduct motion estimation on
multiple block sizes simultaneously, using a triplet of frames as input. This
composite block translation network (CBT-Net) is trained in a self-supervised
manner on a large database that we created from publicly available uncompressed
video content. We deploy the multi-scale structural similarity (MS-SSIM) loss
function to optimize the perceptual quality of the motion compensated predicted
frames. Our experimental results highlight the computational efficiency of our
proposed model relative to conventional block matching based motion estimation
algorithms, for comparable prediction errors. Further, when used to perform
inter prediction in AV1, the MV predictions of the perceptually optimized model
result in average Bjontegaard-delta rate (BD-rate) improvements of -1.70% and
-1.52% with respect to the MS-SSIM and Video Multi-Method Assessment Fusion
(VMAF) quality metrics, respectively, as compared to the block matching based
motion estimation system employed in the SVT-AV1 encoder.
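The paper ships no code, but the training idea the abstract describes can be sketched: a network maps a frame triplet to per-block MVs, the MVs warp the reference frame, and an MS-SSIM loss on the warped prediction provides the self-supervision. The following is a minimal, hypothetical stand-in for CBT-Net, not the authors' model; the names (BlockMotionNet, compensate) are invented and the MS-SSIM loss is assumed to come from the third-party pytorch-msssim package.

```python
# Minimal sketch of self-supervised, MS-SSIM-optimized block motion
# estimation. Hypothetical stand-in for CBT-Net: a single small CNN
# predicts one MV per 16x16 block from a frame triplet, and the loss is
# computed on the motion-compensated frame, with no MV labels at all.
import torch
import torch.nn as nn
import torch.nn.functional as F
from pytorch_msssim import ms_ssim  # assumed third-party MS-SSIM package

class BlockMotionNet(nn.Module):
    """Maps a frame triplet (prev, ref, cur) to a per-16x16-block MV field."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(9, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 2, 3, stride=2, padding=1),  # (dx, dy) per block
        )

    def forward(self, triplet):        # triplet: (B, 9, H, W), H, W % 16 == 0
        return self.features(triplet)  # (B, 2, H/16, W/16), MVs in pixels

def compensate(ref, block_mv):
    """Warp `ref` by per-block MVs via a dense flow (differentiable)."""
    B, _, H, W = ref.shape
    flow = F.interpolate(block_mv, size=(H, W), mode='nearest')  # constant per block
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing='ij')
    sx = 2.0 * (xs.to(ref) + flow[:, 0]) / (W - 1) - 1.0  # normalized sample x
    sy = 2.0 * (ys.to(ref) + flow[:, 1]) / (H - 1) - 1.0  # normalized sample y
    return F.grid_sample(ref, torch.stack((sx, sy), dim=-1), align_corners=True)

# One self-supervised step: the frames themselves are the only supervision.
net = BlockMotionNet()
prev, ref, cur = (torch.rand(1, 3, 256, 256) for _ in range(3))
mv = net(torch.cat([prev, ref, cur], dim=1))
pred = compensate(ref, mv)                       # motion-compensated frame
loss = 1.0 - ms_ssim(pred, cur, data_range=1.0)  # perceptual objective
loss.backward()
```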
Related papers
- Uniformly Accelerated Motion Model for Inter Prediction [38.34487653360328]
In natural videos, there are usually multiple moving objects with variable velocity, resulting in complex motion fields that are difficult to represent compactly.
In Versatile Video Coding (VVC), existing inter prediction methods assume uniform speed motion between consecutive frames.
We introduce a uniformly accelerated motion model (UAMM) to exploit motion-related elements (velocity, acceleration) of moving objects between the video frames.
arXiv Detail & Related papers (2024-07-16T09:46:29Z)
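As a toy illustration of the model above (not the VVC integration the paper describes), a constant-acceleration extrapolation predicts a block's next MV from its two previous ones, where a constant-velocity model would simply repeat the last MV:

```python
# Toy constant-acceleration MV extrapolation (the UAMM idea), with integer
# pixel MVs. mv_t2 and mv_t1 are the block's MVs at frames t-2 and t-1;
# we predict its MV at frame t.
def uamm_predict(mv_t2, mv_t1):
    ax = mv_t1[0] - mv_t2[0]               # per-frame acceleration, x
    ay = mv_t1[1] - mv_t2[1]               # per-frame acceleration, y
    return (mv_t1[0] + ax, mv_t1[1] + ay)  # v_t = v_{t-1} + a

# An object speeding up to the right: MVs grow 2 -> 4 pixels/frame, so the
# accelerated model predicts 6 where a uniform-speed model would repeat 4.
print(uamm_predict((2, 0), (4, 0)))  # -> (6, 0)
```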
- Object Segmentation-Assisted Inter Prediction for Versatile Video Coding [53.91821712591901]
We propose an object segmentation-assisted inter prediction method (SAIP), in which objects in the reference frames are segmented using advanced segmentation techniques.
With proper signaling, the object segmentation mask is translated from the reference frame to the current frame as an arbitrary-shaped partition of different regions.
We show that the proposed method achieves BD-rate reductions of up to 1.98%, 1.14%, and 0.79% (0.82%, 0.49%, and 0.37% on average) for common test sequences.
arXiv Detail & Related papers (2024-03-18T11:48:20Z)
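A rough sketch of the mechanism, with the segmentation network and bitstream signaling omitted: a mask computed on the reference frame is translated along the object's motion into the current frame, and each resulting region is compensated with its own MV. The helper names (shift, saip_predict) are hypothetical.

```python
# Toy sketch of segmentation-assisted inter prediction: translate an object
# mask from the reference frame into the current frame and use it as an
# arbitrary-shaped partition, blending two motion-compensated predictions.
import numpy as np

def shift(img, dx, dy):
    """Translate a 2-D array by (dx, dy) with zero padding (toy warp)."""
    out = np.zeros_like(img)
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y, src_x = ys - dy, xs - dx
    valid = (0 <= src_y) & (src_y < h) & (0 <= src_x) & (src_x < w)
    out[ys[valid], xs[valid]] = img[src_y[valid], src_x[valid]]
    return out

def saip_predict(ref, mask_ref, mv_obj, mv_bg):
    """Region-wise prediction: object pixels use mv_obj, the rest use mv_bg."""
    mask_cur = shift(mask_ref, *mv_obj)   # the mask follows the object's motion
    pred_obj = shift(ref, *mv_obj)
    pred_bg = shift(ref, *mv_bg)
    return np.where(mask_cur > 0, pred_obj, pred_bg)

# Example: a 3-pixel-wide object moving right by 2, background static.
ref = np.zeros((8, 8)); ref[3, 2:5] = 200.0
mask = (ref > 0).astype(np.uint8)
cur_pred = saip_predict(ref, mask, mv_obj=(2, 0), mv_bg=(0, 0))
```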
- Multiscale Motion-Aware and Spatial-Temporal-Channel Contextual Coding Network for Learned Video Compression [24.228981098990726]
We propose a motion-aware and spatial-temporal-channel contextual coding based video compression network (MASTC-VC).
Our proposed MASTC-VC is superior to previous state-of-the-art (SOTA) methods on three public benchmark datasets.
Our method brings average BD-rate savings of 10.15% against H.265/HEVC (HM-16.20) in the PSNR metric and 23.93% against H.266/VVC (VTM-13.2) in the MS-SSIM metric.
arXiv Detail & Related papers (2023-10-19T13:32:38Z)
- IBVC: Interpolation-driven B-frame Video Compression [68.18440522300536]
B-frame video compression adopts bi-directional motion estimation and motion compensation (MEMC) coding for middle-frame reconstruction.
Previous learned approaches often directly extend neural P-frame codecs to B-frame coding, relying on bi-directional optical-flow estimation.
We propose a simple yet effective structure called Interpolation-B-frame Video Compression (IBVC) to address these issues.
arXiv Detail & Related papers (2023-09-25T02:45:51Z)
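Schematically, the contrast with MEMC-based B-frame coding looks as follows: the middle frame is predicted by a frame-interpolation module, so no bi-directional flow is estimated or transmitted, and only the residual is coded. This is a toy illustration, not IBVC's actual pipeline; the interpolator is stubbed out with a plain average where IBVC uses a learned network.

```python
# Schematic of interpolation-driven B-frame coding: predict the middle frame
# with a frame-interpolation module, then encode only the residual.
# `interpolate_middle` is a stand-in (a plain average) for a learned
# video-interpolation network; no MVs are estimated or signaled.
import numpy as np

def interpolate_middle(f0, f2):
    return 0.5 * (f0 + f2)  # placeholder for a learned interpolator

def encode_b_frame(f0, f1, f2, quantizer=8.0):
    pred = interpolate_middle(f0, f2)              # prediction, no motion search
    return np.round((f1 - pred) / quantizer)       # toy residual quantization

def decode_b_frame(f0, f2, residual, quantizer=8.0):
    return interpolate_middle(f0, f2) + residual * quantizer
```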
- Spatial-Temporal Transformer based Video Compression Framework [44.723459144708286]
We propose a novel Spatial-Temporal Transformer based Video Compression (STT-VC) framework.
It contains a Relaxed Deformable Transformer (RDT) with Uformer based offsets estimation for motion estimation and compensation, a Multi-Granularity Prediction (MGP) module based on multi-reference frames for prediction refinement, and a Spatial Feature Distribution prior based Transformer (SFD-T) for efficient temporal-spatial joint residual compression.
Experimental results demonstrate that our method achieves the best performance, with a 13.5% BD-rate saving over VTM.
arXiv Detail & Related papers (2023-09-21T09:23:13Z)
- Coarse-to-fine Deep Video Coding with Hyperprior-guided Mode Prediction [50.361427832256524]
We propose a coarse-to-fine (C2F) deep video compression framework for better motion compensation.
Our C2F framework can achieve better motion compensation results without significantly increasing bit costs.
arXiv Detail & Related papers (2022-06-15T11:38:53Z)
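C2F is a learned codec, but the coarse-to-fine principle it builds on can be made concrete with classical block matching: a full search on downsampled frames yields a coarse MV, which then seeds a small refinement window at full resolution, sharply reducing the number of candidates. A hypothetical sketch:

```python
# Generic coarse-to-fine motion search (an illustration of the principle,
# not the learned C2F framework): a full search on 4x-downsampled frames
# gives a coarse MV, refined within +/-2 pixels at full resolution.
import numpy as np

def sad(a, b):
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def search(blk, ref, cx, cy, rng):
    """Best (dx, dy) offset within +/-rng around top-left (cx, cy), by SAD."""
    h, w = blk.shape
    best_cost, best_mv = None, (0, 0)
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            y, x = cy + dy, cx + dx
            if 0 <= y <= ref.shape[0] - h and 0 <= x <= ref.shape[1] - w:
                cost = sad(blk, ref[y:y + h, x:x + w])
                if best_cost is None or cost < best_cost:
                    best_cost, best_mv = cost, (dx, dy)
    return best_mv

def coarse_to_fine_mv(cur, ref, x, y, block=16, scale=4, coarse_rng=8, fine_rng=2):
    """MV for the block at (x, y): coarse search at 1/scale res, then refine."""
    cur_s, ref_s = cur[::scale, ::scale], ref[::scale, ::scale]
    bs = block // scale
    blk_s = cur_s[y // scale:y // scale + bs, x // scale:x // scale + bs]
    cdx, cdy = search(blk_s, ref_s, x // scale, y // scale, coarse_rng)
    blk = cur[y:y + block, x:x + block]
    fdx, fdy = search(blk, ref, x + cdx * scale, y + cdy * scale, fine_rng)
    return (cdx * scale + fdx, cdy * scale + fdy)

# 17x17 coarse candidates plus a 5x5 refinement replace the 65x65 candidates
# a full-resolution search would need for the same +/-32 pixel range.
```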
- Triple Motion Estimation and Frame Interpolation based on Adaptive Threshold for Frame Rate Up-Conversion [6.015556590955814]
In this paper, we propose a novel motion-compensated frame rate up-conversion (MC-FRUC) algorithm.
The proposed algorithm creates interpolated frames by first estimating motion vectors using unilateral (joint forward and backward) and bilateral motion estimation.
Since motion compensation along unilateral motion trajectories leaves holes in the interpolated frame, a new algorithm is introduced to resolve this problem.
arXiv Detail & Related papers (2022-03-05T04:39:42Z)
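The hole problem mentioned above is easy to make concrete. Unilateral ME projects blocks of an existing frame along their MVs into the frame being synthesized, so some of its pixels may receive no projection (holes); bilateral ME instead tiles the interpolated frame itself into blocks and finds a symmetric MV pair for each, covering every pixel by construction. A toy bilateral search, not the paper's adaptive-threshold algorithm:

```python
# Toy bilateral motion estimation for frame interpolation: for each block of
# the frame to be synthesized, find a symmetric MV (+mv into the next frame,
# -mv into the previous one) minimizing the mismatch between the two ends.
# Because the interpolated frame itself is tiled into blocks, every pixel is
# covered, so unilateral ME's holes cannot occur.
import numpy as np

def bilateral_block(prev, nxt, x, y, block=8, rng=4):
    """Interpolate the block at (x, y) of the in-between frame."""
    h, w = prev.shape
    best_cost, best_mv = None, (0, 0)
    for dy in range(-rng, rng + 1):
        for dx in range(-rng, rng + 1):
            y0, x0 = y - dy, x - dx   # matched block in the previous frame
            y1, x1 = y + dy, x + dx   # mirrored block in the next frame
            if (min(y0, y1, x0, x1) < 0 or max(y0, y1) > h - block
                    or max(x0, x1) > w - block):
                continue
            cost = np.abs(prev[y0:y0 + block, x0:x0 + block].astype(np.int64)
                          - nxt[y1:y1 + block, x1:x1 + block].astype(np.int64)).sum()
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    dx, dy = best_mv
    # Interpolated block: average of the two matched ends.
    return (prev[y - dy:y - dy + block, x - dx:x - dx + block].astype(np.float64)
            + nxt[y + dy:y + dy + block, x + dx:x + dx + block]) / 2.0
```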
- MotionHint: Self-Supervised Monocular Visual Odometry with Motion Constraints [70.76761166614511]
We present a novel self-supervised algorithm named MotionHint for monocular visual odometry (VO).
Our MotionHint algorithm can be easily applied to existing open-source state-of-the-art SSM-VO systems.
arXiv Detail & Related papers (2021-09-14T15:35:08Z)
- End-to-end Neural Video Coding Using a Compound Spatiotemporal Representation [33.54844063875569]
We propose a hybrid motion compensation (HMC) method that adaptively combines the predictions generated by two approaches.
Specifically, we generate a compound spatiotemporal representation (CSTR) through a recurrent information aggregation (RIA) module.
We further design a one-to-many decoder pipeline to generate multiple predictions from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation mode selection maps and texture enhancements.
arXiv Detail & Related papers (2021-08-05T19:43:32Z)
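The compensation mode selection maps mentioned above can be illustrated in isolation: given two candidate predictions (say, the vector-based and kernel-based resampling outputs), a small network emits a per-pixel soft map that blends them. A minimal hypothetical sketch, unrelated to the paper's actual decoder:

```python
# Minimal sketch of hybrid motion compensation via a per-pixel mode map:
# two candidate predictions are blended by a learned soft selection mask.
# (Hypothetical; the paper's CSTR/RIA modules are not reproduced.)
import torch
import torch.nn as nn

class ModeSelector(nn.Module):
    def __init__(self):
        super().__init__()
        # Sees both candidates and emits a per-pixel map in (0, 1).
        self.net = nn.Sequential(
            nn.Conv2d(6, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, pred_vector, pred_kernel):
        m = self.net(torch.cat([pred_vector, pred_kernel], dim=1))
        return m * pred_vector + (1.0 - m) * pred_kernel  # per-pixel blend

pred_a = torch.rand(1, 3, 64, 64)  # e.g., vector-based resampling output
pred_b = torch.rand(1, 3, 64, 64)  # e.g., adaptive kernel-based output
blended = ModeSelector()(pred_a, pred_b)
```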
- Video Face Super-Resolution with Motion-Adaptive Feedback Cell [90.73821618795512]
Video super-resolution (VSR) methods have recently achieved remarkable success due to the development of deep convolutional neural networks (CNNs).
In this paper, we propose a Motion-Adaptive Feedback Cell (MAFC), a simple but effective block that efficiently captures motion compensation information and feeds it back to the network in an adaptive way.
arXiv Detail & Related papers (2020-02-15T13:14:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.