Multiscale Motion-Aware and Spatial-Temporal-Channel Contextual Coding
Network for Learned Video Compression
- URL: http://arxiv.org/abs/2310.12733v1
- Date: Thu, 19 Oct 2023 13:32:38 GMT
- Title: Multiscale Motion-Aware and Spatial-Temporal-Channel Contextual Coding
Network for Learned Video Compression
- Authors: Yiming Wang, Qian Huang, Bin Tang, Huashan Sun, and Xing Li
- Abstract summary: We propose a motion-aware and spatial-temporal-channel contextual coding based video compression network (MASTC-VC)
Our proposed MASTC-VC is superior to previous state-of-the-art (SOTA) methods on three public benchmark datasets.
Our method brings an average of 10.15% BD-rate savings against H.265/HEVC (HM-16.20) in the PSNR metric and an average of 23.93% BD-rate savings against H.266/VVC (VTM-13.2) in the MS-SSIM metric.
- Score: 24.228981098990726
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, learned video compression has achieved exciting performance.
Following the traditional hybrid prediction coding framework, most learned
methods generally adopt the motion estimation motion compensation (MEMC) method
to remove inter-frame redundancy. However, inaccurate motion vector (MV)
usually lead to the distortion of reconstructed frame. In addition, most
approaches ignore the spatial and channel redundancy. To solve above problems,
we propose a motion-aware and spatial-temporal-channel contextual coding based
video compression network (MASTC-VC), which learns the latent representation
and uses variational autoencoders (VAEs) to capture the characteristics of
intra-frame pixels and inter-frame motion. Specifically, we design a multiscale
motion-aware module (MS-MAM) to estimate spatial-temporal-channel consistent
motion vector by utilizing the multiscale motion prediction information in a
coarse-to-fine way. On top of it, we further propose a
spatial-temporal-channel contextual module (STCCM), which explores the
correlation of latent representation to reduce the bit consumption from
spatial, temporal and channel aspects respectively. Comprehensive experiments
show that our proposed MASTC-VC is superior to previous state-of-the-art (SOTA)
methods on three public benchmark datasets. More specifically, our method
brings an average of 10.15% BD-rate savings against H.265/HEVC (HM-16.20) in
the PSNR metric and an average of 23.93% BD-rate savings against H.266/VVC
(VTM-13.2) in the MS-SSIM metric.
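A minimal PyTorch sketch of the two ideas named in the abstract is given below: coarse-to-fine multiscale motion estimation, and an entropy model conditioned on temporal plus channel context. This is an illustrative assumption of the general structure, not the authors' implementation; all class, function, and parameter names are hypothetical.

```python
# Illustrative sketch only (not the authors' code); shapes assume H and W
# divisible by 2**(levels-1).
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp(feat, flow):
    """Backward-warp a feature map with a dense flow field of shape (B, 2, H, W)."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feat.device)   # (2, H, W) pixel coordinates
    coords = grid.unsqueeze(0) + flow                              # absolute sampling positions
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0                  # normalise to [-1, 1]
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(feat, torch.stack((gx, gy), dim=-1), align_corners=True)


class CoarseToFineMotion(nn.Module):
    """Estimate a motion field at several scales and refine it coarse-to-fine."""

    def __init__(self, channels=3, levels=3):
        super().__init__()
        self.levels = levels
        self.flow_nets = nn.ModuleList(
            nn.Sequential(nn.Conv2d(2 * channels + 2, 64, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(64, 2, 3, padding=1))
            for _ in range(levels))

    def forward(self, cur, ref):
        flow = None
        for lvl in reversed(range(self.levels)):                   # coarsest scale first
            s = 2 ** lvl
            cur_s = F.avg_pool2d(cur, s) if s > 1 else cur
            ref_s = F.avg_pool2d(ref, s) if s > 1 else ref
            if flow is None:
                flow = cur_s.new_zeros(cur_s.size(0), 2, cur_s.size(2), cur_s.size(3))
            else:                                                   # upsample and rescale coarser flow
                flow = 2.0 * F.interpolate(flow, scale_factor=2, mode="bilinear",
                                           align_corners=False)
            ref_warped = warp(ref_s, flow)
            flow = flow + self.flow_nets[lvl](torch.cat([cur_s, ref_warped, flow], dim=1))
        return flow


class TemporalChannelContext(nn.Module):
    """Predict entropy-model parameters for a latent y from a temporal context
    plus previously decoded channel groups (channel-wise autoregression)."""

    def __init__(self, latent_ch=64, groups=2):
        super().__init__()
        self.groups = groups
        step = latent_ch // groups
        self.param_nets = nn.ModuleList(
            nn.Sequential(nn.Conv2d(latent_ch + g * step, 128, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(128, 2 * step, 3, padding=1))
            for g in range(groups))

    def forward(self, y, temporal_ctx):
        step = y.size(1) // self.groups
        means, scales, decoded = [], [], []
        for g, net in enumerate(self.param_nets):
            ctx = torch.cat([temporal_ctx] + decoded, dim=1)
            mu, sigma = net(ctx).chunk(2, dim=1)
            means.append(mu)
            scales.append(F.softplus(sigma))                        # keep scales positive
            decoded.append(y[:, g * step:(g + 1) * step])           # channel context for next group
        return torch.cat(means, dim=1), torch.cat(scales, dim=1)
```

A full codec would additionally quantise the latent, entropy-code it with the predicted Gaussian parameters, and reconstruct the frame from the motion-compensated context; those pieces are omitted from this sketch.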
Related papers
- U-Motion: Learned Point Cloud Video Compression with U-Structured Motion Estimation [9.528405963599997]
Point cloud video (PCV) is a versatile 3D representation of dynamic scenes with many emerging applications.
This paper introduces U-Motion, a learning-based compression scheme for both PCV geometry and attributes.
arXiv Detail & Related papers (2024-11-21T07:17:01Z) - Uniformly Accelerated Motion Model for Inter Prediction [38.34487653360328]
In natural videos, there are usually multiple moving objects with variable velocity, resulting in complex motion fields that are difficult to represent compactly.
In Versatile Video Coding (VVC), existing inter prediction methods assume uniform speed motion between consecutive frames.
We introduce a uniformly accelerated motion model (UAMM) to exploit motion-related elements (velocity, acceleration) of moving objects between the video frames.
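As a rough illustration of that idea (not the paper's exact formulation), a constant-acceleration extrapolation of a block's motion vector from the two previously observed per-frame displacements can be sketched as:

```python
def uamm_extrapolate(mv_prev, mv_prev2):
    """Predict the next per-frame motion vector under uniform acceleration.

    mv_prev  -- displacement (dx, dy) over the most recent frame interval (velocity term)
    mv_prev2 -- displacement over the interval before that
    Their difference approximates the acceleration; under uniform acceleration the
    displacement grows by that amount each interval. Illustrative sketch only.
    """
    ax, ay = mv_prev[0] - mv_prev2[0], mv_prev[1] - mv_prev2[1]
    return (mv_prev[0] + ax, mv_prev[1] + ay)
```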
arXiv Detail & Related papers (2024-07-16T09:46:29Z) - Object Segmentation-Assisted Inter Prediction for Versatile Video Coding [53.91821712591901]
We propose an object segmentation-assisted inter prediction method (SAIP), where objects in the reference frames are segmented by some advanced technologies.
With a proper indication, the object segmentation mask is translated from the reference frame to the current frame as the arbitrary-shaped partition of different regions.
We show that the proposed method achieves up to 1.98%, 1.14%, 0.79%, and on average 0.82%, 0.49%, 0.37% BD-rate reduction for common test sequences.
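The BD-rate figures quoted here (and in the abstract above) follow the standard Bjøntegaard metric. A compact NumPy sketch of the usual computation, assuming a few rate/quality points per codec, is:

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta rate (%) between two rate-distortion curves.

    Fits a cubic polynomial of log-rate as a function of quality for each codec,
    integrates both over the overlapping quality range, and reports the average
    rate difference in percent (negative values mean bitrate savings).
    """
    log_ra, log_rt = np.log(rate_anchor), np.log(rate_test)
    p_a = np.polyfit(psnr_anchor, log_ra, 3)
    p_t = np.polyfit(psnr_test, log_rt, 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0
```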
arXiv Detail & Related papers (2024-03-18T11:48:20Z) - IBVC: Interpolation-driven B-frame Video Compression [68.18440522300536]
B-frame video compression aims to adopt bi-directional motion estimation and motion compensation (MEMC) coding for middle frame reconstruction.
Previous learned approaches often directly extend neural P-frame codecs to B-frame coding, relying on bi-directional optical-flow estimation.
We propose a simple yet effective structure called Interpolation-B-frame Video Compression (IBVC) to address these issues.
arXiv Detail & Related papers (2023-09-25T02:45:51Z) - Spatial-Temporal Transformer based Video Compression Framework [44.723459144708286]
We propose a novel Spatial-Temporal Transformer based Video Compression (STT-VC) framework.
It contains a Relaxed Deformable Transformer (RDT) with Uformer based offsets estimation for motion estimation and compensation, a Multi-Granularity Prediction (MGP) module based on multi-reference frames for prediction refinement, and a Spatial Feature Distribution prior based Transformer (SFD-T) for efficient temporal-spatial joint residual compression.
Experimental results demonstrate that our method achieves the best result with 13.5% BD-Rate saving over VTM.
arXiv Detail & Related papers (2023-09-21T09:23:13Z) - Optical-Flow-Reuse-Based Bidirectional Recurrent Network for Space-Time
Video Super-Resolution [52.899234731501075]
Space-time video super-resolution (ST-VSR) simultaneously increases the spatial resolution and frame rate for a given video.
Existing methods typically struggle to efficiently leverage information from a large range of neighboring frames.
We propose a coarse-to-fine bidirectional recurrent neural network instead of using ConvLSTM to leverage knowledge between adjacent frames.
arXiv Detail & Related papers (2021-10-13T15:21:30Z) - Self-Supervised Learning of Perceptually Optimized Block Motion
Estimates for Video Compression [50.48504867843605]
We propose a search-free block motion estimation framework using a multi-stage convolutional neural network.
We deploy the multi-scale structural similarity (MS-SSIM) loss function to optimize the perceptual quality of the motion compensated predicted frames.
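A minimal sketch of such a perceptual training objective, using the third-party pytorch_msssim package (an assumption for illustration, not necessarily what the authors used):

```python
from pytorch_msssim import ms_ssim  # third-party package, assumed here for illustration


def perceptual_motion_loss(predicted_frame, target_frame):
    """1 - MS-SSIM between the motion-compensated prediction and the target frame.

    Both inputs are (B, C, H, W) tensors with values in [0, 1].
    """
    return 1.0 - ms_ssim(predicted_frame, target_frame, data_range=1.0)
```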
arXiv Detail & Related papers (2021-10-05T03:38:43Z) - End-to-end Neural Video Coding Using a Compound Spatiotemporal
Representation [33.54844063875569]
We propose a hybrid motion compensation (HMC) method that adaptively combines the predictions generated by two approaches.
Specifically, we generate a compound spatiotemporal representation (CSTR) through a recurrent information aggregation (RIA) module.
We further design a one-to-many decoder pipeline to generate multiple predictions from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation mode selection maps and texture enhancements.
arXiv Detail & Related papers (2021-08-05T19:43:32Z) - Temporal Modulation Network for Controllable Space-Time Video
Super-Resolution [66.06549492893947]
Space-time video super-resolution aims to increase the spatial and temporal resolutions of low-resolution and low-frame-rate videos.
Deformable convolution based methods have achieved promising STVSR performance, but they can only infer the intermediate frames pre-defined in the training stage.
We propose a Temporal Modulation Network (TMNet) to interpolate arbitrary intermediate frame(s) with accurate high-resolution reconstruction.
arXiv Detail & Related papers (2021-04-21T17:10:53Z) - Learning Comprehensive Motion Representation for Action Recognition [124.65403098534266]
2D CNN-based methods are efficient but may yield redundant features due to applying the same 2D convolution kernel to each frame.
Recent efforts attempt to capture motion information by establishing inter-frame connections, while still suffering from a limited temporal receptive field or high latency.
We propose a Channel-wise Motion Enhancement (CME) module to adaptively emphasize the channels related to dynamic information with a channel-wise gate vector.
We also propose a Spatial-wise Motion Enhancement (SME) module to focus on regions containing critical moving targets, based on the point-to-point similarity between adjacent feature maps.
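A hedged sketch of the channel-wise gating idea described above (shapes and names are illustrative assumptions, not the paper's implementation):

```python
import torch
import torch.nn as nn


class ChannelMotionGate(nn.Module):
    """Re-weight feature channels by how much they change between frames.

    Illustrative sketch of channel-wise motion enhancement: the difference between
    adjacent frame features is pooled into a per-channel descriptor and turned into
    a sigmoid gate that emphasises motion-related channels.
    """

    def __init__(self, channels=64, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, feat_t, feat_t1):
        diff = feat_t1 - feat_t                      # (B, C, H, W) frame-to-frame change
        desc = diff.abs().mean(dim=(2, 3))           # (B, C) per-channel motion descriptor
        gate = self.fc(desc).unsqueeze(-1).unsqueeze(-1)
        return feat_t * gate                         # emphasise dynamic channels
```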
arXiv Detail & Related papers (2021-03-23T03:06:26Z) - Neural Video Coding using Multiscale Motion Compensation and
Spatiotemporal Context Model [45.46660511313426]
We propose an end-to-end deep neural video coding framework (NVC).
It uses variational autoencoders (VAEs) with joint spatial and temporal prior aggregation (PA) to exploit the correlations in intra-frame pixels, inter-frame motions and inter-frame compensation residuals.
NVC is evaluated under low-delay causal settings and compared with H.265/HEVC, H.264/AVC, and other learned video compression methods.
arXiv Detail & Related papers (2020-07-09T06:15:17Z)