Joint Reference Frame Synthesis and Post Filter Enhancement for Versatile Video Coding
- URL: http://arxiv.org/abs/2404.18058v1
- Date: Sun, 28 Apr 2024 03:11:44 GMT
- Title: Joint Reference Frame Synthesis and Post Filter Enhancement for Versatile Video Coding
- Authors: Weijie Bao, Yuantong Zhang, Jianghao Jia, Zhenzhong Chen, Shan Liu,
- Abstract summary: This paper presents the joint reference frame synthesis (RFS) and post-processing filter enhancement (PFE) for Versatile Video Coding (VVC)
Both RFS and PFE utilize the Space-Time Enhancement Network (STENet), which receives two input frames with artifacts and produces two enhanced frames with suppressed artifacts, along with an intermediate synthesized frame.
To reduce inference complexity, we propose joint inference of RFS and PFE (JISE), achieved through a single execution of STENet.
- Score: 53.703894799335735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents the joint reference frame synthesis (RFS) and post-processing filter enhancement (PFE) for Versatile Video Coding (VVC), aiming to explore the combination of different neural network-based video coding (NNVC) tools to better utilize the hierarchical bi-directional coding structure of VVC. Both RFS and PFE utilize the Space-Time Enhancement Network (STENet), which receives two input frames with artifacts and produces two enhanced frames with suppressed artifacts, along with an intermediate synthesized frame. STENet comprises two pipelines, the synthesis pipeline and the enhancement pipeline, tailored for different purposes. During RFS, two reconstructed frames are sent into STENet's synthesis pipeline to synthesize a virtual reference frame, similar to the current to-be-coded frame. The synthesized frame serves as an additional reference frame inserted into the reference picture list (RPL). During PFE, two reconstructed frames are fed into STENet's enhancement pipeline to alleviate their artifacts and distortions, resulting in enhanced frames with reduced artifacts and distortions. To reduce inference complexity, we propose joint inference of RFS and PFE (JISE), achieved through a single execution of STENet. Integrated into the VVC reference software VTM-15.0, RFS, PFE, and JISE are coordinated within a novel Space-Time Enhancement Window (STEW) under Random Access (RA) configuration. The proposed method could achieve -7.34%/-17.21%/-16.65% PSNR-based BD-rate on average for three components under RA configuration.
Related papers
- Efficient View Synthesis and 3D-based Multi-Frame Denoising with
Multiplane Feature Representations [1.18885605647513]
We introduce the first 3D-based multi-frame denoising method that significantly outperforms its 2D-based counterparts with lower computational requirements.
Our method extends the multiplane image (MPI) framework for novel view synthesis by introducing a learnable encoder-renderer pair manipulating multiplane in feature space.
arXiv Detail & Related papers (2023-03-31T15:23:35Z) - A Unified Pyramid Recurrent Network for Video Frame Interpolation [10.859715542859773]
We present UPR-Net, a novel Unified Pyramid Recurrent Network for frame synthesis.
We show that our iterative synthesis strategy can significantly improve the robustness of the frame on large motion cases.
arXiv Detail & Related papers (2022-11-07T11:12:31Z) - Temporal Consistency Learning of inter-frames for Video Super-Resolution [38.26035126565062]
Video super-resolution (VSR) is a task that aims to reconstruct high-resolution (HR) frames from the low-resolution (LR) reference frame and multiple neighboring frames.
Existing methods generally explore information propagation and frame alignment to improve the performance of VSR.
We propose a Temporal Consistency learning Network (TCNet) for VSR in an end-to-end manner, to enhance the consistency of the reconstructed videos.
arXiv Detail & Related papers (2022-11-03T08:23:57Z) - TTVFI: Learning Trajectory-Aware Transformer for Video Frame
Interpolation [50.49396123016185]
Video frame (VFI) aims to synthesize an intermediate frame between two consecutive frames.
We propose a novel Trajectory-aware Transformer for Video Frame Interpolation (TTVFI)
Our method outperforms other state-of-the-art methods in four widely-used VFI benchmarks.
arXiv Detail & Related papers (2022-07-19T03:37:49Z) - STDAN: Deformable Attention Network for Space-Time Video
Super-Resolution [39.18399652834573]
We propose a deformable attention network called STDAN for STVSR.
First, we devise a long-short term feature (LSTFI) module, which is capable of abundant content from more neighboring input frames.
Second, we put forward a spatial-temporal deformable feature aggregation (STDFA) module, in which spatial and temporal contexts are adaptively captured and aggregated.
arXiv Detail & Related papers (2022-03-14T03:40:35Z) - VRT: A Video Restoration Transformer [126.79589717404863]
Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames.
We propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities.
arXiv Detail & Related papers (2022-01-28T17:54:43Z) - Condensing a Sequence to One Informative Frame for Video Recognition [113.3056598548736]
This paper studies a two-step alternative that first condenses the video sequence to an informative "frame"
A valid question is how to define "useful information" and then distill from a sequence down to one synthetic frame.
IFS consistently demonstrates evident improvements on image-based 2D networks and clip-based 3D networks.
arXiv Detail & Related papers (2022-01-11T16:13:43Z) - Flow-Guided Sparse Transformer for Video Deblurring [124.11022871999423]
FlowGuided Sparse Transformer (F GST) is a framework for video deblurring.
FGSW-MSA enjoys the guidance of the estimated optical flow to globally sample spatially sparse elements corresponding to the same scene patch in neighboring frames.
Our proposed F GST outperforms state-of-the-art patches on both DVD and GOPRO datasets and even yields more visually pleasing results in real video deblurring.
arXiv Detail & Related papers (2022-01-06T02:05:32Z) - PP-MSVSR: Multi-Stage Video Super-Resolution [4.039183755023383]
Key for Video Super-Resolution(VSR) task is to make full use of complementary information across frames to reconstruct the high-resolution sequence.
We propose a multi-stage VSR deep architecture, dubbed as PP-MSVSR, with local fusion module, auxiliary loss and re-align module.
Experiments substantiate that PP-MSVSR achieves a PSNR of 28.13dB with only 1.45M parameters.
arXiv Detail & Related papers (2021-12-06T07:28:52Z) - Optical-Flow-Reuse-Based Bidirectional Recurrent Network for Space-Time
Video Super-Resolution [52.899234731501075]
Space-time video super-resolution (ST-VSR) simultaneously increases the spatial resolution and frame rate for a given video.
Existing methods typically suffer from difficulties in how to efficiently leverage information from a large range of neighboring frames.
We propose a coarse-to-fine bidirectional recurrent neural network instead of using ConvLSTM to leverage knowledge between adjacent frames.
arXiv Detail & Related papers (2021-10-13T15:21:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.