A Global Appearance and Local Coding Distortion based Fusion Framework
for CNN based Filtering in Video Coding
- URL: http://arxiv.org/abs/2106.12746v1
- Date: Thu, 24 Jun 2021 03:08:44 GMT
- Title: A Global Appearance and Local Coding Distortion based Fusion Framework
for CNN based Filtering in Video Coding
- Authors: Jian Yue, Yanbo Gao, Shuai Li, Hui Yuan, Frédéric Dufaux
- Abstract summary: In-loop filtering is used in video coding to process the reconstructed frame in order to remove blocking artifacts.
In this paper, we address the filtering problem from two aspects: global appearance restoration for disrupted texture, and restoration of the local coding distortion caused by the fixed coding pipeline.
A three-stream global appearance and local coding distortion based fusion network is developed with a high-level global feature stream, a high-level local feature stream and a low-level local feature stream.
- Score: 15.778380865885842
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In-loop filtering is used in video coding to process the reconstructed frame
in order to remove blocking artifacts. With the development of convolutional
neural networks (CNNs), CNNs have been explored for in-loop filtering
since it can be treated as an image de-noising task. However, in addition
to being a distorted image, the reconstructed frame is also produced by a fixed
pipeline of block-based encoding operations in video coding, so it carries
coding-unit-based coding distortion with similar characteristics. Therefore, in this
paper, we address the filtering problem from two aspects: global appearance
restoration for disrupted texture, and restoration of the local coding distortion
caused by the fixed coding pipeline. Accordingly, a three-stream global
appearance and local coding distortion based fusion network is developed with a
high-level global feature stream, a high-level local feature stream and a
low-level local feature stream. An ablation study is conducted to validate the
necessity of different features, demonstrating that the global features and
local features can complement each other in filtering and achieve better
performance when combined. To the best of our knowledge, we are the first
to clearly characterize the video filtering process from the above global
appearance and local coding distortion restoration aspects with experimental
verification, providing a clear pathway to developing filter techniques.
Experimental results demonstrate that the proposed method significantly
outperforms the existing single-frame based methods and achieves 13.5%, 11.3%,
11.7% BD-Rate saving on average for AI, LDP and RA configurations,
respectively, compared with the HEVC reference software.
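For readers unfamiliar with the BD-Rate figures quoted above, the following is a minimal sketch of how such a number can be computed from rate-distortion points. It is not the authors' evaluation code. The standard Bjøntegaard calculation fits a cubic polynomial to log-bitrate as a function of PSNR; this sketch swaps in piecewise-linear interpolation to stay dependency-free, so its values will differ somewhat from the standard tool. The function names and the sample (bitrate, PSNR) points are hypothetical.

```python
import math


def _interp(xs, ys, x):
    """Linearly interpolate the curve (xs, ys) at x; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the curve")


def _area(xs, ys, lo, hi):
    """Integrate the piecewise-linear curve over [lo, hi] using trapezoids between knots."""
    knots = [lo] + [x for x in xs if lo < x < hi] + [hi]
    return sum(
        0.5 * (_interp(xs, ys, a) + _interp(xs, ys, b)) * (b - a)
        for a, b in zip(knots, knots[1:])
    )


def bd_rate_percent(anchor, test):
    """Average bitrate change of `test` relative to `anchor` at equal PSNR, in percent.

    Both arguments are lists of (bitrate, psnr) pairs with distinct PSNR values.
    A negative result means the test codec needs fewer bits (a saving).
    Simplification (assumption): piecewise-linear interpolation of log-bitrate
    instead of the cubic fit used by the standard BD-Rate calculation.
    """
    def prepare(points):
        pts = sorted(points, key=lambda p: p[1])  # sort by PSNR
        return [p[1] for p in pts], [math.log(p[0]) for p in pts]

    psnr_a, log_rate_a = prepare(anchor)
    psnr_t, log_rate_t = prepare(test)

    # Compare the curves only over the PSNR range they share.
    lo = max(psnr_a[0], psnr_t[0])
    hi = min(psnr_a[-1], psnr_t[-1])
    if hi <= lo:
        raise ValueError("the two curves have no overlapping PSNR range")

    avg_log_diff = (
        _area(psnr_t, log_rate_t, lo, hi) - _area(psnr_a, log_rate_a, lo, hi)
    ) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0


# Hypothetical rate-distortion points: (bitrate in kbps, PSNR in dB).
anchor = [(1000.0, 32.0), (2000.0, 35.0), (4000.0, 38.0), (8000.0, 41.0)]
# A test codec that reaches the same PSNR with 10% fewer bits at every point.
test = [(rate * 0.9, psnr) for rate, psnr in anchor]

print(bd_rate_percent(anchor, test))  # roughly -10, i.e. about a 10% saving
```

In this convention a reported "13.5% BD-Rate saving" corresponds to a result of about -13.5 from a function like `bd_rate_percent`, with the HEVC reference software as the anchor.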
Related papers
- UTSRMorph: A Unified Transformer and Superresolution Network for Unsupervised Medical Image Registration [4.068692674719378]
Complicated image registration is a key issue in medical image analysis.
We propose a novel unsupervised image registration method named the unified Transformer and superresolution (UTSRMorph) network.
arXiv Detail & Related papers (2024-10-27T06:28:43Z)
- In-Loop Filtering via Trained Look-Up Tables [45.6756570330982]
In-loop filtering (ILF) is a key technology for removing the artifacts in image/video coding standards.
We propose an efficient and practical in-loop filtering scheme by adopting the Look-up Table (LUT).
Experimental results show that the ultrafast, very fast, and fast modes of the proposed method achieve on average 0.13%/0.34%/0.51% and 0.10%/0.27%/0.39% BD-rate reduction.
arXiv Detail & Related papers (2024-07-15T17:25:42Z)
- WiTUnet: A U-Shaped Architecture Integrating CNN and Transformer for Improved Feature Alignment and Local Information Fusion [16.41082757280262]
Low-dose computed tomography (LDCT) has become the technology of choice for diagnostic medical imaging, given its lower radiation dose compared to standard CT.
In this paper, we introduce WiTUnet, a novel LDCT image denoising method that utilizes nested, dense skip pathways instead of traditional skip connections.
arXiv Detail & Related papers (2024-04-15T07:53:07Z)
- Boosting Neural Representations for Videos with a Conditional Decoder [28.073607937396552]
Implicit neural representations (INRs) have emerged as a promising approach for video storage and processing.
This paper introduces a universal boosting framework for current implicit video representation approaches.
arXiv Detail & Related papers (2024-02-28T08:32:19Z)
- Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)
- In-Domain GAN Inversion for Faithful Reconstruction and Editability [132.68255553099834]
We propose in-domain GAN inversion, which consists of a domain-guided encoder and a domain-regularized optimizer, to regularize the inverted code in the native latent space of the pre-trained GAN model.
We make comprehensive analyses on the effects of the encoder structure, the starting inversion point, as well as the inversion parameter space, and observe the trade-off between the reconstruction quality and the editing property.
arXiv Detail & Related papers (2023-09-25T08:42:06Z)
- The Devil Is in the Details: Window-based Attention for Image Compression [58.1577742463617]
Most existing learned image compression models are based on Convolutional Neural Networks (CNNs).
In this paper, we study the effects of multiple kinds of attention mechanisms for local features learning, then introduce a more straightforward yet effective window-based local attention block.
The proposed window-based attention is flexible and can work as a plug-and-play component to enhance CNN and Transformer models.
arXiv Detail & Related papers (2022-03-16T07:55:49Z)
- Distortion-Aware Loop Filtering of Intra 360° Video Coding with Equirectangular Projection [81.63407194858854]
We propose a distortion-aware loop filtering model to improve the performance of intra coding for 360° videos projected via equirectangular projection (ERP) format.
Our proposed module analyzes content characteristics based on a coding unit (CU) partition mask and processes them through partial convolution to activate the specified area.
arXiv Detail & Related papers (2022-02-20T12:00:18Z)
- Spatially-Adaptive Image Restoration using Distortion-Guided Networks [51.89245800461537]
We present a learning-based solution for restoring images suffering from spatially-varying degradations.
We propose SPAIR, a network design that harnesses distortion-localization information and dynamically adjusts to difficult regions in the image.
arXiv Detail & Related papers (2021-08-19T11:02:25Z)
- Multi-Density Attention Network for Loop Filtering in Video Compression [9.322800480045336]
We propose an on-line scaling based multi-density attention network for loop filtering in video compression.
Experimental results show that 10.18% bit-rate reduction at the same video quality can be achieved over the latest Versatile Video Coding (VVC) standard.
arXiv Detail & Related papers (2021-04-08T05:46:38Z)
- A U-Net Based Discriminator for Generative Adversarial Networks [86.67102929147592]
We propose an alternative U-Net based discriminator architecture for generative adversarial networks (GANs).
The proposed architecture provides detailed per-pixel feedback to the generator while maintaining the global coherence of synthesized images.
The novel discriminator improves over the state of the art in terms of the standard distribution and image quality metrics.
arXiv Detail & Related papers (2020-02-28T11:16:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.