B-CANF: Adaptive B-frame Coding with Conditional Augmented Normalizing
Flows
- URL: http://arxiv.org/abs/2209.01769v2
- Date: Wed, 2 Aug 2023 07:48:57 GMT
- Title: B-CANF: Adaptive B-frame Coding with Conditional Augmented Normalizing
Flows
- Authors: Mu-Jung Chen, Yi-Hsin Chen, Wen-Hsiao Peng
- Abstract summary: This work introduces a novel B-frame coding framework, termed B-CANF, that exploits conditional augmented normalizing flows for B-frame coding.
B-CANF additionally features two novel elements: frame-type adaptive coding and B*-frames.
- Score: 11.574465203875342
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Over the past few years, learning-based video compression has become an
active research area. However, most works focus on P-frame coding. Learned
B-frame coding is under-explored and more challenging. This work introduces a
novel B-frame coding framework, termed B-CANF, that exploits conditional
augmented normalizing flows for B-frame coding. B-CANF additionally features
two novel elements: frame-type adaptive coding and B*-frames. Our frame-type
adaptive coding learns better bit allocation for hierarchical B-frame coding by
dynamically adapting the feature distributions according to the B-frame type.
Our B*-frames allow greater flexibility in specifying the group-of-pictures
(GOP) structure by reusing the B-frame codec to mimic P-frame coding, without
the need for an additional, separate P-frame codec. On commonly used datasets,
B-CANF achieves state-of-the-art compression performance compared with other
learned B-frame codecs and shows comparable BD-rate results to HM-16.23
under the random access configuration in terms of PSNR. When evaluated on
different GOP structures, our B*-frames achieve similar performance to the
additional use of a separate P-frame codec.
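The B*-frame idea lends itself to a small illustration. The sketch below (Python; illustrative only, not the authors' code) enumerates the coding order of one hierarchical-B GOP: interior frames are bi-directionally predicted B-frames, and the trailing anchor is tagged "B*" because, per the abstract, it is coded by the same B-frame codec configured to mimic a P-frame. The function name and the level bookkeeping are assumptions made for illustration.

```python
# Illustrative sketch (not the authors' code): coding order of one
# hierarchical-B GOP. The trailing anchor is marked "B*" because, per the
# abstract, B-CANF codes it with the B-frame codec mimicking a P-frame.

def hierarchical_gop(gop_size: int):
    """Return (display_index, frame_type, level) tuples in coding order."""
    order = [(gop_size, "B*", 0)]  # trailing anchor: B codec mimicking P

    def bisect(lo: int, hi: int, level: int):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        order.append((mid, "B", level))  # bi-directionally predicted from lo, hi
        bisect(lo, mid, level + 1)
        bisect(mid, hi, level + 1)

    bisect(0, gop_size, 1)
    return order

if __name__ == "__main__":
    for idx, ftype, level in hierarchical_gop(8):
        print(f"display {idx:2d}  type {ftype:2s}  hierarchy level {level}")
```

For a GOP of size 8 this prints the familiar coding order 8, 4, 2, 1, 3, 6, 5, 7. The `level` value is the kind of frame-type signal that B-CANF's frame-type adaptive coding could condition its feature distributions on.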
Related papers
- Improved Video VAE for Latent Video Diffusion Model [55.818110540710215]
A video variational autoencoder (VAE) aims to compress pixel data into a low-dimensional latent space, playing an important role in OpenAI's Sora.
Most existing VAEs inflate a pretrained image VAE into a 3D causal structure for temporal-spatial compression.
We propose a new KTC architecture and a group causal convolution (GCConv) module to further improve video VAE (IV-VAE). A minimal sketch of the causal-convolution idea follows this entry.
arXiv Detail & Related papers (2024-11-10T12:43:38Z)
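The entry above mentions inflating an image VAE into a 3D causal structure. Below is a minimal sketch (PyTorch) of the standard building block, a temporally causal 3D convolution that pads only the past side of the time axis, so that output frame t never sees future frames. `CausalConv3d` is an illustrative name; the paper's GCConv grouping is not reproduced here.

```python
# Minimal sketch of a temporally *causal* 3D convolution, the usual trick for
# inflating an image VAE into a causal video VAE. Padding is applied only on
# the past side of the time axis. Illustrative only; not the paper's GCConv.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv3d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel=(3, 3, 3)):
        super().__init__()
        kt, kh, kw = kernel
        self.pad = (kw // 2, kw // 2,   # width: symmetric
                    kh // 2, kh // 2,   # height: symmetric
                    kt - 1, 0)          # time: pad the past only
        self.conv = nn.Conv3d(in_ch, out_ch, kernel)

    def forward(self, x):               # x: (batch, channels, time, H, W)
        return self.conv(F.pad(x, self.pad))

x = torch.randn(1, 3, 8, 32, 32)        # toy 8-frame clip
y = CausalConv3d(3, 16)(x)
print(y.shape)                           # torch.Size([1, 16, 8, 32, 32])
```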
- Bi-Directional Deep Contextual Video Compression [17.195099321371526]
We introduce a bi-directional deep contextual video compression scheme tailored for B-frames, termed DCVC-B.
First, we develop a bi-directional motion difference context propagation method for effective motion difference coding.
Second, we propose a bi-directional contextual compression model and a corresponding bi-directional temporal entropy model.
Third, we propose a hierarchical quality structure-based training strategy, leading to an effective bit allocation across large groups of pictures.
arXiv Detail & Related papers (2024-08-16T08:45:25Z)
- UCVC: A Unified Contextual Video Compression Framework with Joint P-frame and B-frame Coding [29.44234507064189]
This paper presents a learned video compression method in response to the video compression track of the 6th Challenge on Learned Image Compression (CLIC).
We propose a unified contextual video compression framework (UCVC) for joint P-frame and B-frame coding.
arXiv Detail & Related papers (2024-02-02T10:25:39Z)
- IBVC: Interpolation-driven B-frame Video Compression [68.18440522300536]
B-frame video compression adopts bi-directional motion estimation and motion compensation (MEMC) for middle-frame reconstruction (a toy warp-and-blend sketch follows this entry).
Previous learned approaches often directly extend neural P-frame codecs to B-frame coding, relying on bi-directional optical-flow estimation.
We propose a simple yet effective structure called Interpolation-B-frame Video Compression (IBVC) to address these issues.
arXiv Detail & Related papers (2023-09-25T02:45:51Z)
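To make the MEMC idea in the IBVC entry concrete, here is a toy warp-and-blend sketch in NumPy: warp the past and future references toward the middle frame with given motion fields, then average. Real codecs estimate the flows with a network and blend with learned masks; the nearest-neighbor warping and fixed 0.5/0.5 blend here are simplifications for illustration.

```python
# Toy bi-directional MEMC: warp past/future references toward the middle
# frame with given motion fields, then blend. Nearest-neighbor warping and a
# fixed 0.5/0.5 blend are used purely for illustration.
import numpy as np

def warp(ref: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp `ref` (H, W) by `flow` (H, W, 2) holding (dy, dx)."""
    h, w = ref.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(np.rint(ys + flow[..., 0]), 0, h - 1).astype(int)
    src_x = np.clip(np.rint(xs + flow[..., 1]), 0, w - 1).astype(int)
    return ref[src_y, src_x]

def memc_predict(past, future, flow_to_past, flow_to_future):
    """Bi-directional prediction of the middle frame."""
    return 0.5 * warp(past, flow_to_past) + 0.5 * warp(future, flow_to_future)

past = np.zeros((4, 4)); past[1, 1] = 1.0       # bright pixel moving right
future = np.zeros((4, 4)); future[1, 3] = 1.0
to_past = np.zeros((4, 4, 2)); to_past[..., 1] = -1    # sample one pixel left
to_future = np.zeros((4, 4, 2)); to_future[..., 1] = 1  # sample one pixel right
print(memc_predict(past, future, to_past, to_future))   # peak at (1, 2)
```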
- Frame Flexible Network [52.623337134518835]
Existing video recognition algorithms always conduct different training pipelines for inputs with different frame numbers.
If we evaluate the model at frame numbers not used in training, performance drops significantly.
We propose a general framework, named Frame Flexible Network (FFN), which enables the model to be evaluated at different frames to adjust its computation.
arXiv Detail & Related papers (2023-03-26T20:51:35Z)
- Advancing Learned Video Compression with In-loop Frame Prediction [177.67218448278143]
In this paper, we propose an Advanced Learned Video Compression (ALVC) approach with an in-loop frame prediction module.
The predicted frame can serve as a better reference than the previously compressed frame, and therefore it benefits the compression performance.
The experiments show the state-of-the-art performance of our ALVC approach in learned video compression.
arXiv Detail & Related papers (2022-11-13T19:53:14Z)
- Inter-Frame Compression for Dynamic Point Cloud Geometry Coding [14.79613731546357]
We propose a lossy compression scheme that predicts the latent representation of the current frame using the previous frame.
The proposed network utilizes convolutions with hierarchical multiscale 3D feature learning to encode the current frame.
The proposed method achieves more than 88% BD-rate (Bjontegaard Delta rate) reduction against G-PCCv20 Octree (a minimal BD-rate computation is sketched after this entry).
arXiv Detail & Related papers (2022-07-25T22:17:19Z)
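BD-rate, reported in the entry above (and in the B-CANF abstract against HM-16.23), is the standard Bjontegaard delta-rate metric. A minimal sketch of the computation follows: fit log10(bitrate) as a cubic in quality for each codec, integrate the gap over the overlapping quality range, and convert back to a percentage. The four rate-distortion points per codec are made-up toy numbers.

```python
# Standard Bjontegaard delta-rate (BD-rate): fit log10(bitrate) as a cubic
# polynomial in quality (e.g. PSNR) per codec, integrate the difference over
# the overlapping quality range, and convert the mean log-rate gap to a
# percentage. The RD points below are made-up toy numbers.
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test) -> float:
    """Average bit-rate change of `test` vs `anchor`, in percent (negative = savings)."""
    la, lt = np.log10(rate_anchor), np.log10(rate_test)
    pa = np.polyfit(psnr_anchor, la, 3)          # log-rate as cubic in PSNR
    pt = np.polyfit(psnr_test, lt, 3)
    lo = max(min(psnr_anchor), min(psnr_test))   # overlapping PSNR interval
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    int_t = np.polyval(np.polyint(pt), hi) - np.polyval(np.polyint(pt), lo)
    avg_diff = (int_t - int_a) / (hi - lo)       # mean gap in log10(rate)
    return (10 ** avg_diff - 1) * 100

anchor_rate = [100, 200, 400, 800]; anchor_psnr = [30.0, 33.0, 36.0, 39.0]
test_rate = [90, 175, 340, 680];    test_psnr = [30.2, 33.1, 36.2, 39.1]
print(f"BD-rate: {bd_rate(anchor_rate, anchor_psnr, test_rate, test_psnr):+.1f}%")
```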
- TTVFI: Learning Trajectory-Aware Transformer for Video Frame Interpolation [50.49396123016185]
Video frame interpolation (VFI) aims to synthesize an intermediate frame between two consecutive frames.
We propose a novel Trajectory-aware Transformer for Video Frame Interpolation (TTVFI).
Our method outperforms other state-of-the-art methods on four widely-used VFI benchmarks.
arXiv Detail & Related papers (2022-07-19T03:37:49Z)
- CANF-VC: Conditional Augmented Normalizing Flows for Video Compression [81.41594331948843]
CANF-VC is an end-to-end learning-based video compression system.
It is based on conditional augmented normalizing flows (ANF); a generic conditional coupling step is sketched after this entry.
arXiv Detail & Related papers (2022-07-12T04:53:24Z)
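CANF-VC (and B-CANF after it) builds on conditional augmented normalizing flows. The sketch below shows a generic conditional affine coupling step in PyTorch, where the scale and shift for one half of the tensor are predicted from the other half plus a conditioning signal such as a motion-compensated prediction. This is a textbook coupling layer for illustration, not the papers' actual architecture.

```python
# Generic conditional affine coupling step of the kind normalizing-flow codecs
# build on: scale/shift for one half of the tensor are predicted from the
# other half *and* a condition (e.g. a motion-compensated frame). Illustrative
# textbook layer only; not B-CANF's or CANF-VC's architecture.
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    def __init__(self, channels: int, cond_channels: int, hidden: int = 64):
        super().__init__()
        half = channels // 2
        self.net = nn.Sequential(
            nn.Conv2d(half + cond_channels, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 2 * half, 3, padding=1),
        )

    def forward(self, x, cond):
        x1, x2 = x.chunk(2, dim=1)
        log_s, t = self.net(torch.cat([x1, cond], dim=1)).chunk(2, dim=1)
        log_s = torch.tanh(log_s)               # keep the scale well-behaved
        y2 = x2 * torch.exp(log_s) + t          # transform one half
        return torch.cat([x1, y2], dim=1), log_s.flatten(1).sum(1)  # log|det J|

    def inverse(self, y, cond):
        y1, y2 = y.chunk(2, dim=1)
        log_s, t = self.net(torch.cat([y1, cond], dim=1)).chunk(2, dim=1)
        log_s = torch.tanh(log_s)
        return torch.cat([y1, (y2 - t) * torch.exp(-log_s)], dim=1)

frame = torch.randn(1, 8, 16, 16)               # toy latent
cond = torch.randn(1, 3, 16, 16)                # e.g. motion-compensated prediction
layer = ConditionalCoupling(8, 3)
y, logdet = layer(frame, cond)
assert torch.allclose(layer.inverse(y, cond), frame, atol=1e-5)
```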
- Extending Neural P-frame Codecs for B-frame Coding [15.102346715690755]
Our B-frame solution is based on existing P-frame methods.
Our results show that using the proposed method with an existing P-frame codec can lead to a 28.5% saving in bit-rate on the UVG dataset.
arXiv Detail & Related papers (2021-03-30T21:25:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.