Implicit-explicit Integrated Representations for Multi-view Video Compression
- URL: http://arxiv.org/abs/2311.17350v1
- Date: Wed, 29 Nov 2023 04:15:57 GMT
- Title: Implicit-explicit Integrated Representations for Multi-view Video Compression
- Authors: Chen Zhu, Guo Lu, Bing He, Rong Xie, Li Song
- Abstract summary: We propose an implicit-explicit integrated representation for multi-view video compression.
The proposed framework combines the strengths of both implicit neural representations and explicit 2D codecs.
Our proposed framework can achieve comparable or even superior performance to the latest multi-view video compression standard MIV.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing consumption of 3D displays and virtual reality,
multi-view video has become a promising format. However, its high resolution
and multi-camera shooting result in a substantial increase in data volume,
making storage and transmission a challenging task. To tackle these
difficulties, we propose an implicit-explicit integrated representation for
multi-view video compression. Specifically, we first use the explicit
representation-based 2D video codec to encode one of the source views.
Subsequently, we propose employing the implicit neural representation
(INR)-based codec to encode the remaining views. The implicit codec takes the
time and view index of multi-view video as coordinate inputs and generates the
corresponding implicit reconstruction frames. To enhance the compressibility, we
introduce a multi-level feature grid embedding and a fully convolutional
architecture into the implicit codec. These components facilitate
coordinate-feature and feature-RGB mapping, respectively. To further enhance
the reconstruction quality from the INR codec, we leverage the high-quality
reconstructed frames from the explicit codec to achieve inter-view
compensation. Finally, the compensated results are fused with the implicit
reconstructions from the INR to obtain the final reconstructed frames. Our
proposed framework combines the strengths of both implicit neural
representation and explicit 2D codec. Extensive experiments conducted on public
datasets demonstrate that the proposed framework can achieve comparable or even
superior performance to the latest multi-view video compression standard MIV
and other INR-based schemes in terms of view compression and scene modeling.
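The pipeline described in the abstract can be sketched in code. The sketch below is purely illustrative and is not the authors' implementation: the names `MultiLevelFeatureGrid` and `fuse` are hypothetical, the grids are randomly initialized, bilinear interpolation stands in for the learned coordinate-feature embedding, and a simple convex blend stands in for the paper's learned fusion of the INR reconstruction with the inter-view-compensated frame from the explicit codec.

```python
import numpy as np

class MultiLevelFeatureGrid:
    """Coordinate -> feature mapping via bilinearly interpolated grids at
    several resolutions (a stand-in for the paper's multi-level feature
    grid embedding over (time, view) coordinates)."""
    def __init__(self, levels=(4, 8), channels=2, seed=0):
        rng = np.random.default_rng(seed)
        # one (T_l x V_l x C) grid per resolution level
        self.grids = [rng.standard_normal((l, l, channels)) for l in levels]

    def __call__(self, t, v):
        """t, v are normalized time/view coordinates in [0, 1];
        returns features from all levels, concatenated."""
        feats = []
        for g in self.grids:
            T, V, _ = g.shape
            x, y = t * (T - 1), v * (V - 1)
            x0, y0 = int(np.floor(x)), int(np.floor(y))
            x1, y1 = min(x0 + 1, T - 1), min(y0 + 1, V - 1)
            wx, wy = x - x0, y - y0
            # bilinear interpolation on the (time, view) plane
            f = ((1 - wx) * (1 - wy) * g[x0, y0] + wx * (1 - wy) * g[x1, y0]
                 + (1 - wx) * wy * g[x0, y1] + wx * wy * g[x1, y1])
            feats.append(f)
        return np.concatenate(feats)

def fuse(inr_frame, compensated_frame, weight):
    """Fuse the INR reconstruction with the inter-view-compensated frame
    from the explicit 2D codec; a fixed convex blend replaces the paper's
    learned fusion for illustration."""
    return weight * compensated_frame + (1.0 - weight) * inr_frame

grid = MultiLevelFeatureGrid()
feat = grid(t=0.5, v=0.25)  # coordinate -> feature lookup
```

In the actual framework, the concatenated features would feed a fully convolutional decoder (the feature-RGB mapping), whose output is then fused with the compensated explicit-codec reconstruction to produce the final frame.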
Related papers
- Compression-Realized Deep Structural Network for Video Quality Enhancement [78.13020206633524]
This paper focuses on the task of quality enhancement for compressed videos.
A new paradigm is urgently needed for a more "conscious" process of quality enhancement.
We propose the Compression-Realized Deep Structural Network (CRDS), introducing three inductive biases aligned with the three primary processes in the classic compression domain.
arXiv Detail & Related papers (2024-05-10T09:18:17Z)
- Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation [35.52770785430601]
We propose a novel hybrid video autoencoder, called HVDM, which can capture intricate dependencies more effectively.
The HVDM is trained by a hybrid video autoencoder which extracts a disentangled representation of the video.
Our hybrid autoencoder provides a more comprehensive video latent, enriching the generated videos with fine structures and details.
arXiv Detail & Related papers (2024-02-21T11:46:16Z)
- VCISR: Blind Single Image Super-Resolution with Video Compression Synthetic Data [18.877077302923713]
We present a video compression-based degradation model to synthesize low-resolution image data in the blind SISR task.
Our proposed image synthesizing method is widely applicable to existing image datasets.
By introducing video coding artifacts to SISR degradation models, neural networks can super-resolve images with the ability to restore video compression degradations.
arXiv Detail & Related papers (2023-11-02T05:24:19Z)
- VNVC: A Versatile Neural Video Coding Framework for Efficient Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z)
- FFNeRV: Flow-Guided Frame-Wise Neural Representations for Videos [5.958701846880935]
We propose FFNeRV, a novel method for incorporating flow information into frame-wise representations to exploit the temporal redundancy across the frames in videos.
With model compression techniques, FFNeRV outperforms widely-used standard video codecs (H.264 and HEVC) and performs on par with state-of-the-art video compression algorithms.
arXiv Detail & Related papers (2022-12-23T12:51:42Z)
- Scalable Neural Video Representations with Learnable Positional Features [73.51591757726493]
We show how to train neural representations with learnable positional features (NVP) that effectively amortize a video as latent codes.
We demonstrate the superiority of NVP on the popular UVG benchmark; compared with prior arts, NVP not only trains 2 times faster (less than 5 minutes) but also exceeds their encoding quality, improving PSNR from 34.07 dB to 34.57 dB.
arXiv Detail & Related papers (2022-10-13T08:15:08Z)
- Learned Video Compression via Heterogeneous Deformable Compensation Network [78.72508633457392]
We propose a learned video compression framework via heterogeneous deformable compensation strategy (HDCVC) to tackle the problems of unstable compression performance.
More specifically, the proposed algorithm extracts features from the two adjacent frames to estimate content-neighborhood heterogeneous deformable (HetDeform) kernel offsets.
Experimental results indicate that HDCVC achieves superior performance to recent state-of-the-art learned video compression approaches.
arXiv Detail & Related papers (2022-07-11T02:31:31Z)
- An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal [99.49099501559652]
Video Coding for Machines (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion patterns.
By learning to extract sparse motion patterns via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
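Several entries above (e.g. NVP) report reconstruction quality as PSNR. For reference, the standard peak signal-to-noise ratio computation between a reference and a reconstructed frame is shown below; this is the textbook definition, not code from any of the listed papers.

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference frame and a
    reconstructed frame (higher is better); `peak` is the maximum pixel
    value of the format (255 for 8-bit frames)."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)
```

For example, a uniform per-pixel error of 1 on 8-bit frames gives an MSE of 1 and hence a PSNR of about 48.13 dB.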
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.