Implicit-explicit Integrated Representations for Multi-view Video
Compression
- URL: http://arxiv.org/abs/2311.17350v1
- Date: Wed, 29 Nov 2023 04:15:57 GMT
- Title: Implicit-explicit Integrated Representations for Multi-view Video
Compression
- Authors: Chen Zhu, Guo Lu, Bing He, Rong Xie, Li Song
- Abstract summary: We propose an implicit-explicit integrated representation for multi-view video compression.
The proposed framework combines the strengths of both implicit neural representation and explicit 2D datasets.
Our proposed framework can achieve comparable or even superior performance to the latest multi-view video compression standard MIV.
- Score: 40.86402535896703
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing consumption of 3D displays and virtual reality,
multi-view video has become a promising format. However, its high resolution
and multi-camera shooting result in a substantial increase in data volume,
making storage and transmission a challenging task. To tackle these
difficulties, we propose an implicit-explicit integrated representation for
multi-view video compression. Specifically, we first use the explicit
representation-based 2D video codec to encode one of the source views.
Subsequently, we propose employing the implicit neural representation
(INR)-based codec to encode the remaining views. The implicit codec takes the
time and view index of multi-view video as coordinate inputs and generates the
corresponding implicit reconstruction frames.To enhance the compressibility, we
introduce a multi-level feature grid embedding and a fully convolutional
architecture into the implicit codec. These components facilitate
coordinate-feature and feature-RGB mapping, respectively. To further enhance
the reconstruction quality from the INR codec, we leverage the high-quality
reconstructed frames from the explicit codec to achieve inter-view
compensation. Finally, the compensated results are fused with the implicit
reconstructions from the INR to obtain the final reconstructed frames. Our
proposed framework combines the strengths of both implicit neural
representation and explicit 2D codec. Extensive experiments conducted on public
datasets demonstrate that the proposed framework can achieve comparable or even
superior performance to the latest multi-view video compression standard MIV
and other INR-based schemes in terms of view compression and scene modeling.
Related papers
- High-Efficiency Neural Video Compression via Hierarchical Predictive Learning [27.41398149573729]
Enhanced Deep Hierarchical Video Compression-DHVC 2.0- introduces superior compression performance and impressive complexity efficiency.
Uses hierarchical predictive coding to transform each video frame into multiscale representations.
Supports transmission-friendly progressive decoding, making it particularly advantageous for networked video applications in the presence of packet loss.
arXiv Detail & Related papers (2024-10-03T15:40:58Z) - When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [112.44822009714461]
Cross-Modality Video Coding (CMVC) is a pioneering approach to explore multimodality representation and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z) - Compression-Realized Deep Structural Network for Video Quality Enhancement [78.13020206633524]
This paper focuses on the task of quality enhancement for compressed videos.
Most of the existing methods lack a structured design to optimally leverage the priors within compression codecs.
A new paradigm is urgently needed for a more conscious'' process of quality enhancement.
arXiv Detail & Related papers (2024-05-10T09:18:17Z) - Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation [35.52770785430601]
We propose a novel hybrid video autoencoder, called HVtemporalDM, which can capture intricate dependencies more effectively.
The HVDM is trained by a hybrid video autoencoder which extracts a disentangled representation of the video.
Our hybrid autoencoder provide a more comprehensive video latent enriching the generated videos with fine structures and details.
arXiv Detail & Related papers (2024-02-21T11:46:16Z) - VNVC: A Versatile Neural Video Coding Framework for Efficient
Human-Machine Vision [59.632286735304156]
It is more efficient to enhance/analyze the coded representations directly without decoding them into pixels.
We propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis.
arXiv Detail & Related papers (2023-06-19T03:04:57Z) - FFNeRV: Flow-Guided Frame-Wise Neural Representations for Videos [5.958701846880935]
We propose FFNeRV, a novel method for incorporating flow information into frame-wise representations to exploit the temporal redundancy across the frames in videos.
With model compression techniques, FFNeRV outperforms widely-used standard video codecs (H.264 and HEVC) and performs on par with state-of-the-art video compression algorithms.
arXiv Detail & Related papers (2022-12-23T12:51:42Z) - Learned Video Compression via Heterogeneous Deformable Compensation
Network [78.72508633457392]
We propose a learned video compression framework via heterogeneous deformable compensation strategy (HDCVC) to tackle the problems of unstable compression performance.
More specifically, the proposed algorithm extracts features from the two adjacent frames to estimate content-Neighborhood heterogeneous deformable (HetDeform) kernel offsets.
Experimental results indicate that HDCVC achieves superior performance than the recent state-of-the-art learned video compression approaches.
arXiv Detail & Related papers (2022-07-11T02:31:31Z) - An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond
Feature and Signal [99.49099501559652]
Video Coding for Machine (VCM) aims to bridge the gap between visual feature compression and classical video coding.
We employ a conditional deep generation network to reconstruct video frames with the guidance of learned motion pattern.
By learning to extract sparse motion pattern via a predictive model, the network elegantly leverages the feature representation to generate the appearance of to-be-coded frames.
arXiv Detail & Related papers (2020-01-09T14:18:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.