Learning Dynamic Point Cloud Compression via Hierarchical Inter-frame Block Matching
- URL: http://arxiv.org/abs/2305.05356v2
- Date: Tue, 16 May 2023 05:25:00 GMT
- Title: Learning Dynamic Point Cloud Compression via Hierarchical Inter-frame Block Matching
- Authors: Shuting Xia, Tingyu Fan, Yiling Xu, Jenq-Neng Hwang, Zhu Li
- Abstract summary: 3D dynamic point cloud (DPC) compression relies on mining its temporal context.
This paper proposes a learning-based DPC compression framework with a hierarchical block-matching-based inter-prediction module.
- Score: 35.80653765524654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D dynamic point cloud (DPC) compression relies on mining its temporal
context, which faces significant challenges due to DPC's sparsity and
non-uniform structure. Existing methods are limited in capturing sufficient
temporal dependencies. Therefore, this paper proposes a learning-based DPC
compression framework with a hierarchical block-matching-based inter-prediction
module that compensates and compresses the DPC geometry in latent space.
Specifically, we propose a hierarchical motion estimation and motion
compensation (Hie-ME/MC) framework for flexible inter-prediction, which
dynamically selects the granularity of optical flow to encapsulate the motion
information accurately. To improve the motion estimation efficiency of the
proposed inter-prediction module, we further design a KNN-attention block
matching (KABM) network that determines the impact of potential corresponding
points based on the geometry and feature correlation. Finally, we compress the
residual and the multi-scale optical flow with a fully-factorized deep entropy
model. Experimental results on the MPEG-specified Owlii Dynamic Human Dynamic
Point Cloud (Owlii) dataset show that our framework outperforms the previous
state-of-the-art methods and the MPEG standard V-PCC v18 in inter-frame
low-delay mode.
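The KNN-attention block matching idea described above can be illustrated with a toy sketch: for each point in the current frame's latent representation, find its K nearest neighbors in the previous frame, weight them by their correlation with the query point, and aggregate their features as the motion-compensated prediction. The sketch below is a simplification under stated assumptions: the function name is hypothetical, and a softmax over negative squared geometric distance stands in for the learned geometry-and-feature correlation weights in the paper.

```python
import numpy as np

def knn_attention_block_matching(cur_xyz, prev_xyz, prev_feat, k=4):
    """Toy sketch of KNN-attention block matching (KABM).

    For each current-frame point, find its k nearest neighbours in the
    previous frame, weight them by a softmax over negative squared
    distance (a stand-in for the learned geometry/feature correlation
    in the paper), and aggregate their features as the prediction.
    """
    # Pairwise squared distances, shape (N_cur, N_prev)
    d2 = ((cur_xyz[:, None, :] - prev_xyz[None, :, :]) ** 2).sum(-1)
    # Indices and distances of the k nearest previous-frame points
    nn_idx = np.argsort(d2, axis=1)[:, :k]
    nn_d2 = np.take_along_axis(d2, nn_idx, axis=1)
    # Attention weights: closer neighbours contribute more
    logits = -nn_d2
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    # Weighted aggregation of neighbour features -> predicted features
    return (w[..., None] * prev_feat[nn_idx]).sum(axis=1)
```

In the actual framework these weights are produced by a network, and the matching runs hierarchically over multiple latent scales; this sketch only shows the per-point neighbor aggregation.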
Related papers
- DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in spatio-temporal skeletal data.
It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and FLOPs.
arXiv Detail & Related papers (2024-06-05T06:18:03Z)
- On Exploring PDE Modeling for Point Cloud Video Representation Learning [48.02197741709501]
We introduce Motion PointNet composed of a PointNet-like encoder and a PDE-solving module.
Our Motion PointNet achieves an impressive accuracy of 97.52% on the MSRAction-3D dataset.
arXiv Detail & Related papers (2024-04-06T19:50:48Z)
- Motion-Aware Video Frame Interpolation [49.49668436390514]
We introduce a Motion-Aware Video Frame Interpolation (MA-VFI) network, which directly estimates intermediate optical flow from consecutive frames.
It not only extracts global semantic relationships and spatial details from input frames with different receptive fields, but also effectively reduces the required computational cost and complexity.
arXiv Detail & Related papers (2024-02-05T11:00:14Z)
- Spatial-Temporal Transformer based Video Compression Framework [44.723459144708286]
We propose a novel Spatial-Temporal Transformer based Video Compression (STT-VC) framework.
It contains a Relaxed Deformable Transformer (RDT) with Uformer based offsets estimation for motion estimation and compensation, a Multi-Granularity Prediction (MGP) module based on multi-reference frames for prediction refinement, and a Spatial Feature Distribution prior based Transformer (SFD-T) for efficient temporal-spatial joint residual compression.
Experimental results demonstrate that our method achieves the best result with 13.5% BD-Rate saving over VTM.
arXiv Detail & Related papers (2023-09-21T09:23:13Z)
- Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image Compression [63.56922682378755]
We focus on extending spatial aggregation capability and propose a dynamic kernel-based transform coding.
The proposed adaptive aggregation generates kernel offsets to capture valid information in the content-conditioned range to help transform.
Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to the state-of-the-art learning-based methods.
arXiv Detail & Related papers (2023-08-17T01:34:51Z)
- Inter-Frame Compression for Dynamic Point Cloud Geometry Coding [14.79613731546357]
We propose a lossy compression scheme that predicts the latent representation of the current frame using the previous frame.
The proposed network utilizes convolutions with hierarchical multiscale 3D feature learning to encode the current frame.
The proposed method achieves more than 88% BD-Rate (Bjontegaard Delta Rate) reduction against G-PCCv20 Octree.
arXiv Detail & Related papers (2022-07-25T22:17:19Z)
- Learned Video Compression via Heterogeneous Deformable Compensation Network [78.72508633457392]
We propose a learned video compression framework via heterogeneous deformable compensation strategy (HDCVC) to tackle the problems of unstable compression performance.
More specifically, the proposed algorithm extracts features from the two adjacent frames to estimate content-neighborhood heterogeneous deformable (HetDeform) kernel offsets.
Experimental results indicate that HDCVC achieves superior performance than the recent state-of-the-art learned video compression approaches.
arXiv Detail & Related papers (2022-07-11T02:31:31Z)
- D-DPCC: Deep Dynamic Point Cloud Compression via 3D Motion Prediction [18.897023700334458]
This paper proposes a novel 3D sparse convolution-based Deep Dynamic Point Cloud Compression network.
It compensates and compresses the DPC geometry with 3D motion estimation and motion compensation in the feature space.
The experimental result shows that the proposed D-DPCC framework achieves an average 76% BD-Rate (Bjontegaard Delta Rate) gains against state-of-the-art Video-based Point Cloud Compression (V-PCC) v13 in inter mode.
arXiv Detail & Related papers (2022-05-02T18:10:45Z)
- 4DAC: Learning Attribute Compression for Dynamic Point Clouds [37.447460254690135]
We study the attribute (e.g., color) compression of dynamic point clouds and present a learning-based framework, termed 4DAC.
To reduce temporal redundancy within data, we first build the 3D motion estimation and motion compensation modules with deep neural networks.
In addition, we also propose a deep conditional entropy model to estimate the probability distribution of the transformed coefficients.
arXiv Detail & Related papers (2022-04-25T15:30:06Z)
- Spatiotemporal Entropy Model is All You Need for Learned Video Compression [9.227865598115024]
We propose a framework to compress raw-pixel frames (rather than residual images).
An entropy model is used to estimate the temporal redundancy in a latent space rather than at the pixel level.
Experiments showed that the proposed method achieves state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2021-04-13T10:38:32Z)
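Several of the works above, including the main paper, code latents with a factorized entropy model: each quantized latent symbol is assigned a probability by a learned per-channel density, and the rate is the sum of the symbols' negative log-probabilities. The sketch below is a minimal illustration under assumptions: the function names are hypothetical, and a fixed Gaussian replaces the learned density that a real entropy model would fit.

```python
import math

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """Standard-library CDF of a Gaussian, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def factorized_rate_bits(latents, mu=0.0, sigma=1.0):
    """Bits needed to code integer-quantized latents under a factorized
    Gaussian prior: p(y) = CDF(y + 0.5) - CDF(y - 0.5) for each symbol,
    rate = sum of -log2 p(y). A learned model would fit mu/sigma (or a
    more flexible density) per channel instead of fixing them."""
    bits = 0.0
    for y in latents:
        q = round(y)  # hard quantization to the nearest integer
        p = gaussian_cdf(q + 0.5, mu, sigma) - gaussian_cdf(q - 0.5, mu, sigma)
        bits += -math.log2(max(p, 1e-12))  # clamp to avoid log(0)
    return bits
```

Because the prior is factorized, the total rate is just the sum of the per-symbol rates; improbable (large-magnitude) latents cost more bits, which is what pushes the encoder toward compact residuals and flows.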
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.