KeyNode-Driven Geometry Coding for Real-World Scanned Human Dynamic Mesh Compression
- URL: http://arxiv.org/abs/2501.01717v2
- Date: Wed, 02 Jul 2025 18:40:08 GMT
- Title: KeyNode-Driven Geometry Coding for Real-World Scanned Human Dynamic Mesh Compression
- Authors: Huong Hoang, Truong Nguyen, Pamela Cosman,
- Abstract summary: The compression of real-world scanned 3D human dynamic meshes is an emerging research area, driven by applications such as telepresence, virtual reality, and 3D digital streaming.<n>We propose a compression method designed for real-world scanned human dynamic meshes, leveraging embedded key nodes.<n>Our method achieves significant improvements over the state-of-the-art, with an average savings of 58.43% across the evaluated sequences.
- Score: 2.125376833189004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The compression of real-world scanned 3D human dynamic meshes is an emerging research area, driven by applications such as telepresence, virtual reality, and 3D digital streaming. Unlike synthesized dynamic meshes with fixed topology, scanned dynamic meshes often not only have varying topology across frames but also scan defects such as holes and outliers, increasing the complexity of prediction and compression. Additionally, human meshes often combine rigid and non-rigid motions, making accurate prediction and encoding significantly more difficult compared to objects that exhibit purely rigid motion. To address these challenges, we propose a compression method designed for real-world scanned human dynamic meshes, leveraging embedded key nodes. The temporal motion of each vertex is formulated as a distance-weighted combination of transformations from neighboring key nodes, requiring the transmission of solely the key nodes' transformations. To enhance the quality of the KeyNode-driven prediction, we introduce an octree-based residual coding scheme and a Dual-direction prediction mode, which uses I-frames from both directions. Extensive experiments demonstrate that our method achieves significant improvements over the state-of-the-art, with an average bitrate savings of 58.43% across the evaluated sequences, particularly excelling at low bitrates.
Related papers
- D-FCGS: Feedforward Compression of Dynamic Gaussian Splatting for Free-Viewpoint Videos [12.24209693552492]
Free-viewpoint video (FVV) enables immersive 3D experiences, but efficient compression of dynamic 3D representations remains a major challenge.<n>This paper presents Feedforward Compression of Dynamic Gaussian Splatting (D-FCGS), a novel feedforward framework for compressing temporally correlated Gaussian point cloud sequences.<n> Experiments show that it matches the rate-distortion performance of optimization-based methods, achieving over 40 times compression in under 2 seconds.
arXiv Detail & Related papers (2025-07-08T10:39:32Z) - Multi-Modal Graph Convolutional Network with Sinusoidal Encoding for Robust Human Action Segmentation [10.122882293302787]
temporal segmentation of human actions is critical for intelligent robots in collaborative settings.<n>We propose a Multi-Modal Graph Convolutional Network (MMGCN) that integrates low-frame-rate (e.g., 1 fps) visual data with high-frame-rate (e.g., 30 fps) motion data.<n>Our approach outperforms state-of-the-art methods, especially in action segmentation accuracy.
arXiv Detail & Related papers (2025-07-01T13:55:57Z) - Progressive Inertial Poser: Progressive Real-Time Kinematic Chain Estimation for 3D Full-Body Pose from Three IMU Sensors [25.67875816218477]
Full-body pose estimation from sparse tracking signals is not limited by environmental conditions or recording range.<n>Previous works either face the challenge of wearing additional sensors on the pelvis and lower-body or rely on external visual sensors to obtain global positions of key joints.<n>To improve the practicality of the technology for virtual reality applications, we estimate full-body poses using only inertial data obtained from three Inertial Measurement Unit (IMU) sensors worn on the head and wrists.
arXiv Detail & Related papers (2025-05-08T15:28:09Z) - CompGS++: Compressed Gaussian Splatting for Static and Dynamic Scene Representation [60.712165339762116]
CompGS++ is a novel framework that leverages compact Gaussian primitives to achieve accurate 3D modeling.
Our design is based on the principle of eliminating redundancy both between and within primitives.
Our implementation will be made publicly available on GitHub to facilitate further research.
arXiv Detail & Related papers (2025-04-17T15:33:01Z) - A Joint Visual Compression and Perception Framework for Neuralmorphic Spiking Camera [42.74887012434441]
We present the notion of Spike Coding for Intelligence (SCI), wherein spike sequences are compressed and optimized for both bit-rate and task performance.
We achieve an average 17.25% BD-rate reduction compared to SOTA codecs and a 4.3% accuracy improvement over SpiReco for spike-based classification.
arXiv Detail & Related papers (2025-03-04T15:44:33Z) - ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our purelytemporalal architecture framework, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z) - Ultron: Enabling Temporal Geometry Compression of 3D Mesh Sequences using Temporal Correspondence and Mesh Deformation [2.0914328542137346]
Existing 3D model compression methods primarily focus on static models and do not consider inter-frame information.
This paper proposes a method to compress mesh sequences with arbitrary topology using temporal correspondence and mesh deformation.
arXiv Detail & Related papers (2024-09-08T16:34:19Z) - Text-guided 3D Human Motion Generation with Keyframe-based Parallel Skip Transformer [62.29951737214263]
Existing algorithms directly generate the full sequence which is expensive and prone to errors.
We propose KeyMotion, that generates plausible human motion sequences corresponding to input text.
We use a Variationalcoder (VAE) with Kullback-Leibler regularization to project the Autoencoder into a latent space.
For the reverse diffusion, we propose a novel Parallel Skip Transformer that performs cross-modal attention between the design latents and text condition.
arXiv Detail & Related papers (2024-05-24T11:12:37Z) - Learning Dynamic Tetrahedra for High-Quality Talking Head Synthesis [31.90503003079933]
We introduce Dynamic Tetrahedra (DynTet), a novel hybrid representation that encodes explicit dynamic meshes by neural networks.
Compared with prior works, DynTet demonstrates significant improvements in fidelity, lip synchronization, and real-time performance according to various metrics.
arXiv Detail & Related papers (2024-02-27T09:56:15Z) - Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for
Enhanced Human Pose Estimation with Sparse Inertial Sensors [17.3834029178939]
This paper introduces a novel human pose estimation approach using sparse inertial sensors.
It leverages a diverse array of real inertial motion capture data from different skeleton formats to improve motion diversity and model generalization.
The approach demonstrates superior performance over state-of-the-art models across five public datasets, notably reducing pose error by 19% on the DIP-IMU dataset.
arXiv Detail & Related papers (2023-12-02T13:17:10Z) - Dynamic Frame Interpolation in Wavelet Domain [57.25341639095404]
Video frame is an important low-level computation vision task, which can increase frame rate for more fluent visual experience.
Existing methods have achieved great success by employing advanced motion models and synthesis networks.
WaveletVFI can reduce computation up to 40% while maintaining similar accuracy, making it perform more efficiently against other state-of-the-arts.
arXiv Detail & Related papers (2023-09-07T06:41:15Z) - SpATr: MoCap 3D Human Action Recognition based on Spiral Auto-encoder and Transformer Network [1.4732811715354455]
We introduce a novel approach for 3D human action recognition, denoted as SpATr (Spiral Auto-encoder and Transformer Network)
A lightweight auto-encoder, based on spiral convolutions, is employed to extract spatial geometrical features from each 3D mesh.
The proposed method is evaluated on three prominent 3D human action datasets: Babel, MoVi, and BMLrub.
arXiv Detail & Related papers (2023-06-30T11:49:00Z) - Scene Synthesis via Uncertainty-Driven Attribute Synchronization [52.31834816911887]
This paper introduces a novel neural scene synthesis approach that can capture diverse feature patterns of 3D scenes.
Our method combines the strength of both neural network-based and conventional scene synthesis approaches.
arXiv Detail & Related papers (2021-08-30T19:45:07Z) - Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics.
Recent neural implicit modeling methods show promising results on synthetic or dense datasets.
But, they perform poorly on real-world data that is sparse and noisy.
This paper analyzes the root cause of such deficient performance of a popular neural implicit model.
arXiv Detail & Related papers (2021-01-18T03:24:48Z) - Causal Contextual Prediction for Learned Image Compression [36.08393281509613]
We propose the concept of separate entropy coding to leverage a serial decoding process for causal contextual entropy prediction in the latent space.
A causal context model is proposed that separates the latents across channels and makes use of cross-channel relationships to generate highly informative contexts.
We also propose a causal global prediction model, which is able to find global reference points for accurate predictions of unknown points.
arXiv Detail & Related papers (2020-11-19T08:15:10Z) - Hamiltonian Dynamics for Real-World Shape Interpolation [66.47407593823208]
We revisit the classical problem of 3D shape and propose a novel, physically plausible approach based on Hamiltonian dynamics.
Our method yields exactly volume preserving intermediate shapes, avoids self-intersections and is scalable to high resolution scans.
arXiv Detail & Related papers (2020-04-10T18:38:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.