NanoHTNet: Nano Human Topology Network for Efficient 3D Human Pose Estimation
- URL: http://arxiv.org/abs/2501.15763v1
- Date: Mon, 27 Jan 2025 04:16:42 GMT
- Title: NanoHTNet: Nano Human Topology Network for Efficient 3D Human Pose Estimation
- Authors: Jialun Cai, Mengyuan Liu, Hong Liu, Wenhao Li, Shuheng Zhou,
- Abstract summary: The widespread application of 3D human pose estimation (HPE) is limited by resource-constrained edge devices. We propose a Nano Human Topology Network (NanoHTNet) to capture explicit features. We also propose PoseCLR to align 2D poses from diverse viewpoints in a proxy task.
- Score: 24.059039655555807
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The widespread application of 3D human pose estimation (HPE) is limited by resource-constrained edge devices, requiring more efficient models. A key approach to enhancing efficiency involves designing networks based on the structural characteristics of input data. However, effectively utilizing the structural priors in human skeletal inputs remains challenging. To address this, we leverage both explicit and implicit spatio-temporal priors of the human body through innovative model design and a pre-training proxy task. First, we propose a Nano Human Topology Network (NanoHTNet), a tiny 3D HPE network with stacked Hierarchical Mixers to capture explicit features. Specifically, the spatial Hierarchical Mixer efficiently learns the human physical topology across multiple semantic levels, while the temporal Hierarchical Mixer with discrete cosine transform and low-pass filtering captures local instantaneous movements and global action coherence. Moreover, Efficient Temporal-Spatial Tokenization (ETST) is introduced to enhance spatio-temporal interaction and reduce computational complexity significantly. Second, PoseCLR is proposed as a general pre-training method based on contrastive learning for 3D HPE, aimed at extracting implicit representations of human topology. By aligning 2D poses from diverse viewpoints in the proxy task, PoseCLR aids 3D HPE encoders like NanoHTNet in more effectively capturing the high-dimensional features of the human body, leading to further performance improvements. Extensive experiments verify that NanoHTNet with PoseCLR outperforms other state-of-the-art methods in efficiency, making it ideal for deployment on edge devices like the Jetson Nano. Code and models are available at https://github.com/vefalun/NanoHTNet.
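The abstract says the temporal Hierarchical Mixer uses a discrete cosine transform (DCT) with low-pass filtering. As a rough illustration of that general idea only (this is not the authors' code, which lives in the linked repository), the sketch below applies an orthonormal DCT-II to one joint coordinate over time, zeroes every coefficient at or above a hypothetical cutoff `keep`, and inverts the transform. The function names, `keep`, and the toy trajectory are all assumptions.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a 1D sequence (O(N^2), for illustration only)."""
    n = len(x)
    coeffs = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi / n * (t + 0.5) * k) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs


def idct_ii(coeffs):
    """Inverse of the orthonormal DCT-II (an orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for t in range(n):
        s = math.sqrt(1.0 / n) * coeffs[0]
        s += sum(
            math.sqrt(2.0 / n) * coeffs[k] * math.cos(math.pi / n * (t + 0.5) * k)
            for k in range(1, n)
        )
        out.append(s)
    return out


def low_pass(trajectory, keep):
    """Hypothetical helper: zero every DCT coefficient with index >= keep, then reconstruct."""
    coeffs = dct_ii(trajectory)
    filtered = [c if k < keep else 0.0 for k, c in enumerate(coeffs)]
    return idct_ii(filtered)


# Hypothetical x-coordinate of one joint over 8 frames, with some jitter.
traj = [0.0, 0.12, 0.05, 0.21, 0.14, 0.31, 0.24, 0.40]
smooth = low_pass(traj, keep=3)         # only the coarse motion trend survives
exact = low_pass(traj, keep=len(traj))  # all coefficients kept: reproduces traj
print([round(v, 3) for v in smooth])
```

Keeping fewer coefficients gives a smoother curve that discards frame-to-frame jitter, and `keep=len(traj)` returns the input unchanged because the transform is orthonormal. The quadratic loops are only for readability; a practical model would use a batched, FFT-based DCT.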
Related papers
- VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions [17.542908770824596]
Parametric human body models play a crucial role in computer graphics and vision, enabling applications ranging from human motion analysis to understanding human-environment interactions. To address this limitation, recent research has explored volumetric neural implicit body models. We introduce VolumetricSMPL, a neural body model that generates compact, yet efficient decoders.
arXiv Detail & Related papers (2025-06-29T13:48:38Z)
- ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction [89.89610257714006]
Existing methods prioritize higher accuracy to cater to the demands of these tasks.
We introduce a series of targeted improvements for 3D semantic occupancy prediction and flow estimation.
Our purely convolutional architecture framework, named ALOcc, achieves an optimal tradeoff between speed and accuracy.
arXiv Detail & Related papers (2024-11-12T11:32:56Z)
- Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation [36.93661496405653]
We take a global approach to exploit spatio-temporal information with a concise Graph and Skipped Transformer architecture.
Specifically, in the 3D pose stage, coarse-grained body parts are deployed to construct a fully data-driven adaptive model.
Experiments are conducted on the Human3.6M, MPI-INF-3DHP, and HumanEva benchmarks.
arXiv Detail & Related papers (2024-07-03T10:42:09Z)
- BinaryHPE: 3D Human Pose and Shape Estimation via Binarization [99.83378699846767]
3D human pose and shape estimation (HPE) aims to reconstruct the 3D human body, face, and hands from a single image.
We propose BinaryHPE, a novel binarization method designed to estimate the 3D human body, face, and hands parameters efficiently.
arXiv Detail & Related papers (2023-11-24T07:51:50Z)
- EVOPOSE: A Recursive Transformer For 3D Human Pose Estimation With Kinematic Structure Priors [72.33767389878473]
We propose a transformer-based model, EvoPose, to effectively introduce human body prior knowledge into 3D human pose estimation.
A Structural Priors Representation (SPR) module represents human priors as structural features carrying rich body patterns.
A Recursive Refinement (RR) module refines the 3D pose outputs using the estimated results while simultaneously injecting human priors.
arXiv Detail & Related papers (2023-06-16T04:09:16Z)
- Highly Efficient 3D Human Pose Tracking from Events with Spiking Spatiotemporal Transformer [23.15179173446486]
We introduce the first sparse Spiking Neural Network (SNN) framework for 3D human pose tracking based solely on events. Our approach eliminates the need to convert sparse data to dense formats or incorporate additional images, thereby fully exploiting the innate sparsity of input events. Empirical experiments demonstrate the superiority of our approach over existing state-of-the-art (SOTA) ANN-based methods, requiring only 19.1% of their FLOPs and 3.6% of their energy cost.
arXiv Detail & Related papers (2023-03-16T22:56:12Z)
- HTNet: Human Topology Aware Network for 3D Human Pose Estimation [12.120648336697592]
3D human pose estimation errors tend to propagate along the human body topology and accumulate at the end joints of the limbs.
We design an Intra-Part Constraint module that utilizes the parent nodes as the reference to build topological constraints for end joints at the part level.
We propose a novel Human Topology aware Network (HTNet), which adopts a channel-split progressive strategy to sequentially learn the structural priors of the human topology.
arXiv Detail & Related papers (2023-02-20T06:31:29Z)
- 3D Convolutional with Attention for Action Recognition [6.238518976312625]
Current action recognition methods use computationally expensive models for learning the spatio-temporal dependencies of actions.
This paper proposes a deep neural network architecture for learning such dependencies, consisting of a 3D convolutional layer, fully connected layers, and an attention layer.
The method first learns spatial and temporal features of actions through a 3D CNN, and then a temporal attention mechanism helps the model focus on essential features.
arXiv Detail & Related papers (2022-06-05T15:12:57Z)
- DMS-GCN: Dynamic Mutiscale Spatiotemporal Graph Convolutional Networks for Human Motion Prediction [8.142947808507365]
We propose a feed-forward deep neural network for motion prediction.
The entire model is suitable for all actions and follows an encoder-decoder framework.
Our approach outperforms SOTA methods on the datasets of Human3.6M and CMU Mocap.
arXiv Detail & Related papers (2021-12-20T07:07:03Z)
- Geometry-Guided Progressive NeRF for Generalizable and Efficient Neural Human Rendering [139.159534903657]
We develop a generalizable and efficient Neural Radiance Field (NeRF) pipeline for rendering high-fidelity, free-viewpoint human body details.
To better tackle self-occlusion, we devise a geometry-guided multi-view feature integration approach.
For achieving higher rendering efficiency, we introduce a geometry-guided progressive rendering pipeline.
arXiv Detail & Related papers (2021-12-08T14:42:10Z)
- Higher-Order Implicit Fairing Networks for 3D Human Pose Estimation [1.1501261942096426]
We introduce a higher-order graph convolutional framework with initial residual connections for 2D-to-3D pose estimation.
Our model is able to capture the long-range dependencies between body joints.
Experiments and ablation studies conducted on two standard benchmarks demonstrate the effectiveness of our model.
arXiv Detail & Related papers (2021-11-01T13:48:55Z)
- Revisiting Skeleton-based Action Recognition [107.08112310075114]
PoseC3D is a new approach to skeleton-based action recognition that relies on a 3D heatmap stack, instead of a graph sequence, as the base representation of human skeletons.
On four challenging datasets, PoseC3D consistently obtains superior performance, both when used alone on skeletons and when combined with the RGB modality.
arXiv Detail & Related papers (2021-04-28T06:32:17Z)
- EvoPose2D: Pushing the Boundaries of 2D Human Pose Estimation using Accelerated Neuroevolution with Weight Transfer [82.28607779710066]
We explore the application of neuroevolution, a form of neural architecture search inspired by biological evolution, in the design of 2D human pose networks.
Our method produces network designs that are more efficient and more accurate than state-of-the-art hand-designed networks.
arXiv Detail & Related papers (2020-11-17T05:56:16Z)
- Anatomy-aware 3D Human Pose Estimation with Bone-based Pose Decomposition [92.99291528676021]
Instead of directly regressing the 3D joint locations, we decompose the task into bone direction prediction and bone length prediction.
Our motivation is the fact that the bone lengths of a human skeleton remain consistent across time.
Our full model outperforms the previous best results on Human3.6M and MPI-INF-3DHP datasets.
arXiv Detail & Related papers (2020-02-24T15:49:37Z)
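The bone-based decomposition summarized in the last entry above can be illustrated with a small geometric sketch. It only shows the bookkeeping (joints to per-bone unit directions and lengths, and back), plus the observation that one length vector can be shared across frames; it is not that paper's network, which predicts these quantities. The toy skeleton, the function names, and averaging lengths over frames are hypothetical choices.

```python
import math

# Hypothetical toy skeleton (not a real dataset layout): PARENTS[j] is the
# parent of joint j, -1 marks the root, and parents precede their children.
PARENTS = [-1, 0, 1, 0, 3]


def decompose(joints, parents):
    """Split 3D joint positions into per-bone unit directions and lengths."""
    directions, lengths = [], []
    for j, p in enumerate(parents):
        if p < 0:  # the root has no incoming bone
            directions.append((0.0, 0.0, 0.0))
            lengths.append(0.0)
            continue
        bone = tuple(a - b for a, b in zip(joints[j], joints[p]))
        length = math.sqrt(sum(c * c for c in bone))
        unit = tuple(c / length for c in bone) if length > 0 else (0.0, 0.0, 0.0)
        directions.append(unit)
        lengths.append(length)
    return directions, lengths


def compose(root, directions, lengths, parents):
    """Rebuild joint positions by walking the kinematic tree from the root."""
    joints = [None] * len(parents)
    for j, p in enumerate(parents):
        if p < 0:
            joints[j] = tuple(root)
        else:
            joints[j] = tuple(
                pj + lengths[j] * d for pj, d in zip(joints[p], directions[j])
            )
    return joints


# Two hypothetical frames of the same person: directions change, lengths do not.
frame_a = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 2.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
frame_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0), (0.0, 3.0, 0.0), (0.0, 3.0, 4.0)]

dirs_a, len_a = decompose(frame_a, PARENTS)
dirs_b, len_b = decompose(frame_b, PARENTS)
# One shared length per bone for the whole sequence, here the mean over frames.
shared = [(la + lb) / 2 for la, lb in zip(len_a, len_b)]
rebuilt_b = compose(frame_b[0], dirs_b, shared, PARENTS)
```

Because `compose` places each parent before its children, an error in a parent bone shifts every descendant joint, which mirrors the error propagation toward end joints described in the HTNet entry.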