Binarized 3D Whole-body Human Mesh Recovery
- URL: http://arxiv.org/abs/2311.14323v1
- Date: Fri, 24 Nov 2023 07:51:50 GMT
- Title: Binarized 3D Whole-body Human Mesh Recovery
- Authors: Zhiteng Li, Yulun Zhang, Jing Lin, Haotong Qin, Jinjin Gu, Xin Yuan,
Linghe Kong, Xiaokang Yang
- Abstract summary: We propose a Binarized Dual Residual Network (BiDRN) to estimate 3D human body, face, and hand parameters efficiently.
BiDRN achieves performance comparable to the full-precision method Hand4Whole while using just 22.1% of the parameters and 14.8% of the operations.
- Score: 104.13364878565737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D whole-body human mesh recovery aims to reconstruct the 3D human body,
face, and hands from a single image. Although powerful deep learning models
have achieved accurate estimation in this task, they require enormous memory
and computational resources. Consequently, these methods can hardly be deployed
on resource-limited edge devices. In this work, we propose a Binarized Dual
Residual Network (BiDRN), a novel quantization method to estimate 3D human
body, face, and hand parameters efficiently. Specifically, we design a basic
unit Binarized Dual Residual Block (BiDRB) composed of Local Convolution
Residual (LCR) and Block Residual (BR), which can preserve full-precision
information as much as possible. We generalize LCR to four kinds of
convolutional modules so that full-precision information can be propagated even
between mismatched dimensions. We also binarize the face and hands
box-prediction network as a Binarized BoxNet, which further reduces model
redundancy. Comprehensive quantitative and qualitative experiments demonstrate
the effectiveness of BiDRN, which significantly outperforms state-of-the-art
binarization algorithms. Moreover, our proposed BiDRN achieves performance
comparable to the full-precision method Hand4Whole while using just 22.1% of
the parameters and 14.8% of the operations. We will release all the code and
pretrained models.
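As an illustration of the recipe the abstract describes, below is a minimal PyTorch sketch of binarized convolutions with full-precision residual shortcuts at both the convolution level and the block level, using sign binarization with a straight-through estimator (STE). This is a sketch of the general technique family, not the authors' exact BiDRB: the class names, BatchNorm placement, and weight-scaling choice are assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BinarizeSTE(torch.autograd.Function):
        # Forward: sign(x) in {-1, +1}. Backward: pass the gradient
        # through unchanged inside [-1, 1], zero it outside (STE).
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return torch.sign(x)

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            return grad_out * (x.abs() <= 1).to(grad_out.dtype)

    class BinaryConv2d(nn.Conv2d):
        # 1-bit conv: binarize activations and weights, then rescale by
        # the per-output-channel mean |w| to keep output magnitudes calibrated.
        def forward(self, x):
            xb = BinarizeSTE.apply(x)
            wb = BinarizeSTE.apply(self.weight)
            scale = self.weight.abs().mean(dim=(1, 2, 3), keepdim=True)
            return F.conv2d(xb, wb * scale, self.bias, self.stride,
                            self.padding, self.dilation, self.groups)

    class DualResidualBlock(nn.Module):
        # Two binarized convs, each wrapped in a full-precision local
        # shortcut, plus a block-level shortcut around the whole unit,
        # so full-precision information bypasses every 1-bit operation.
        def __init__(self, channels):
            super().__init__()
            self.conv1 = BinaryConv2d(channels, channels, 3, padding=1, bias=False)
            self.conv2 = BinaryConv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.bn2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            out = x + self.bn1(self.conv1(x))      # local convolution residual
            out = out + self.bn2(self.conv2(out))  # local convolution residual
            return out + x                         # block residual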
Related papers
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- Interpretable 2D Vision Models for 3D Medical Images [47.75089895500738]
This study proposes a simple approach that adapts 2D networks with an intermediate feature representation for processing 3D images.
On all 3D MedMNIST benchmark datasets, and on two real-world datasets consisting of several hundred high-resolution CT or MRI scans, our approach performs on par with existing methods.
arXiv Detail & Related papers (2023-07-13T08:27:09Z)
- Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation [29.037799937729687]
Learning-based methods have dominated 3D human pose estimation (HPE), with significantly better performance than traditional optimization-based methods on most benchmarks.
We propose a Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE.
Our multi-hypothesis ZeDO achieves state-of-the-art (SOTA) performance on Human3.6M, with a minMPJPE of 51.4 mm.
arXiv Detail & Related papers (2023-07-07T21:03:18Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate the 3D mesh of multiple body parts, which differ greatly in scale, from a single RGB image.
The main challenge is the lack of training data with complete 3D annotations for all body parts in 2D images.
We propose a depth-to-scale (D2S) projection that incorporates the depth difference into the projection function to derive per-joint scale variants (one geometric reading is sketched after this list).
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- Neural Descent for Visual 3D Human Pose and Shape [67.01050349629053]
We present a deep neural network methodology to reconstruct the 3D pose and shape of people from an input RGB image.
We rely on GHUM, a recently introduced expressive full-body statistical 3D human model, trained end-to-end.
Central to our methodology is a learning-to-learn-and-optimize approach, referred to as HUman Neural Descent (HUND), which avoids second-order differentiation.
arXiv Detail & Related papers (2020-08-16T13:38:41Z)
- Cascaded deep monocular 3D human pose estimation with evolutionary training data [76.3478675752847]
Deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation.
This paper proposes a novel data augmentation method that scales to massive amounts of training data.
Our method synthesizes unseen 3D human skeletons based on a hierarchical human representation and heuristics inspired by prior knowledge.
arXiv Detail & Related papers (2020-06-14T03:09:52Z)
- Attention-Guided Version of 2D UNet for Automatic Brain Tumor Segmentation [2.371982686172067]
Gliomas are the most common and aggressive brain tumors, and in their highest grade they lead to a short life expectancy.
Deep convolutional neural networks (DCNNs) have achieved a remarkable performance in brain tumor segmentation.
However, this task remains difficult owing to the highly varying intensity and appearance of gliomas.
arXiv Detail & Related papers (2020-04-04T20:09:06Z)
- HEMlets PoSh: Learning Part-Centric Heatmap Triplets for 3D Human Pose and Shape Estimation [60.35776484235304]
This work attempts to address the uncertainty of lifting the detected 2D joints to the 3D space by introducing an intermediate state, Part-Centric Heatmap Triplets (HEMlets).
The HEMlets utilize three joint-heatmaps to represent the relative depth information of the end-joints for each skeletal body part.
A Convolutional Network (ConvNet) is first trained to predict HEMlets from the input image, followed by a volumetric joint-heatmap regression.
arXiv Detail & Related papers (2020-03-10T04:03:45Z)
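For the depth-to-scale (D2S) projection mentioned in the Synthetic Training entry above, one plausible geometric reading, assuming a simple pinhole camera (the paper's exact formulation may differ, and d2s_scale is a hypothetical helper), is that a joint at the root depth plus a per-joint depth offset projects with a depth-dependent scale:

    def d2s_scale(focal: float, z_root: float, dz_joint: float) -> float:
        # Assumed pinhole model: a joint at depth z_root + dz_joint maps
        # metric lengths to pixels with scale focal / (z_root + dz_joint),
        # so joints nearer the camera (dz_joint < 0) project larger.
        return focal / (z_root + dz_joint)

    # Example: focal = 1500 px, root 3 m away; a hand reaching 0.5 m
    # toward the camera projects with a noticeably larger scale.
    print(d2s_scale(1500.0, 3.0, 0.0))   # 500.0 (px per metre at the root)
    print(d2s_scale(1500.0, 3.0, -0.5))  # 600.0 (px per metre at the hand)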