ConvFormer: Parameter Reduction in Transformer Models for 3D Human Pose
Estimation by Leveraging Dynamic Multi-Headed Convolutional Attention
- URL: http://arxiv.org/abs/2304.02147v1
- Date: Tue, 4 Apr 2023 22:23:50 GMT
- Title: ConvFormer: Parameter Reduction in Transformer Models for 3D Human Pose
Estimation by Leveraging Dynamic Multi-Headed Convolutional Attention
- Authors: Alec Diaz-Arias and Dmitriy Shin
- Abstract summary: ConvFormer is a novel convolutional transformer for the 3D human pose estimation task.
We have validated our method on three common benchmark datasets: Human3.6M, MPI-INF-3DHP, and HumanEva.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently, fully-transformer architectures have replaced the de facto
convolutional architecture for the 3D human pose estimation task. In this paper
we propose \textbf{\textit{ConvFormer}}, a novel convolutional transformer that
leverages a new \textbf{\textit{dynamic multi-headed convolutional
self-attention}} mechanism for monocular 3D human pose estimation. We designed
a spatial and temporal convolutional transformer to comprehensively model human
joint relations within individual frames and globally across the motion
sequence. Moreover, we introduce a novel notion of \textbf{\textit{temporal
joints profile}} for our temporal ConvFormer that fuses complete temporal
information immediately for a local neighborhood of joint features. We have
quantitatively and qualitatively validated our method on three common benchmark
datasets: Human3.6M, MPI-INF-3DHP, and HumanEva. Extensive experiments have
been conducted to identify the optimal hyper-parameter set. These experiments
demonstrated that we achieved a \textbf{significant parameter reduction
relative to prior transformer models} while attaining State-of-the-Art (SOTA)
or near SOTA on all three datasets. Additionally, we achieved SOTA for Protocol
III on H36M for both GT and CPN detection inputs. Finally, we obtained SOTA on
all three metrics for the MPI-INF-3DHP dataset and for all three subjects on
HumanEva under Protocol II.
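The core idea in the abstract, a multi-headed self-attention whose query, key, and value projections are produced by 1D convolutions over the sequence rather than pointwise linear maps, can be sketched in a few lines. This is a minimal NumPy illustration of convolutional multi-head attention in general, not the authors' implementation; all function names, shapes, and the kernel size are assumptions for the example.

```python
import numpy as np

def conv1d(x, w):
    # x: (T, C) sequence; w: (k, C, C_out) kernel -- 'same' padding along T.
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((x.shape[0], w.shape[2]))
    for t in range(out.shape[0]):
        seg = xp[t:t + k]                      # (k, C) local window
        out[t] = np.einsum('kc,kco->o', seg, w)  # sum over window and channels
    return out

def conv_mhsa(x, wq, wk, wv, heads):
    # Multi-head self-attention where Q/K/V come from 1D convolutions,
    # so each projection mixes a local neighborhood of the sequence.
    T, C = x.shape
    q, k_, v = conv1d(x, wq), conv1d(x, wk), conv1d(x, wv)
    d = C // heads
    out = np.zeros_like(x)
    for h in range(heads):
        s = slice(h * d, (h + 1) * d)
        logits = q[:, s] @ k_[:, s].T / np.sqrt(d)
        a = np.exp(logits - logits.max(axis=1, keepdims=True))
        a /= a.sum(axis=1, keepdims=True)       # row-wise softmax
        out[:, s] = a @ v[:, s]
    return out
```

For the spatial ConvFormer the sequence axis would run over joints within a frame; for the temporal one, over frames. The paper's "dynamic" mechanism and the temporal joints profile are not reproduced here.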
Related papers
- Enhancing 3D Human Pose Estimation Amidst Severe Occlusion with Dual Transformer Fusion [13.938406073551844]
This paper introduces a Dual Transformer Fusion (DTF) algorithm, a novel approach to obtain a holistic 3D pose estimation.
To enable precise 3D Human Pose Estimation, our approach leverages the innovative DTF architecture, which first generates a pair of intermediate views.
Our approach outperforms existing state-of-the-art methods on both datasets, yielding substantial improvements.
arXiv Detail & Related papers (2024-10-06T18:15:27Z) - UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z) - EVOPOSE: A Recursive Transformer For 3D Human Pose Estimation With
Kinematic Structure Priors [72.33767389878473]
We propose a transformer-based model EvoPose to introduce the human body prior knowledge for 3D human pose estimation effectively.
A Structural Priors Representation (SPR) module represents human priors as structural features carrying rich body patterns.
A Recursive Refinement (RR) module is applied to the 3D pose outputs by utilizing estimated results and further injects human priors simultaneously.
arXiv Detail & Related papers (2023-06-16T04:09:16Z) - (Fusionformer): Exploiting the Joint Motion Synergy with Fusion Network
Based On Transformer for 3D Human Pose Estimation [1.52292571922932]
Many previous methods lack an understanding of local joint information; earlier work considers only the temporal relationship of a single joint.
Our proposed Fusionformer method introduces a global-temporal self-trajectory module and a cross-temporal self-trajectory module.
The results show an improvement of 2.4% MPJPE and 4.3% P-MPJPE on the Human3.6M dataset.
arXiv Detail & Related papers (2022-10-08T12:22:10Z) - CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose
Estimation [24.08170512746056]
3D human pose estimation can be handled by encoding the geometric dependencies between the body parts and enforcing the kinematic constraints.
Recent Transformer has been adopted to encode the long-range dependencies between the joints in the spatial and temporal domains.
We propose a novel pose estimation Transformer featuring rich representations of body joints critical for capturing subtle changes across frames.
arXiv Detail & Related papers (2022-03-24T23:40:11Z) - MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose
Estimation in Video [75.23812405203778]
Recent solutions estimate 3D human pose from a 2D keypoint sequence by considering body joints across all frames globally to learn spatio-temporal correlation.
We propose MixSTE, which has a temporal transformer block to separately model the temporal motion of each joint and a spatial transformer block to model inter-joint spatial correlation.
In addition, the network output is extended from the central frame to the entire input video, improving the coherence between the input and output sequences.
arXiv Detail & Related papers (2022-03-02T04:20:59Z) - Geometry-Contrastive Transformer for Generalized 3D Pose Transfer [95.56457218144983]
The intuition of this work is to perceive the geometric inconsistency between the given meshes with the powerful self-attention mechanism.
We propose a novel geometry-contrastive Transformer that has an efficient 3D structured perceiving ability to the global geometric inconsistencies.
We present a latent isometric regularization module together with a novel semi-synthesized dataset for the cross-dataset 3D pose transfer task.
arXiv Detail & Related papers (2021-12-14T13:14:24Z) - THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers [67.8628917474705]
THUNDR is a transformer-based deep neural network methodology to reconstruct the 3d pose and shape of people.
We show state-of-the-art results on Human3.6M and 3DPW, for both the fully-supervised and the self-supervised models.
We observe very solid 3d reconstruction performance for difficult human poses collected in the wild.
arXiv Detail & Related papers (2021-06-17T09:09:24Z) - 3D Human Pose Estimation with Spatial and Temporal Transformers [59.433208652418976]
We present PoseFormer, a purely transformer-based approach for 3D human pose estimation in videos.
Inspired by recent developments in vision transformers, we design a spatial-temporal transformer structure.
We quantitatively and qualitatively evaluate our method on two popular and standard benchmark datasets.
arXiv Detail & Related papers (2021-03-18T18:14:37Z)
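Several of the works above (PoseFormer, MixSTE, ConvFormer itself) share a factorized spatio-temporal design: joints attend to each other within a frame, then each joint attends to itself across frames. A minimal sketch of that factorization, using plain dot-product attention with identity projections purely for illustration (the real models use learned projections and stacked blocks):

```python
import numpy as np

def softmax_attn(x):
    # x: (N, d) tokens -> self-attention output, identity Q/K/V (toy version).
    logits = x @ x.T / np.sqrt(x.shape[1])
    a = np.exp(logits - logits.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)
    return a @ x

def spatio_temporal_pass(seq):
    # seq: (T frames, J joints, d features per joint)
    T, J, _ = seq.shape
    out = np.empty_like(seq)
    for t in range(T):          # spatial step: joints attend within one frame
        out[t] = softmax_attn(seq[t])
    for j in range(J):          # temporal step: one joint attends across frames
        out[:, j] = softmax_attn(out[:, j])
    return out
```

The factorization keeps attention cost at O(T·J²) + O(J·T²) instead of O((T·J)²) for attention over all frame-joint tokens jointly, which is why it recurs across these architectures.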
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.