Double-chain Constraints for 3D Human Pose Estimation in Images and
Videos
- URL: http://arxiv.org/abs/2308.05298v1
- Date: Thu, 10 Aug 2023 02:41:18 GMT
- Title: Double-chain Constraints for 3D Human Pose Estimation in Images and
Videos
- Authors: Hongbo Kang, Yong Wang, Mengyuan Liu, Doudou Wu, Peng Liu, Wenming
Yang
- Abstract summary: Reconstructing 3D poses from 2D poses lacking depth information is challenging due to the complexity and diversity of human motion.
We propose a novel model, called Double-chain Graph Convolutional Transformer (DC-GCT), to constrain the pose.
We show that DC-GCT achieves state-of-the-art performance on two challenging datasets.
- Score: 21.42410292863492
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing 3D poses from 2D poses lacking depth information is
particularly challenging due to the complexity and diversity of human motion.
The key is to effectively model the spatial constraints between joints to
leverage their inherent dependencies. Thus, we propose a novel model, called
Double-chain Graph Convolutional Transformer (DC-GCT), to constrain the pose
through a double-chain design consisting of local-to-global and global-to-local
chains to obtain a complex representation more suitable for the current human
pose. Specifically, we combine the advantages of GCN and Transformer and design
a Local Constraint Module (LCM) based on GCN and a Global Constraint Module
(GCM) based on the self-attention mechanism, as well as a Feature Interaction Module
(FIM). The proposed method fully captures the multi-level dependencies between
human body joints to optimize the modeling capability of the model. Moreover,
we propose a method to incorporate temporal information into the single-frame
model by guiding the video sequence embedding through the joint embedding of the
target frame, with a negligible increase in computational cost. Experimental results
demonstrate that DC-GCT achieves state-of-the-art performance on two
challenging datasets (Human3.6M and MPI-INF-3DHP). Notably, our model achieves
state-of-the-art performance on all action categories in the Human3.6M dataset
using detected 2D poses from CPN, and our code is available at:
https://github.com/KHB1698/DC-GCT.
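The abstract names three modules but gives no implementation detail here. The sketch below is a minimal, assumed rendering of the double-chain idea (PyTorch); the module internals, shapes, and the gated fusion in the FIM are illustrative guesses rather than the authors' code, which lives in the repository linked above.

```python
import torch
import torch.nn as nn

class LCM(nn.Module):
    """Local Constraint Module: one GCN layer over the skeleton graph (assumed form)."""
    def __init__(self, dim, adj):
        super().__init__()
        self.register_buffer("adj", adj)        # (J, J) normalized adjacency, assumed given
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                       # x: (B, J, C)
        return torch.relu(self.proj(torch.matmul(self.adj, x)))

class GCM(nn.Module):
    """Global Constraint Module: self-attention across all joints (assumed form)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                       # x: (B, J, C)
        out, _ = self.attn(x, x, x)
        return self.norm(x + out)

class FIM(nn.Module):
    """Feature Interaction Module: fuses the two chains (a gated sum is assumed here)."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, a, b):
        g = self.gate(torch.cat([a, b], dim=-1))
        return g * a + (1 - g) * b

class DoubleChainBlock(nn.Module):
    """Runs a local-to-global chain (LCM -> GCM) and a global-to-local chain
    (GCM -> LCM) over the same joint embeddings, then fuses them with the FIM."""
    def __init__(self, dim, adj):
        super().__init__()
        self.lcm_a, self.gcm_a = LCM(dim, adj), GCM(dim)   # local -> global
        self.gcm_b, self.lcm_b = GCM(dim), LCM(dim, adj)   # global -> local
        self.fim = FIM(dim)

    def forward(self, x):                       # x: (B, J, C) joint embeddings
        return self.fim(self.gcm_a(self.lcm_a(x)), self.lcm_b(self.gcm_b(x)))

# usage: 17 joints, 64-dim embeddings; identity adjacency as a stand-in graph
adj = torch.eye(17)
block = DoubleChainBlock(dim=64, adj=adj)
out = block(torch.randn(2, 17, 64))             # -> (2, 17, 64)
```

In this reading, the two chains visit the same joints in opposite orders (local refinement before global attention, and vice versa), and the FIM decides per channel how to blend them; the temporal variant described in the abstract would further guide a video-sequence embedding with the target frame's joint embedding.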
Related papers
- PoseMamba: Monocular 3D Human Pose Estimation with Bidirectional Global-Local Spatio-Temporal State Space Model [7.286873011001679]
We propose a purely SSM-based approach with linear complexity for 3D human pose estimation in monocular video.
Specifically, we propose a bidirectional global-local spatio-temporal block that comprehensively models human joint relations within individual frames as well as across frames.
This strategy provides a more logical geometric scanning order, resulting in a combined global-local spatial scan.
arXiv Detail & Related papers (2024-08-07T04:38:03Z)
- Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation [36.93661496405653]
We take a global approach to exploit spatio-temporal information with a concise Graph and Skipped Transformer architecture.
Specifically, in the 3D pose stage, coarse-grained body parts are deployed to construct a fully data-driven adaptive model.
Experiments are conducted on the Human3.6M, MPI-INF-3DHP and HumanEva benchmarks.
arXiv Detail & Related papers (2024-07-03T10:42:09Z)
- Iterative Graph Filtering Network for 3D Human Pose Estimation [5.177947445379688]
Graph convolutional networks (GCNs) have proven to be an effective approach for 3D human pose estimation.
In this paper, we introduce an iterative graph filtering framework for 3D human pose estimation.
Our approach builds upon the idea of iteratively solving graph filtering with Laplacian regularization (a sketch of this filtering step follows the list below).
arXiv Detail & Related papers (2023-07-29T20:46:44Z)
- Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation [53.04781510348416]
Video-based 3D human pose and shape estimation is evaluated by intra-frame accuracy and inter-frame smoothness.
We propose to structurally decouple the modeling of long-term and short-term correlations in an end-to-end framework, the Global-to-Local Transformer (GLoT).
Our GLoT surpasses previous state-of-the-art methods with the lowest model parameters on popular benchmarks, i.e., 3DPW, MPI-INF-3DHP, and Human3.6M.
arXiv Detail & Related papers (2023-03-26T14:57:49Z)
- Jointformer: Single-Frame Lifting Transformer with Error Prediction and Refinement for 3D Human Pose Estimation [11.592567773739407]
3D human pose estimation technologies have the potential to greatly increase the availability of human movement data.
The best-performing models for single-image 2D-3D lifting use graph convolutional networks (GCNs) that typically require some manual input to define the relationships between different body joints.
We propose a novel transformer-based approach that uses the more generalised self-attention mechanism to learn these relationships.
arXiv Detail & Related papers (2022-08-07T12:07:19Z)
- CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation [24.08170512746056]
3D human pose estimation can be handled by encoding the geometric dependencies between the body parts and enforcing the kinematic constraints.
Recent Transformer has been adopted to encode the long-range dependencies between the joints in the spatial and temporal domains.
We propose a novel pose estimation Transformer featuring rich representations of body joints critical for capturing subtle changes across frames.
arXiv Detail & Related papers (2022-03-24T23:40:11Z)
- Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images [79.70127290464514]
We decompose the task into two stages, i.e. person localization and pose estimation.
And we propose three task-specific graph neural networks for effective message passing.
Our approach achieves state-of-the-art performance on CMU Panoptic and Shelf datasets.
arXiv Detail & Related papers (2021-09-13T11:44:07Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection that incorporates the depth difference into the projection function to derive per-joint scale variants (a sketch of this projection follows the list below).
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- HMOR: Hierarchical Multi-Person Ordinal Relations for Monocular Multi-Person 3D Pose Estimation [54.23770284299979]
This paper introduces a novel form of supervision: Hierarchical Multi-person Ordinal Relations (HMOR).
HMOR encodes interaction information as the ordinal relations of depths and angles hierarchically.
An integrated top-down model is designed to leverage these ordinal relations in the learning process.
The proposed method significantly outperforms state-of-the-art methods on publicly available multi-person 3D pose datasets.
arXiv Detail & Related papers (2020-08-01T07:53:27Z)
- Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition [79.33539539956186]
We propose a simple method to disentangle multi-scale graph convolutions and a unified spatial-temporal graph convolutional operator named G3D.
By coupling these proposals, we develop a powerful feature extractor named MS-G3D based on which our model outperforms previous state-of-the-art methods on three large-scale datasets.
arXiv Detail & Related papers (2020-03-31T11:28:25Z)
- Learning 3D Human Shape and Pose from Dense Body Parts [117.46290013548533]
We propose a Decompose-and-aggregate Network (DaNet) to learn 3D human shape and pose from dense correspondences of body parts.
Messages from local streams are aggregated to enhance the robust prediction of the rotation-based poses.
Our method is validated on both indoor and real-world datasets including Human3.6M, UP3D, COCO, and 3DPW.
arXiv Detail & Related papers (2019-12-31T15:09:51Z)
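For the "Iterative Graph Filtering Network" entry above: the entry only names the idea of iteratively solving graph filtering with Laplacian regularization, so the following is a generic sketch of that filtering step (NumPy); the fixed-point solver and parameter values are assumptions, not the paper's network.

```python
import numpy as np

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
    Its eigenvalues lie in [0, 2]."""
    d = adj.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-8)))
    return np.eye(len(adj)) - d_inv_sqrt @ adj @ d_inv_sqrt

def smooth_joints(x, adj, lam=0.4, steps=20):
    """Approximately solve (I + lam * L) x_star = x, i.e. Laplacian-regularized
    smoothing of per-joint features x of shape (J, C), via the fixed-point
    iteration x_star <- x - lam * L @ x_star. This converges because
    rho(lam * L) <= 2 * lam < 1 for lam < 0.5 with the normalized Laplacian."""
    L = normalized_laplacian(adj)
    out = x.copy()
    for _ in range(steps):
        out = x - lam * (L @ out)
    return out
```

Larger `lam` smooths more aggressively toward neighboring joints; `lam -> 0` returns the input unchanged.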
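For the "Synthetic Training for Monocular Human Mesh Recovery" entry above: a sketch of what a depth-to-scale (D2S) projection could look like, where per-joint depth offsets become per-joint scale factors under perspective projection. The function name, focal-length default, and root-relative parameterization are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def d2s_project(joints_cam, root_depth, focal=1000.0):
    """Perspective projection with an explicit per-joint scale: each joint's
    2D scale is focal / (root_depth + delta_z), so depth differences between
    joints turn into scale variants in the image.

    joints_cam: (J, 3) camera-space joints with z stored relative to the root.
    """
    delta_z = joints_cam[:, 2]                  # per-joint depth offset from the root, (J,)
    scale = focal / (root_depth + delta_z)      # per-joint scale variant, (J,)
    uv = joints_cam[:, :2] * scale[:, None]     # projected 2D coordinates, (J, 2)
    return uv, scale
```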