K-Order Graph-oriented Transformer with GraAttention for 3D Pose and Shape Estimation
- URL: http://arxiv.org/abs/2208.11328v1
- Date: Wed, 24 Aug 2022 06:54:03 GMT
- Title: K-Order Graph-oriented Transformer with GraAttention for 3D Pose and Shape Estimation
- Authors: Weixi Zhao and Weiqiang Wang
- Abstract summary: We propose a novel attention-based 2D-to-3D pose estimation network for graph-structured data, named KOG-Transformer.
We also propose a 3D pose-to-shape estimation network for hand data, named GASE-Net.
- Score: 20.711789781518753
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel attention-based 2D-to-3D pose estimation network for
graph-structured data, named KOG-Transformer, and a 3D pose-to-shape estimation
network for hand data, named GASE-Net. Previous 3D pose estimation methods have
focused on various modifications to the graph convolution kernel, such as
abandoning weight sharing or increasing the receptive field. Some of these
methods employ attention-based non-local modules as auxiliary modules. In order
to better model the relationship between nodes in graph-structured data and
fuse the information of different neighbor nodes in a differentiated way, we
make targeted modifications to the attention module and propose two modules
designed for graph-structured data, graph relative positional encoding
multi-head self-attention (GR-MSA) and K-order graph-oriented multi-head
self-attention (KOG-MSA). By stacking GR-MSA and KOG-MSA, we propose a novel
network KOG-Transformer for 2D-to-3D pose estimation. Furthermore, we propose a
network for shape estimation on hand data, called GraAttention shape estimation
network (GASE-Net), which takes a 3D pose as input and gradually models the
shape of the hand from sparse to dense. Extensive experiments demonstrate the
superiority of KOG-Transformer: it significantly outperforms previous
state-of-the-art methods on the benchmark dataset Human3.6M. We evaluate
GASE-Net on two publicly available hand datasets, ObMan and InterHand2.6M;
GASE-Net predicts the corresponding shape for an input pose with strong
generalization ability.
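Since the abstract's key technical move is biasing self-attention with graph structure, the following minimal sketch shows one way such a module could look: standard multi-head self-attention over skeleton joints whose logits receive a learned scalar bias per graph hop order, so that different-order neighbors are fused in a differentiated way, in the spirit of GR-MSA/KOG-MSA. This is an illustrative assumption, not the paper's implementation; all names (`KOrderGraphAttention`, `hop_distances`, `order_bias`) and sizes are hypothetical.

```python
import torch
import torch.nn as nn


def hop_distances(adj: torch.Tensor, max_k: int) -> torch.Tensor:
    """Shortest-path hop distance between joints, clamped to max_k."""
    n = adj.size(0)
    dist = torch.full((n, n), -1, dtype=torch.long)
    reach = torch.eye(n, dtype=torch.bool)
    for k in range(max_k + 1):
        dist[reach & (dist < 0)] = k          # first k at which j is reachable from i
        reach = reach | ((reach.float() @ adj) > 0)
    dist[dist < 0] = max_k                    # joints farther than max_k share the last bucket
    return dist


class KOrderGraphAttention(nn.Module):
    """Self-attention whose logits get a learned bias per (head, hop order)."""

    def __init__(self, dim: int, heads: int, adj: torch.Tensor, max_k: int = 3):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.dk = heads, dim // heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        # One scalar per (head, order): lets each head weight 1-hop, 2-hop, ...
        # neighbors differently instead of treating all joints uniformly.
        self.order_bias = nn.Parameter(torch.zeros(heads, max_k + 1))
        self.register_buffer("dist", hop_distances(adj, max_k))   # (n, n)

    def forward(self, x: torch.Tensor) -> torch.Tensor:          # x: (b, n, dim)
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, n, self.heads, self.dk).transpose(1, 2)    # (b, h, n, dk)
        k = k.view(b, n, self.heads, self.dk).transpose(1, 2)
        v = v.view(b, n, self.heads, self.dk).transpose(1, 2)
        logits = q @ k.transpose(-2, -1) / self.dk ** 0.5        # (b, h, n, n)
        bias = self.order_bias[:, self.dist]                     # (h, n, n)
        attn = (logits + bias).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, d)
        return self.proj(out)


# Toy usage: a 5-joint chain skeleton, batch of 2 embedded 2D poses.
adj = torch.zeros(5, 5)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    adj[i, j] = adj[j, i] = 1.0
layer = KOrderGraphAttention(dim=64, heads=4, adj=adj)
print(layer(torch.randn(2, 5, 64)).shape)    # torch.Size([2, 5, 64])
```

A full KOG-Transformer would presumably stack several such layers (alternating with GR-MSA-style relative positional encoding) and regress 3D joint coordinates from the final joint embeddings.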
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from 2D transformers well-trained on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- Multiple View Geometry Transformers for 3D Human Pose Estimation [35.26756920323391]
We aim to improve the 3D reasoning ability of Transformers in multi-view 3D human pose estimation.
We propose a novel hybrid model, MVGFormer, which has a series of geometric and appearance modules organized in an iterative manner.
arXiv Detail & Related papers (2023-11-18T06:32:40Z)
- Iterative Graph Filtering Network for 3D Human Pose Estimation [5.177947445379688]
Graph convolutional networks (GCNs) have proven to be an effective approach for 3D human pose estimation.
In this paper, we introduce an iterative graph filtering framework for 3D human pose estimation.
Our approach builds upon the idea of iteratively solving graph filtering with Laplacian regularization (the standard formulation is sketched after this list).
arXiv Detail & Related papers (2023-07-29T20:46:44Z)
- Skeleton-Parted Graph Scattering Networks for 3D Human Motion Prediction [120.08257447708503]
Graph convolutional network-based methods that model the body joints' relations have recently shown great promise in 3D skeleton-based human motion prediction.
We propose a novel skeleton-parted graph scattering network (SPGSN).
SPGSN outperforms state-of-the-art methods by remarkable margins of 13.8%, 9.3% and 2.7% in terms of 3D mean per joint position error (MPJPE) on Human3.6M, CMU Mocap and 3DPW datasets, respectively.
arXiv Detail & Related papers (2022-07-31T05:51:39Z)
- Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images [79.70127290464514]
We decompose the task into two stages, i.e., person localization and pose estimation.
We then propose three task-specific graph neural networks for effective message passing.
Our approach achieves state-of-the-art performance on CMU Panoptic and Shelf datasets.
arXiv Detail & Related papers (2021-09-13T11:44:07Z)
- NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go [109.88509362837475]
We present NeuroMorph, a new neural network architecture that takes two 3D shapes as input.
NeuroMorph produces smooth interpolations and point-to-point correspondences between them.
It works well for a large variety of input shapes, including non-isometric pairs from different object categories.
arXiv Detail & Related papers (2021-06-17T12:25:44Z)
- Mesh Graphormer [17.75480888764098]
We present a graph-convolution-reinforced transformer, named Mesh Graphormer, for 3D human pose and mesh reconstruction from a single image.
arXiv Detail & Related papers (2021-04-01T06:16:36Z)
- Monocular 3D Detection with Geometric Constraints Embedding and Semi-supervised Training [3.8073142980733]
We propose a novel framework for monocular 3D object detection using only RGB images, called KM3D-Net.
We design a fully convolutional model to predict object keypoints, dimension, and orientation, and then combine these estimates with perspective geometry constraints to compute the position attribute.
arXiv Detail & Related papers (2020-09-02T00:51:51Z)
- Mix Dimension in Poincaré Geometry for 3D Skeleton-based Action Recognition [57.98278794950759]
Graph Convolutional Networks (GCNs) have already demonstrated their powerful ability to model irregular data.
We present a novel spatial-temporal GCN architecture defined via Poincaré geometry.
We evaluate our method on two of the currently largest-scale 3D datasets.
arXiv Detail & Related papers (2020-07-30T18:23:18Z)
- Learning 3D Human Shape and Pose from Dense Body Parts [117.46290013548533]
We propose a Decompose-and-aggregate Network (DaNet) to learn 3D human shape and pose from dense correspondences of body parts.
Messages from local streams are aggregated to enhance the robust prediction of the rotation-based poses.
Our method is validated on both indoor and real-world datasets including Human3.6M, UP3D, COCO, and 3DPW.
arXiv Detail & Related papers (2019-12-31T15:09:51Z)
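As a point of reference for the Iterative Graph Filtering entry above, the classical formulation of graph filtering with Laplacian regularization is sketched here; the notation is an assumption on my part, not quoted from that paper. Given an observed joint-feature signal y on a graph with Laplacian L:

```latex
% Laplacian-regularized graph filtering (textbook form; y = observed signal,
% L = graph Laplacian, \lambda > 0 a smoothness weight -- all assumed notation).
x^{\star} = \arg\min_{x} \; \lVert x - y \rVert_2^2 + \lambda\, x^{\top} L x
          = (I + \lambda L)^{-1} y .
```

Rearranging (I + λL)x = y as x = y − λLx gives the fixed-point iteration x^(t+1) = y − λL x^(t), which converges when the spectral radius of λL is below one; this is the kind of iterative solution such a framework can unroll into network layers.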