GraphMLP: A Graph MLP-Like Architecture for 3D Human Pose Estimation
- URL: http://arxiv.org/abs/2206.06420v5
- Date: Sat, 21 Sep 2024 02:46:31 GMT
- Title: GraphMLP: A Graph MLP-Like Architecture for 3D Human Pose Estimation
- Authors: Wenhao Li, Mengyuan Liu, Hong Liu, Tianyu Guo, Ti Wang, Hao Tang, Nicu Sebe,
- Abstract summary: GraphMLP is a global-local-graphical unified architecture for 3D human pose estimation.
It incorporates the graph structure of human bodies into a model to meet the domain-specific demand of the 3D human pose.
It can be extended to model complex temporal dynamics in a simple way with negligible computational cost gains in the sequence length.
- Score: 68.65764751482774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern multi-layer perceptron (MLP) models have shown competitive results in learning visual representations without self-attention. However, existing MLP models are not good at capturing local details and lack prior knowledge of human body configurations, which limits their modeling power for skeletal representation learning. To address these issues, we propose a simple yet effective graph-reinforced MLP-Like architecture, named GraphMLP, that combines MLPs and graph convolutional networks (GCNs) in a global-local-graphical unified architecture for 3D human pose estimation. GraphMLP incorporates the graph structure of human bodies into an MLP model to meet the domain-specific demand of the 3D human pose, while allowing for both local and global spatial interactions. Furthermore, we propose to flexibly and efficiently extend the GraphMLP to the video domain and show that complex temporal dynamics can be effectively modeled in a simple way with negligible computational cost gains in the sequence length. To the best of our knowledge, this is the first MLP-Like architecture for 3D human pose estimation in a single frame and a video sequence. Extensive experiments show that the proposed GraphMLP achieves state-of-the-art performance on two datasets, i.e., Human3.6M and MPI-INF-3DHP. Code and models are available at https://github.com/Vegetebird/GraphMLP.
Related papers
- Graph Neural Machine: A New Model for Learning with Tabular Data [25.339493426758903]
Graph neural networks (GNNs) have recently become the standard tool for performing machine learning tasks on graphs.
In this work, we show that an representation is equivalent to an asynchronous message passing GNN model.
We then propose a new machine learning model for data, the so-called Graph Neural Machine (GNM)
arXiv Detail & Related papers (2024-02-05T10:22:15Z) - X-MLP: A Patch Embedding-Free MLP Architecture for Vision [4.493200639605705]
Multi-layer perceptron (MLP) architectures for vision have been popular again.
We propose X-MLP, an architecture constructed absolutely upon fully connected layers and free from patch embedding.
X-MLP is tested on ten benchmark datasets, all better performance than other vision models.
arXiv Detail & Related papers (2023-07-02T15:20:25Z) - Graph-Guided MLP-Mixer for Skeleton-Based Human Motion Prediction [14.988322340164391]
Graph Convolutional Networks (GCNs) have been widely used in human motion prediction, but their performance remains unsatisfactory.
Human-Mixer has been leveraged into human motion prediction as a promising alternative to GCNs.
By incorporating graph guidance, our textitGraph-Guided Mixer can effectively capture and utilize the specific connectivity patterns within human skeleton's graph representation.
arXiv Detail & Related papers (2023-04-07T08:11:16Z) - R2-MLP: Round-Roll MLP for Multi-View 3D Object Recognition [33.53114929452528]
Vision architectures based exclusively on multi-layer perceptrons (MLPs) have gained much attention in the computer vision community.
We present an achieves a view-based 3D object recognition task by considering the communications between patches from different views.
With a conceptually simple structure, our R$2$MLP achieves competitive performance compared with existing methods.
arXiv Detail & Related papers (2022-11-20T21:13:02Z) - MLPInit: Embarrassingly Simple GNN Training Acceleration with MLP
Initialization [51.76758674012744]
Training graph neural networks (GNNs) on large graphs is complex and extremely time consuming.
We propose an embarrassingly simple, yet hugely effective method for GNN training acceleration, called PeerInit.
arXiv Detail & Related papers (2022-09-30T21:33:51Z) - MLP-3D: A MLP-like 3D Architecture with Grouped Time Mixing [123.43419144051703]
We present a novel-like 3D architecture for video recognition.
The results are comparable to state-of-the-art widely-used 3D CNNs and video.
arXiv Detail & Related papers (2022-06-13T16:21:33Z) - Hire-MLP: Vision MLP via Hierarchical Rearrangement [58.33383667626998]
Hire-MLP is a simple yet competitive vision architecture via rearrangement.
The proposed Hire-MLP architecture is built with simple channel-mixing operations, thus enjoys high flexibility and inference speed.
Experiments show that our Hire-MLP achieves state-of-the-art performance on the ImageNet-1K benchmark.
arXiv Detail & Related papers (2021-08-30T16:11:04Z) - CycleMLP: A MLP-like Architecture for Dense Prediction [26.74203747156439]
CycleMLP is a versatile backbone for visual recognition and dense predictions.
It can cope with various image sizes and achieves linear computational complexity to image size by using local windows.
CycleMLP aims to provide a competitive baseline on object detection, instance segmentation, and semantic segmentation for models.
arXiv Detail & Related papers (2021-07-21T17:23:06Z) - AS-MLP: An Axial Shifted MLP Architecture for Vision [50.11765148947432]
An Axial Shifted architecture (AS-MLP) is proposed in this paper.
By axially shifting channels of the feature map, AS-MLP is able to obtain the information flow from different directions.
With the proposed AS-MLP architecture, our model obtains 83.3% Top-1 accuracy with 88M parameters and 15.2 GFLOPs on the ImageNet-1K dataset.
arXiv Detail & Related papers (2021-07-18T08:56:34Z) - MLP-Mixer: An all-MLP Architecture for Vision [93.16118698071993]
We present-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs).
Mixer attains competitive scores on image classification benchmarks, with pre-training and inference comparable to state-of-the-art models.
arXiv Detail & Related papers (2021-05-04T16:17:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.