MoCap-to-Visual Domain Adaptation for Efficient Human Mesh Estimation from 2D Keypoints
- URL: http://arxiv.org/abs/2404.07094v1
- Date: Wed, 10 Apr 2024 15:34:10 GMT
- Title: MoCap-to-Visual Domain Adaptation for Efficient Human Mesh Estimation from 2D Keypoints
- Authors: Bedirhan Uguz, Ozhan Suat, Batuhan Karagoz, Emre Akbas
- Abstract summary: Key2Mesh is a model that takes a set of 2D human pose keypoints as input and estimates the corresponding body mesh.
Our results show that Key2Mesh sets a new state of the art, outperforming other models in PA-MPJPE on both the H3.6M and 3DPW datasets.
- Score: 8.405938712823563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper presents Key2Mesh, a model that takes a set of 2D human pose keypoints as input and estimates the corresponding body mesh. Since this process does not involve any visual (i.e. RGB image) data, the model can be trained on large-scale motion capture (MoCap) datasets, thereby overcoming the scarcity of image datasets with 3D labels. To enable the model's application on RGB images, we first run an off-the-shelf 2D pose estimator to obtain the 2D keypoints, and then feed these 2D keypoints to Key2Mesh. To improve the performance of our model on RGB images, we apply an adversarial domain adaptation (DA) method to bridge the gap between the MoCap and visual domains. Crucially, our DA method does not require 3D labels for visual data, which enables adaptation to target sets without the need for costly labels. We evaluate Key2Mesh for the task of estimating 3D human meshes from 2D keypoints, in the absence of RGB and mesh label pairs. Our results on widely used H3.6M and 3DPW datasets show that Key2Mesh sets the new state-of-the-art by outperforming other models in PA-MPJPE for both datasets, and in MPJPE and PVE for the 3DPW dataset. Thanks to our model's simple architecture, it operates at least 12x faster than the prior state-of-the-art model, LGD. Additional qualitative samples and code are available on the project website: https://key2mesh.github.io/.
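The abstract outlines a two-part recipe: an off-the-shelf 2D pose estimator produces keypoints from RGB images, a keypoint-to-mesh regressor trained on MoCap data maps them to body-model parameters, and an adversarial domain-adaptation step aligns MoCap and visual keypoint features without requiring 3D labels on the visual side. The PyTorch-style sketch below illustrates that recipe under stated assumptions; the network sizes, the 85-dimensional SMPL-style output, the discriminator, and the loss weighting are placeholders for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class Key2MeshSketch(nn.Module):
    """Maps flattened 2D keypoints to SMPL-style parameters (pose + shape + camera)."""
    def __init__(self, num_joints: int = 17, feat_dim: int = 1024, out_dim: int = 85):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(num_joints * 2, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        )
        self.head = nn.Linear(feat_dim, out_dim)  # e.g. 72 pose + 10 shape + 3 camera (assumed split)

    def forward(self, keypoints_2d: torch.Tensor):
        # keypoints_2d: (batch, num_joints, 2) normalized image coordinates
        feats = self.encoder(keypoints_2d.flatten(1))
        return self.head(feats), feats  # features are reused by the domain discriminator

model = Key2MeshSketch()
discriminator = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 1))
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(model.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

def adaptation_step(mocap_kp, mocap_params, visual_kp, adv_weight: float = 0.1):
    """One training step: supervised loss on MoCap pairs plus adversarial feature
    alignment against unlabeled visual keypoints (no 3D labels needed for them)."""
    pred_params, feat_mocap = model(mocap_kp)
    _, feat_visual = model(visual_kp)

    # 1) Discriminator learns to separate MoCap features (label 1) from visual features (label 0).
    d_loss = bce(discriminator(feat_mocap.detach()), torch.ones(feat_mocap.size(0), 1)) + \
             bce(discriminator(feat_visual.detach()), torch.zeros(feat_visual.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Regressor minimizes the supervised MoCap loss and tries to make visual
    #    features indistinguishable from MoCap features (fooling the discriminator).
    sup_loss = nn.functional.mse_loss(pred_params, mocap_params)
    adv_loss = bce(discriminator(feat_visual), torch.ones(feat_visual.size(0), 1))
    g_loss = sup_loss + adv_weight * adv_loss
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return float(d_loss), float(g_loss)
```

At inference time only the regressor runs on the estimator's output, e.g. `params, _ = model(keypoints_from_2d_estimator)`, which is consistent with the reported speed advantage of a keypoint-only model.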
Related papers
- ODIN: A Single Model for 2D and 3D Segmentation [34.612953668151036]
ODIN is a model that segments and labels both 2D RGB images and 3D point clouds.
It achieves state-of-the-art performance on ScanNet200, Matterport3D and AI2THOR 3D segmentation benchmarks.
arXiv Detail & Related papers (2024-01-04T18:59:25Z)
- Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features [64.39691149255717]
Keypoint detection on 3D shapes requires semantic and geometric awareness while demanding high localization accuracy.
We employ a keypoint candidate optimization module which aims to match the average observed distribution of keypoints on the shape.
The resulting approach achieves a new state of the art for few-shot keypoint detection on the KeyPointNet dataset.
arXiv Detail & Related papers (2023-11-29T21:58:41Z)
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting (see the label-fusion sketch after this list).
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
- CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network [66.24726878647543]
Estimating the 6-DoF pose of a rigid object from a single RGB image is a crucial yet challenging task.
Recent studies have shown the great potential of dense correspondence-based solutions.
We propose a novel pose estimation algorithm named CheckerPose, which improves on three main aspects.
arXiv Detail & Related papers (2023-03-29T17:30:53Z)
- Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders [52.91248611338202]
We propose Image-to-Point Masked Autoencoders (I2P-MAE), an alternative for obtaining superior 3D representations from 2D pre-trained models.
By self-supervised pre-training, we leverage the well-learned 2D knowledge to guide 3D masked autoencoding.
I2P-MAE attains state-of-the-art 90.11% accuracy, +3.68% over the second-best, demonstrating superior transferable capacity.
arXiv Detail & Related papers (2022-12-13T17:59:20Z)
- Optimal and Robust Category-level Perception: Object Pose and Shape Estimation from 2D and 3D Semantic Keypoints [24.232254155643574]
We consider a problem where one is given 2D or 3D sensor data picturing an object of a given category (e.g., a car) and has to reconstruct the 3D pose and shape of the object.
Our first contribution is to develop PACE3D* and PACE2D*, the first certifiably optimal solvers for pose and shape estimation.
Our second contribution is to develop robust versions of both solvers, named PACE3D# and PACE2D#.
arXiv Detail & Related papers (2022-06-24T21:58:00Z)
- Multi-Modality Task Cascade for 3D Object Detection [22.131228757850373]
Many methods train two models in isolation and use simple feature concatenation to represent 3D sensor data.
We propose a novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions.
We show that including a 2D network between two stages of 3D modules significantly improves both 2D and 3D task performance.
arXiv Detail & Related papers (2021-07-08T17:55:01Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate the 3D meshes of multiple body parts with large scale differences from a single RGB image.
The main challenge is the lack of training data with complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous Human Annotations [56.34297279246823]
KeypointNet is the first large-scale and diverse 3D keypoint dataset.
It contains 103,450 keypoints and 8,234 3D models from 16 object categories.
Ten state-of-the-art methods are benchmarked on our proposed dataset.
arXiv Detail & Related papers (2020-02-28T12:58:56Z)
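The semantic label fusion via voting mentioned in the Label-Efficient 3D Point Cloud Segmentation entry above can be illustrated with a simple per-point majority vote over the back-projected 2D predictions. This is a minimal sketch under assumed inputs (the function name and data layout are hypothetical), not that paper's implementation.

```python
import numpy as np

def fuse_labels_by_voting(per_model_labels: np.ndarray, ignore_label: int = -1) -> np.ndarray:
    """per_model_labels: (num_models, num_points) integer class ids, where
    ignore_label marks points a model did not predict. Returns (num_points,)
    fused pseudo labels chosen by majority vote."""
    num_models, num_points = per_model_labels.shape
    fused = np.full(num_points, ignore_label, dtype=np.int64)
    for p in range(num_points):
        votes = per_model_labels[:, p]
        votes = votes[votes != ignore_label]      # drop models with no prediction here
        if votes.size == 0:
            continue
        values, counts = np.unique(votes, return_counts=True)
        fused[p] = values[np.argmax(counts)]      # most frequent class wins
    return fused

# Example: three hypothetical 2D models voting on four 3D points.
votes = np.array([[0, 1, 2, -1],
                  [0, 1, 1, -1],
                  [3, 1, 1,  2]])
print(fuse_labels_by_voting(votes))  # -> [0 1 1 2]
```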