Learning 3D Human Pose Estimation from Dozens of Datasets using a
Geometry-Aware Autoencoder to Bridge Between Skeleton Formats
- URL: http://arxiv.org/abs/2212.14474v1
- Date: Thu, 29 Dec 2022 22:22:49 GMT
- Authors: István Sárándi, Alexander Hermans, Bastian Leibe
- Abstract summary: We propose a novel affine-combining autoencoder (ACAE) method to perform dimensionality reduction on the number of landmarks.
Our approach scales to an extreme multi-dataset regime, where we use 28 3D human pose datasets to supervise one model.
- Score: 80.12253291709673
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning-based 3D human pose estimation performs best when trained on
large amounts of labeled data, making combined learning from many datasets an
important research direction. One obstacle to this endeavor is the different
skeleton formats provided by different datasets, i.e., they do not label the
same set of anatomical landmarks. There is little prior research on how to best
supervise one model with such discrepant labels. We show that simply using
separate output heads for different skeletons results in inconsistent depth
estimates and insufficient information sharing across skeletons. As a remedy,
we propose a novel affine-combining autoencoder (ACAE) method to perform
dimensionality reduction on the number of landmarks. The discovered latent 3D
points capture the redundancy among skeletons, enabling enhanced information
sharing when used for consistency regularization. Our approach scales to an
extreme multi-dataset regime, where we use 28 3D human pose datasets to
supervise one model, which outperforms prior work on a range of benchmarks,
including the challenging 3D Poses in the Wild (3DPW) dataset. Our code and
models are available for research purposes.
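The affine-combining idea can be sketched as follows: the encoder and decoder are linear maps whose rows each sum to 1, so every latent point is an affine combination of input joints (and every reconstructed joint an affine combination of latent points). Because affine combinations commute with rotation and translation, the mapping is equivariant to rigid transforms of the skeleton. A minimal NumPy sketch with hypothetical random weights (in the actual method these weights are learned):

```python
import numpy as np

rng = np.random.default_rng(0)

J, L = 17, 8                        # input joints, latent keypoints
joints = rng.normal(size=(J, 3))    # toy 3D skeleton

def affine_weights(logits):
    """Normalize each row to sum to 1 (affine-combination weights)."""
    return logits / logits.sum(axis=1, keepdims=True)

W_enc = affine_weights(rng.uniform(0.1, 1.0, size=(L, J)))  # encoder
W_dec = affine_weights(rng.uniform(0.1, 1.0, size=(J, L)))  # decoder

latent = W_enc @ joints   # (L, 3) latent 3D keypoints
recon  = W_dec @ latent   # (J, 3) reconstructed skeleton

# Affine combinations commute with rigid transforms: encoding a
# rotated + translated skeleton equals transforming the latent points.
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
t = rng.normal(size=3)
assert np.allclose(W_enc @ (joints @ R.T + t), latent @ R.T + t)
```

This equivariance is what lets the latent points act as a shared, skeleton-format-agnostic representation for consistency regularization across datasets.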
Related papers
- Learning Localization of Body and Finger Animation Skeleton Joints on Three-Dimensional Models of Human Bodies [0.0]
Our work proposes a solution to the problem of positioning body and finger animation skeleton joints within 3D models of human bodies.
By comparing our method with the state-of-the-art, we show that it is possible to achieve significantly better results with a simpler architecture.
arXiv Detail & Related papers (2024-07-11T13:16:02Z)
- Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation [32.30055363306321]
We propose a paradigm for seamlessly unifying different human pose and shape-related tasks and datasets.
Our formulation is centered on the ability - both at training and test time - to query any arbitrary point of the human volume.
We can naturally exploit differently annotated data sources including mesh, 2D/3D skeleton and dense pose, without having to convert between them.
arXiv Detail & Related papers (2024-07-10T10:44:18Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Leveraging Large-Scale Pretrained Vision Foundation Models for Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
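The voting-based fusion described above can be illustrated as a per-pixel majority vote over the label maps predicted by the different models. This is an illustrative sketch under assumed conventions (the function name `fuse_by_voting` and the `ignore_label` handling are hypothetical, not the paper's API):

```python
import numpy as np

def fuse_by_voting(masks, ignore_label=255):
    """Fuse N semantic label maps by per-pixel majority vote.

    masks: (N, H, W) integer label maps from N different models.
    Pixels where no label wins a strict majority are marked ignore_label,
    so they contribute no supervision as pseudo labels.
    """
    masks = np.asarray(masks)
    n, h, w = masks.shape
    num_classes = masks.max() + 1
    # Accumulate per-pixel vote counts: (H, W, C)
    votes = np.zeros((h, w, num_classes), dtype=np.int32)
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    for m in masks:
        np.add.at(votes, (rows, cols, m), 1)
    winner = votes.argmax(axis=-1)
    top = votes.max(axis=-1)
    return np.where(top * 2 > n, winner, ignore_label)
```

Thresholding on a strict majority (rather than taking any plurality winner) is one simple way to keep only pseudo labels the models agree on.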
arXiv Detail & Related papers (2023-11-03T15:41:15Z)
- Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection [34.2238222373818]
Current 3D object detection models follow a single dataset-specific training and testing paradigm.
In this paper, we study the task of training a unified 3D detector from multiple datasets.
We present a Uni3D which leverages a simple data-level correction operation and a designed semantic-level coupling-and-recoupling module.
arXiv Detail & Related papers (2023-03-13T05:54:13Z)
- Decanus to Legatus: Synthetic training for 2D-3D human pose lifting [26.108023246654646]
We propose an algorithm to generate infinite 3D synthetic human poses (Legatus) from a 3D pose distribution based on 10 initial handcrafted 3D poses (Decanus).
Our results show that we can achieve 3D pose estimation performance comparable to methods using real data from specialized datasets but in a zero-shot setup, showing the potential of our framework.
arXiv Detail & Related papers (2022-10-05T13:10:19Z)
- RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of the synthetic dataset, which consists of CAD object models, to boost the learning on real datasets.
Recent work on 3D pre-training exhibits failure when transferring features learned on synthetic objects to other real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z)
- Towards General Purpose Geometry-Preserving Single-View Depth Estimation [1.9573380763700712]
Single-view depth estimation (SVDE) plays a crucial role in scene understanding for AR applications, 3D modeling, and robotics.
Recent works have shown that a successful solution strongly relies on the diversity and volume of training data.
Our work shows that a model trained on this data along with conventional datasets can gain accuracy while predicting correct scene geometry.
arXiv Detail & Related papers (2020-09-25T20:06:13Z)
- PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding [107.02479689909164]
In this work, we aim at facilitating research on 3D representation learning.
We measure the effect of unsupervised pre-training on a large source set of 3D scenes.
arXiv Detail & Related papers (2020-07-21T17:59:22Z)
- Cascaded deep monocular 3D human pose estimation with evolutionary training data [76.3478675752847]
Deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation.
This paper proposes a novel data augmentation method that is scalable for massive amount of training data.
Our method synthesizes unseen 3D human skeletons based on a hierarchical human representation and heuristics inspired by prior knowledge.
arXiv Detail & Related papers (2020-06-14T03:09:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.