PoseGen: Learning to Generate 3D Human Pose Dataset with NeRF
- URL: http://arxiv.org/abs/2312.14915v1
- Date: Fri, 22 Dec 2023 18:50:15 GMT
- Title: PoseGen: Learning to Generate 3D Human Pose Dataset with NeRF
- Authors: Mohsen Gholami, Rabab Ward, Z. Jane Wang
- Abstract summary: This paper proposes an end-to-end framework for generating 3D human pose datasets using Neural Radiance Fields (NeRF).
NeRFs are data-driven and do not require 3D scans of humans, which makes NeRF-based generation a new direction for convenient, user-specific data generation.
- Score: 20.841557239621995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes an end-to-end framework for generating 3D human pose
datasets using Neural Radiance Fields (NeRF). Public datasets generally have
limited diversity in terms of human poses and camera viewpoints, largely due to
the resource-intensive nature of collecting 3D human pose data. As a result,
pose estimators trained on public datasets significantly underperform when
applied to unseen out-of-distribution samples. Previous works proposed
augmenting public datasets by generating 2D-3D pose pairs or rendering a large
amount of random data. Such approaches either overlook image rendering or
result in suboptimal datasets for pre-trained models. Here we propose PoseGen,
which learns to generate a dataset (human 3D poses and images) with a feedback
loss from a given pre-trained pose estimator. In contrast to prior art, our
generated data is optimized to improve the robustness of the pre-trained model.
The objective of PoseGen is to learn a distribution of data that maximizes the
prediction error of a given pre-trained model. As the learned data distribution
contains OOD samples of the pre-trained model, sampling data from such a
distribution for further fine-tuning a pre-trained model improves the
generalizability of the model. This is the first work that proposes NeRFs for
3D human data generation. NeRFs are data-driven and do not require 3D scans of
humans. Therefore, NeRF-based generation opens a new direction for convenient,
user-specific data generation. Our extensive experiments show that
the proposed PoseGen improves two baseline models (SPIN and HybrIK) on four
datasets with an average 6% relative improvement.
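The feedback objective above lends itself to a compact two-phase loop. The sketch below is a minimal illustration under stated assumptions, not the authors' released code: `generator`, `render` (a pre-trained human NeRF), and `estimator` (the frozen pre-trained pose model, e.g. SPIN or HybrIK) are hypothetical stand-ins, and the squared-error loss is a placeholder for the paper's actual training losses.

```python
import torch

def generator_step(generator, render, estimator, opt_g, batch=32):
    """Push the generated data toward the estimator's failure modes by
    MAXIMIZING its prediction error (gradient ascent through the renderer)."""
    z = torch.randn(batch, generator.latent_dim)
    gt_pose, camera = generator(z)            # generated ground-truth pose + view
    images = render(gt_pose, camera)          # NeRF-rendered human images
    # estimator params are assumed frozen (estimator.requires_grad_(False));
    # gradients still flow back through the images to the generator
    pred = estimator(images)
    loss_g = -((pred - gt_pose) ** 2).mean()  # negative error -> ascend it
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

def finetune_step(generator, render, estimator, opt_e, batch=32):
    """Sample from the learned hard distribution and fine-tune the estimator
    (its parameters are unfrozen again for this phase)."""
    with torch.no_grad():                     # data generation only, no G update
        z = torch.randn(batch, generator.latent_dim)
        gt_pose, camera = generator(z)
        images = render(gt_pose, camera)
    pred = estimator(images)
    loss_e = ((pred - gt_pose) ** 2).mean()   # ordinary supervised loss
    opt_e.zero_grad()
    loss_e.backward()
    opt_e.step()
```

The sign flip on the generator loss is the core of the idea: the same error that later fine-tunes the estimator is ascended by the generator, so sampling from the trained generator yields hard, out-of-distribution examples.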
Related papers
- Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation [32.30055363306321]
We propose a paradigm for seamlessly unifying different human pose and shape-related tasks and datasets.
Our formulation is centered on the ability, both at training and test time, to query any point of the human volume.
We can naturally exploit differently annotated data sources including mesh, 2D/3D skeleton and dense pose, without having to convert between them.
arXiv Detail & Related papers (2024-07-10T10:44:18Z)
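One way to read the "query any point" formulation in the entry above is as a point-query interface. The sketch below is a speculative illustration under that reading; the class name, the stand-in encoder, and all sizes are hypothetical, not the paper's architecture.

```python
import torch
import torch.nn as nn

class LocalizerField(nn.Module):
    """Given an image and continuous query points on a canonical human body,
    predict each queried point's 3D location for the pictured person."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.backbone = nn.Conv2d(3, feat_dim, 7, stride=32)  # stand-in encoder
        self.head = nn.Sequential(
            nn.Linear(feat_dim + 3, 256), nn.ReLU(), nn.Linear(256, 3))

    def forward(self, image, query_points):
        # image: (B,3,H,W); query_points: (B,N,3) canonical-body coordinates
        feat = self.backbone(image).mean(dim=(2, 3))            # (B, F)
        feat = feat[:, None, :].expand(-1, query_points.shape[1], -1)
        return self.head(torch.cat([feat, query_points], -1))   # (B,N,3)

# Differently annotated data become different query sets: a skeleton dataset
# queries its joint locations on the template, a mesh dataset its vertices.
```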
- 3D Human Reconstruction in the Wild with Synthetic Data Using Generative Models [52.96248836582542]
We propose an effective approach based on recent diffusion models, termed HumanWild, which can effortlessly generate human images and corresponding 3D mesh annotations.
By exclusively employing generative models, we generate large-scale in-the-wild human images and high-quality annotations, eliminating the need for real-world data collection.
arXiv Detail & Related papers (2024-03-17T06:31:16Z)
- Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats [80.12253291709673]
We propose a novel affine-combining autoencoder (ACAE) method to reduce the dimensionality of the landmark set.
Our approach scales to an extreme multi-dataset regime, where we use 28 3D human pose datasets to supervise one model.
arXiv Detail & Related papers (2022-12-29T22:22:49Z)
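A hedged sketch of the affine-combining idea in the entry above: each latent landmark is a weighted combination of the input landmarks with weights summing to one, which keeps encoding and decoding equivariant to translation, rotation, and scaling of the skeleton. Softmax weights (convex combinations) are used here as a simple way to satisfy the sum-to-one constraint; the paper's exact parametrization and sizes may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ACAE(nn.Module):
    def __init__(self, n_in=555, n_latent=32):   # illustrative sizes
        super().__init__()
        self.enc_logits = nn.Parameter(torch.randn(n_latent, n_in))
        self.dec_logits = nn.Parameter(torch.randn(n_in, n_latent))

    @staticmethod
    def affine(logits, points):
        w = F.softmax(logits, dim=-1)    # each row sums to 1 -> affine weights
        return w @ points                # (..., n_out, 3)

    def forward(self, joints):           # joints: (B, n_in, 3)
        latent = self.affine(self.enc_logits, joints)   # compressed landmarks
        return self.affine(self.dec_logits, latent)     # reconstruction
```

Because every output is an affine combination of inputs, applying any similarity transform to the skeleton transforms the latent landmarks the same way, which is what lets one small latent set bridge many skeleton formats.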
- A generic diffusion-based approach for 3D human pose prediction in the wild [68.00961210467479]
3D human pose forecasting, i.e., predicting a sequence of future human 3D poses given a sequence of past observed ones, is a challenging spatio-temporal task.
We provide a unified formulation in which incomplete elements (whether in the prediction or the observation) are treated as noise, and propose a conditional diffusion model that denoises them and forecasts plausible poses.
We evaluate our approach on four standard datasets and obtain significant improvements over the state-of-the-art.
arXiv Detail & Related papers (2022-10-11T17:59:54Z)
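The "incomplete elements are treated as noise" formulation above can be pictured as an imputation-style sampling loop: known entries are clamped to their observed values at every denoising step, so the model only ever fills in what is missing. The `denoiser` and the update rule below are schematic stand-ins, not the paper's model.

```python
import torch

def predict(denoiser, observed, obs_mask, timesteps=50):
    # observed: (B, T, J, 3) pose sequence with zeros where unknown
    # obs_mask: (B, T, J, 1) -> 1 for known entries (past and/or partial)
    x = torch.randn_like(observed)                  # start from pure noise
    for t in reversed(range(timesteps)):
        # clamp known entries at every step: "noise" only ever lives in
        # the unknown (future or missing) parts of the sequence
        x = obs_mask * observed + (1 - obs_mask) * x
        eps = denoiser(x, torch.full((x.shape[0],), t))
        x = x - eps / timesteps                     # schematic update rule
    return obs_mask * observed + (1 - obs_mask) * x
```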
- AdaptPose: Cross-Dataset Adaptation for 3D Human Pose Estimation by Learnable Motion Generation [24.009674750548303]
Testing a pre-trained 3D pose estimator on a new dataset results in a major performance drop.
We propose AdaptPose, an end-to-end framework that generates synthetic 3D human motions from a source dataset.
Our method outperforms previous work in cross-dataset evaluations by 14% and previous semi-supervised learning methods that use partial 3D annotations by 16%.
arXiv Detail & Related papers (2021-12-22T00:27:52Z)
- Adapted Human Pose: Monocular 3D Human Pose Estimation with Zero Real 3D Pose Data [14.719976311208502]
Domain gaps between training and test data often negatively affect model performance.
We present our adapted human pose (AHuP) approach that addresses adaptation problems in both appearance and pose spaces.
AHuP is built around the practical assumption that, in real applications, data from the target domain may be inaccessible or only available in limited form.
arXiv Detail & Related papers (2021-05-23T01:20:40Z)
- 3D Human Pose Regression using Graph Convolutional Network [68.8204255655161]
We propose a graph convolutional network named PoseGraphNet for 3D human pose regression from 2D poses.
Our model's performance is close to the state-of-the-art, but with much fewer parameters.
arXiv Detail & Related papers (2021-05-21T14:41:31Z)
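A minimal sketch of GCN-based 2D-to-3D lifting in the spirit of the entry above: joints are graph nodes, the skeleton supplies the adjacency, and a few graph convolutions regress per-joint 3D coordinates. Layer sizes and the normalization are illustrative, not the paper's exact network.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    def __init__(self, in_f, out_f, adj):
        super().__init__()
        self.lin = nn.Linear(in_f, out_f)
        a = adj + torch.eye(adj.shape[0])   # add self-loops
        d = a.sum(-1).rsqrt()               # D^{-1/2}
        self.register_buffer("a_norm", d[:, None] * a * d[None, :])

    def forward(self, x):                   # x: (B, J, in_f)
        return torch.relu(self.lin(self.a_norm @ x))

class PoseLifter(nn.Module):
    def __init__(self, adj, hidden=128):
        super().__init__()
        self.gc1 = GraphConv(2, hidden, adj)
        self.gc2 = GraphConv(hidden, hidden, adj)
        self.out = nn.Linear(hidden, 3)     # per-joint 3D coordinates

    def forward(self, pose_2d):             # (B, J, 2) -> (B, J, 3)
        return self.out(self.gc2(self.gc1(pose_2d)))
```

Sharing one small set of graph-conv weights across all joints is what keeps the parameter count far below fully connected lifting networks.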
- Cascaded deep monocular 3D human pose estimation with evolutionary training data [76.3478675752847]
Deep representation learning has achieved remarkable accuracy for monocular 3D human pose estimation.
This paper proposes a novel data augmentation method that is scalable to massive amounts of training data.
Our method synthesizes unseen 3D human skeletons based on a hierarchical human representation and heuristics inspired by prior knowledge.
arXiv Detail & Related papers (2020-06-14T03:09:52Z)
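The evolutionary flavor of the entry above can be sketched with two classic operators on a hierarchical, parent-relative bone representation: crossover grafts a body part from one pose onto another, and mutation perturbs bone directions while preserving bone lengths. The kinematic tree, part grouping, and operators below are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

PARENTS = [-1, 0, 1, 2, 0, 4, 5, 0, 7, 8, 8, 10, 11, 8, 13, 14]  # toy tree
LEFT_ARM = [10, 11, 12]            # hypothetical bone indices of one body part

def to_bones(joints):              # (J,3) joints -> parent-relative vectors
    return np.stack([joints[j] - joints[p] if p >= 0 else joints[j]
                     for j, p in enumerate(PARENTS)])

def to_joints(bones):              # invert by walking down the tree
    joints = np.zeros_like(bones)
    for j, p in enumerate(PARENTS):
        joints[j] = bones[j] + (joints[p] if p >= 0 else 0)
    return joints

def crossover(pose_a, pose_b, part=LEFT_ARM):
    ba, bb = to_bones(pose_a), to_bones(pose_b)
    ba[part] = bb[part]            # graft one body part across poses
    return to_joints(ba)

def mutate(pose, sigma=0.05):
    b = to_bones(pose)
    lengths = np.linalg.norm(b, axis=-1, keepdims=True)
    b = b + np.random.normal(0, sigma, b.shape)      # perturb directions...
    b = b / np.maximum(np.linalg.norm(b, axis=-1, keepdims=True), 1e-8) * lengths
    return to_joints(b)            # ...while restoring original bone lengths
```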
- Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train, from scratch, 3D pose regressor networks that outperform the current state-of-the-art on in-the-wild benchmarks.
arXiv Detail & Related papers (2020-04-07T20:21:18Z)
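The fitting procedure in the entry above can be pictured as per-exemplar fine-tuning: a copy of a pre-trained 3D regressor is overfit to a single image using only its 2D keypoints, and the converged 3D output is kept as a pseudo ground-truth annotation. `model` and `project` below are hypothetical stand-ins, and the loop is a simplified reading of the approach.

```python
import copy
import torch

def exemplar_fit(model, image, kp2d, project, steps=50, lr=1e-5):
    m = copy.deepcopy(model)                  # fresh copy per exemplar
    opt = torch.optim.Adam(m.parameters(), lr=lr)
    for _ in range(steps):
        pose3d, cam = m(image)                # 3D joints + camera parameters
        # 2D reprojection loss against the image's annotated keypoints
        loss = ((project(pose3d, cam) - kp2d) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        pose3d, _ = m(image)
    return pose3d                             # pseudo 3D GT for this image
```

Because each copy is discarded after fitting, the procedure trades compute for annotation quality without ever touching real 3D ground truth.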