Unified 3D Mesh Recovery of Humans and Animals by Learning Animal Exercise
- URL: http://arxiv.org/abs/2111.02450v1
- Date: Wed, 3 Nov 2021 18:15:50 GMT
- Title: Unified 3D Mesh Recovery of Humans and Animals by Learning Animal Exercise
- Authors: Kim Youwang, Kim Ji-Yeon, Kyungdon Joo, Tae-Hyun Oh
- Abstract summary: We propose an end-to-end unified 3D mesh recovery of humans and quadruped animals trained in a weakly-supervised way.
We exploit the morphological similarity between humans and animals, motivated by animal exercise where humans imitate animal poses.
- Score: 29.52068540448424
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an end-to-end unified 3D mesh recovery of humans and quadruped
animals trained in a weakly-supervised way. Unlike recent work focusing on a
single target class only, we aim to recover 3D meshes of broader classes with a
single multi-task model. However, there exists no dataset that can directly
enable multi-task learning due to the absence of both human and animal
annotations for a single object, e.g., a human image does not have animal pose
annotations; thus, we have to devise a new way to exploit heterogeneous
datasets. To make the unstable disjoint multi-task learning jointly trainable,
we propose to exploit the morphological similarity between humans and animals,
motivated by animal exercise where humans imitate animal poses. We realize the
morphological similarity through semantic correspondences, called sub-keypoints,
which enable joint training of human and animal mesh regression branches.
Besides, we propose class-sensitive regularization methods to avoid a
mean-shape bias and to improve the distinctiveness across multiple classes. Our
method performs favorably against recent uni-modal models on various human and
animal datasets while being far more compact.
Related papers
- MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild [32.6521941706907]
We present MultiPly, a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos.
We first define a layered neural representation for the entire scene, composited by individual human and background models.
We learn the layered neural representation from videos via our layer-wise differentiable volume rendering.
arXiv Detail & Related papers (2024-06-03T17:59:57Z)
- Learning the 3D Fauna of the Web [70.01196719128912]
We develop 3D-Fauna, an approach that learns a pan-category deformable 3D animal model for more than 100 animal species jointly.
One crucial bottleneck of modeling animals is the limited availability of training data.
We show that prior category-specific attempts fail to generalize to rare species with limited training images.
arXiv Detail & Related papers (2024-01-04T18:32:48Z)
- Cross-view and Cross-pose Completion for 3D Human Understanding [22.787947086152315]
We propose a pre-training approach based on self-supervised learning that works on human-centric data using only images.
We pre-train a model for body-centric tasks and one for hand-centric tasks.
With a generic transformer architecture, these models outperform existing self-supervised pre-training methods on a wide set of human-centric downstream tasks.
arXiv Detail & Related papers (2023-11-15T16:51:18Z)
- LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
arXiv Detail & Related papers (2021-11-30T04:10:57Z)
- 3D Multi-bodies: Fitting Sets of Plausible 3D Human Models to Ambiguous Image Data [77.57798334776353]
We consider the problem of obtaining dense 3D reconstructions of humans from single and partially occluded views.
We suggest that ambiguities can be modelled more effectively by parametrizing the possible body shapes and poses.
We show that our method outperforms alternative approaches in ambiguous pose recovery on standard benchmarks for 3D humans.
arXiv Detail & Related papers (2020-11-02T13:55:31Z)
- Unsupervised Shape and Pose Disentanglement for 3D Meshes [49.431680543840706]
We present a simple yet effective approach to learn disentangled shape and pose representations in an unsupervised setting.
We use a combination of self-consistency and cross-consistency constraints to learn pose and shape space from registered meshes.
We demonstrate the usefulness of learned representations through a number of tasks including pose transfer and shape retrieval.
arXiv Detail & Related papers (2020-07-22T11:00:27Z)
- Transferring Dense Pose to Proximal Animal Classes [83.84439508978126]
We show that it is possible to transfer the knowledge existing in dense pose recognition for humans, as well as in more general object detectors and segmenters, to the problem of dense pose recognition in other classes.
We do this by establishing a DensePose model for the new animal which is also geometrically aligned to humans.
We also introduce two benchmark datasets labelled in the manner of DensePose for the class chimpanzee and use them to evaluate our approach.
arXiv Detail & Related papers (2020-02-28T21:43:53Z)
- Chained Representation Cycling: Learning to Estimate 3D Human Pose and Shape by Cycling Between Representations [73.11883464562895]
We propose a new architecture that facilitates unsupervised, or lightly supervised, learning.
We demonstrate the method by learning 3D human pose and shape from un-paired and un-annotated images.
While we present results for modeling humans, our formulation is general and can be applied to other vision problems.
arXiv Detail & Related papers (2020-01-06T14:54:00Z)