Coarse-to-fine Animal Pose and Shape Estimation
- URL: http://arxiv.org/abs/2111.08176v1
- Date: Tue, 16 Nov 2021 01:27:20 GMT
- Title: Coarse-to-fine Animal Pose and Shape Estimation
- Authors: Chen Li and Gim Hee Lee
- Abstract summary: We propose a coarse-to-fine approach to reconstruct a 3D animal mesh from a single image.
The coarse estimation stage first estimates the pose, shape and translation parameters of the SMAL model.
The estimated meshes are then used as a starting point by a graph convolutional network (GCN) to predict a per-vertex deformation in the refinement stage.
- Score: 67.39635503744395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most existing animal pose and shape estimation approaches reconstruct animal
meshes with a parametric SMAL model. This is because the low-dimensional pose
and shape parameters of the SMAL model make it easier for deep networks to
learn the high-dimensional animal meshes. However, the SMAL model is learned
from scans of toy animals with limited pose and shape variations, and thus may
not be able to represent highly varying real animals well. This may result in
poor fittings of the estimated meshes to the 2D evidence, e.g., 2D keypoints or
silhouettes. To mitigate this problem, we propose a coarse-to-fine approach to
reconstruct a 3D animal mesh from a single image. The coarse estimation stage
first estimates the pose, shape and translation parameters of the SMAL model.
The estimated meshes are then used as a starting point by a graph convolutional
network (GCN) to predict a per-vertex deformation in the refinement stage. This
combination of SMAL-based and vertex-based representations benefits from both
parametric and non-parametric representations. We design our mesh refinement
GCN (MRGCN) as an encoder-decoder structure with hierarchical feature
representations to overcome the limited receptive field of traditional GCNs.
Moreover, we observe that the global image feature used by existing animal mesh
reconstruction works is unable to capture detailed shape information for mesh
refinement. We thus introduce a local feature extractor to retrieve a
vertex-level feature and use it together with the global feature as the input
of the MRGCN. We test our approach on the StanfordExtra dataset and achieve
state-of-the-art results. Furthermore, we test the generalization capacity of
our approach on the Animal Pose and BADJA datasets. Our code is available at
the project website.
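The coarse-to-fine pipeline described in the abstract (a SMAL-based coarse mesh, then GCN-predicted per-vertex offsets conditioned on global and local image features) can be sketched roughly as follows. This is a minimal illustration, not the paper's actual MRGCN: the function name, tensor shapes, and the single averaging-based message-passing step are all assumptions standing in for the trained encoder-decoder network.

```python
import numpy as np

def refine_mesh(coarse_verts, adjacency, global_feat, local_feats, w_msg, w_feat):
    """One hypothetical message-passing step of a mesh-refinement GCN.

    coarse_verts: (V, 3) vertices from the SMAL-based coarse stage.
    adjacency:    (V, V) binary mesh adjacency (1-ring neighbours).
    global_feat:  (G,) image-level feature shared by all vertices.
    local_feats:  (V, L) per-vertex features sampled from the image.
    w_msg, w_feat: placeholder weights standing in for trained layers.
    """
    # Average neighbour positions: a simple graph convolution.
    deg = adjacency.sum(axis=1, keepdims=True).clip(min=1)
    neigh = adjacency @ coarse_verts / deg                      # (V, 3)
    # Concatenate geometry with global + local image features per vertex.
    V = coarse_verts.shape[0]
    g = np.broadcast_to(global_feat, (V, global_feat.shape[0]))
    feats = np.concatenate([neigh, g, local_feats], axis=1)     # (V, 3+G+L)
    # Two linear maps with a nonlinearity, predicting per-vertex offsets.
    delta = np.tanh(feats @ w_msg) @ w_feat                     # (V, 3)
    return coarse_verts + delta
```

The key property this sketch illustrates is that the refinement stage only predicts a residual deformation on top of the parametric coarse mesh, so the output inherits the SMAL mesh topology while gaining non-parametric per-vertex flexibility.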
Related papers
- Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration [2.814748676983944]
We propose a graph neural network model embedded with a local spherical Euclidean 3D equivariance property through SE(3) message-passing-based propagation.
Our model is composed mainly of a descriptor module, equivariant graph layers, match similarity, and the final regression layers.
Experiments conducted on the 3DMatch and KITTI datasets exhibit the compelling and robust performance of our model compared to state-of-the-art approaches.
arXiv Detail & Related papers (2024-10-08T06:48:01Z) - Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos [26.65191922949358]
We present a method to build animatable dog avatars from monocular videos.
This is challenging as animals display a range of (unpredictable) non-rigid movements and have a variety of appearance details.
We develop an approach that links the video frames via a 4D solution that jointly solves for the animal's pose variation and its appearance.
arXiv Detail & Related papers (2024-03-25T18:41:43Z) - 3D Surface Reconstruction in the Wild by Deforming Shape Priors from Synthetic Data [24.97027425606138]
Reconstructing the underlying 3D surface of an object from a single image is a challenging problem.
We present a new method for joint category-specific 3D reconstruction and object pose estimation from a single image.
Our approach achieves state-of-the-art reconstruction performance across several real-world datasets.
arXiv Detail & Related papers (2023-02-24T20:37:27Z) - Neural Capture of Animatable 3D Human from Monocular Video [38.974181971541846]
We present a novel paradigm of building an animatable 3D human representation from a monocular video input, such that it can be rendered in any unseen poses and views.
Our method is based on a dynamic Neural Radiance Field (NeRF) rigged by a mesh-based parametric 3D human model serving as a geometry proxy.
arXiv Detail & Related papers (2022-08-18T09:20:48Z) - Adversarial Parametric Pose Prior [106.12437086990853]
We learn a prior that restricts the SMPL parameters to values that produce realistic poses via adversarial training.
We show that our learned prior covers the diversity of the real-data distribution, facilitates optimization for 3D reconstruction from 2D keypoints, and yields better pose estimates when used for regression from images.
arXiv Detail & Related papers (2021-12-08T10:05:32Z) - A Lightweight Graph Transformer Network for Human Mesh Reconstruction from 2D Human Pose [8.816462200869445]
We present GTRS, a pose-based method that reconstructs a human mesh from a 2D human pose.
We demonstrate the efficiency and generalization of GTRS by extensive evaluations on the Human3.6M and 3DPW datasets.
arXiv Detail & Related papers (2021-11-24T18:48:03Z) - DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension [71.71234436165255]
We contribute DensePose 3D, a method that can learn such reconstructions in a weakly supervised fashion from 2D image annotations only.
Because it does not require 3D scans, DensePose 3D can be used for learning a wide range of articulated categories such as different animal species.
We show significant improvements compared to state-of-the-art non-rigid structure-from-motion baselines on both synthetic and real data on categories of humans and animals.
arXiv Detail & Related papers (2021-08-31T18:33:55Z) - Learning Deformable Tetrahedral Meshes for 3D Reconstruction [78.0514377738632]
3D shape representations that accommodate learning-based 3D reconstruction are an open problem in machine learning and computer graphics.
Previous work on neural 3D reconstruction demonstrated benefits, but also limitations, of point cloud, voxel, surface mesh, and implicit function representations.
We introduce Deformable Tetrahedral Meshes (DefTet) as a particular parameterization that utilizes volumetric tetrahedral meshes for the reconstruction problem.
arXiv Detail & Related papers (2020-11-03T02:57:01Z) - Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose [70.23652933572647]
We propose a novel graph convolutional neural network (GraphCNN)-based system that estimates the 3D coordinates of human mesh vertices directly from the 2D human pose.
We show that our Pose2Mesh outperforms the previous 3D human pose and mesh estimation methods on various benchmark datasets.
arXiv Detail & Related papers (2020-08-20T16:01:56Z) - Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation [62.618227434286]
We present a novel learning approach to recover the 6D poses and sizes of unseen object instances from an RGB-D image.
We propose a deep network to reconstruct the 3D object model by explicitly modeling the deformation from a pre-learned categorical shape prior.
arXiv Detail & Related papers (2020-07-16T16:45:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.