3D Human Pose, Shape and Texture from Low-Resolution Images and Videos
- URL: http://arxiv.org/abs/2103.06498v1
- Date: Thu, 11 Mar 2021 06:52:12 GMT
- Title: 3D Human Pose, Shape and Texture from Low-Resolution Images and Videos
- Authors: Xiangyu Xu, Hao Chen, Francesc Moreno-Noguer, Laszlo A. Jeni, Fernando De la Torre
- Abstract summary: We propose RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme.
The proposed method learns 3D body pose and shape across different resolutions with a single model.
We extend the RSC-Net to handle low-resolution videos and apply it to reconstruct textured 3D pedestrians from low-resolution input.
- Score: 107.36352212367179
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D human pose and shape estimation from monocular images has been an active
research area in computer vision. Existing deep learning methods for this task
rely on high-resolution input, which, however, is not always available in many
scenarios such as video surveillance and sports broadcasting. Two common
approaches to dealing with low-resolution images are applying super-resolution
techniques to the input, which may result in unpleasant artifacts, or training a
separate model for each resolution, which is impractical in many realistic
applications.
To address the above issues, this paper proposes a novel algorithm called
RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss,
and a Contrastive learning scheme. The proposed method learns 3D body pose and
shape across different resolutions with a single model. The
self-supervision loss enforces scale-consistency of the output, and the
contrastive learning scheme enforces scale-consistency of the deep features. We
show that both of these new losses provide robustness when learning in a
weakly-supervised manner. Moreover, we extend the RSC-Net to handle
low-resolution videos and apply it to reconstruct textured 3D pedestrians from
low-resolution input. Extensive experiments demonstrate that the RSC-Net can
achieve consistently better results than the state-of-the-art methods for
challenging low-resolution images.
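To make the two training signals described in the abstract concrete, below is a minimal PyTorch sketch, not the authors' released code: it pairs a scale-consistency self-supervision term on the predicted body parameters with an InfoNCE-style contrastive term on the deep features. The model `net`, the paired high-/low-resolution batches, and the temperature value are illustrative assumptions.

```python
# Minimal sketch (an assumption, not the authors' implementation) of the two
# RSC-Net training signals: scale-consistency of the output and of the features.
import torch
import torch.nn.functional as F

def scale_consistency_losses(net, imgs_hr, imgs_lr, temperature=0.1):
    """Self-supervision + contrastive losses across two resolutions of the same images."""
    params_hr, feat_hr = net(imgs_hr)   # higher-resolution branch (params, feature vector)
    params_lr, feat_lr = net(imgs_lr)   # lower-resolution branch (same scenes, downsampled)

    # Self-supervision: predicted pose/shape parameters should agree across
    # resolutions; one reasonable choice is to treat the higher-resolution
    # prediction as a pseudo-target (hence the detach).
    loss_self = F.mse_loss(params_lr, params_hr.detach())

    # Contrastive scheme: features of the same image at different resolutions
    # are positives, features of other images in the batch are negatives,
    # written here as an InfoNCE-style loss over the similarity matrix.
    z_hr = F.normalize(feat_hr, dim=1)
    z_lr = F.normalize(feat_lr, dim=1)
    logits = z_lr @ z_hr.t() / temperature                # (B, B) similarities
    targets = torch.arange(z_lr.size(0), device=z_lr.device)
    loss_contrast = F.cross_entropy(logits, targets)

    return loss_self, loss_contrast
```

In training, these two terms would be weighted and added to the usual supervised pose/shape losses; the weighting is left unspecified here.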
Related papers
- Markerless Multi-view 3D Human Pose Estimation: a survey [0.49157446832511503]
3D human pose estimation aims to reconstruct the skeletons of all individuals in a scene by detecting several body joints.
No method is yet capable of solving all the challenges associated with the reconstruction of the 3D pose.
Further research is still required to develop an approach capable of quickly inferring a highly accurate 3D pose at an acceptable computational cost.
arXiv Detail & Related papers (2024-07-04T10:44:35Z)
- PONet: Robust 3D Human Pose Estimation via Learning Orientations Only [116.1502793612437]
We propose a novel Pose Orientation Net (PONet) that is able to robustly estimate 3D pose by learning orientations only.
PONet estimates the 3D orientation of the limbs by exploiting local image evidence, and uses these orientations to recover the 3D pose.
We evaluate our method on multiple datasets, including Human3.6M, MPII, MPI-INF-3DHP, and 3DPW.
arXiv Detail & Related papers (2021-12-21T12:48:48Z)
- Detailed 3D Human Body Reconstruction from Multi-view Images Combining Voxel Super-Resolution and Learned Implicit Representation [12.459968574683625]
We propose a coarse-to-fine method to reconstruct a detailed 3D human body from multi-view images.
The coarse 3D models are estimated by learning an implicit representation based on multi-scale features.
Refined, detailed 3D human body models are then produced by voxel super-resolution, which preserves fine details.
arXiv Detail & Related papers (2020-12-11T08:07:39Z)
- 3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning [105.49950571267715]
Existing deep learning methods for 3D human shape and pose estimation rely on relatively high-resolution input images.
We propose RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme.
We show that both these new training losses provide robustness when learning 3D shape and pose in a weakly-supervised manner.
arXiv Detail & Related papers (2020-07-27T16:19:52Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections (a toy sketch of this view-fusion idea appears after this list).
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
- Learning Pose-invariant 3D Object Reconstruction from Single-view Images [61.98279201609436]
In this paper, we explore a more realistic setup of learning 3D shape from only single-view images.
The major difficulty lies in insufficient constraints that can be provided by single view images.
We propose an effective adversarial domain confusion method to learn a pose-disentangled, compact shape space.
arXiv Detail & Related papers (2020-04-03T02:47:35Z)
- Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the Wild [101.70320427145388]
We propose a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data.
We evaluate our proposed approach on two large scale datasets.
arXiv Detail & Related papers (2020-03-17T08:47:16Z)
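As referenced in the Camera-Disentangled Representation entry above, here is a toy sketch of the general view-fusion idea: per-view features are mapped into a shared, camera-agnostic latent, and per-view outputs are recovered by conditioning on each camera projection matrix. The module names, dimensions, and the use of precomputed per-view features are illustrative assumptions, not the paper's architecture.

```python
# Toy sketch (an assumption, not the paper's architecture) of camera-disentangled
# view fusion: fuse per-view features into one latent, then decode per-view 2D
# detections conditioned on each view's 3x4 projection matrix.
import torch
import torch.nn as nn

class ViewFusionToy(nn.Module):
    def __init__(self, feat_dim=256, latent_dim=128, num_joints=17):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, latent_dim), nn.ReLU())
        # decoder consumes the fused latent plus a flattened 3x4 camera matrix
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + 12, 256), nn.ReLU(),
            nn.Linear(256, num_joints * 2),            # per-view 2D joint coordinates
        )

    def forward(self, view_feats, cams):
        # view_feats: (B, V, feat_dim) per-view image features
        # cams:       (B, V, 3, 4) camera projection matrices
        B, V, _ = view_feats.shape
        latent = self.encoder(view_feats).mean(dim=1)   # fuse views -> camera-agnostic latent
        latent = latent.unsqueeze(1).expand(B, V, -1)   # broadcast fused latent to every view
        cams_flat = cams.reshape(B, V, 12)
        out = self.decoder(torch.cat([latent, cams_flat], dim=-1))
        return out.reshape(B, V, -1, 2)                 # (B, V, num_joints, 2) 2D detections
```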
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.