Occlusion-Invariant Rotation-Equivariant Semi-Supervised Depth Based Cross-View Gait Pose Estimation
- URL: http://arxiv.org/abs/2109.01397v1
- Date: Fri, 3 Sep 2021 09:39:05 GMT
- Title: Occlusion-Invariant Rotation-Equivariant Semi-Supervised Depth Based Cross-View Gait Pose Estimation
- Authors: Xiao Gu, Jianxin Yang, Hanxiao Zhang, Jianing Qiu, Frank Po Wen Lo,
Yao Guo, Guang-Zhong Yang, Benny Lo
- Abstract summary: We propose a novel approach for cross-view generalization with an occlusion-invariant semi-supervised learning framework.
Our model was trained with real-world data from a single view and unlabelled synthetic data from multiple views.
It can generalize well to real-world data from all the other unseen views.
- Score: 40.50555832966361
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate estimation of three-dimensional human skeletons from depth images
can provide important metrics for healthcare applications, especially for
biomechanical gait analysis. However, there exist inherent problems associated
with depth images captured from a single view. The collected data is greatly
affected by occlusions where only partial surface data can be recorded.
Furthermore, depth images of the human body exhibit heterogeneous characteristics as the viewpoint changes, and poses estimated under local coordinate systems are expected to undergo equivariant rotations. Most existing pose estimation
models are sensitive to both issues. To address this, we propose a novel
approach for cross-view generalization with an occlusion-invariant
semi-supervised learning framework built upon a novel rotation-equivariant
backbone. Our model was trained with real-world data from a single view and
unlabelled synthetic data from multiple views, and it generalizes well to real-world data from all the other, unseen views. Our approach has shown superior gait-analysis performance on our ICL-Gait dataset compared to other state-of-the-art methods, and it produces more convincing keypoints on the ITOP dataset than that dataset's provided "ground truth".
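To make the equivariance requirement concrete, here is a minimal numerical sketch (a toy model for illustration, not the paper's backbone): a pose estimator f is rotation-equivariant when f(R·x) = R·f(x) for every rotation R, which the snippet below verifies for a toy point-cloud estimator whose internal features are rotation-invariant distances.

```python
# Toy sketch of rotation equivariance: f(R @ points) == R @ f(points).
import numpy as np

def toy_equivariant_pose(points, n_joints=3):
    """Toy equivariant 'pose estimator': each joint is a weighted mean of
    the input points, with weights computed from rotation-invariant
    distances to the centroid. Purely illustrative."""
    center = points.mean(axis=0)
    d = np.linalg.norm(points - center, axis=1)   # rotation-invariant features
    joints = []
    for j in range(n_joints):
        w = np.exp(-(j + 1) * d)                  # invariant per-point weights
        joints.append((w[:, None] * points).sum(0) / w.sum())
    return np.stack(joints)                       # (n_joints, 3)

rng = np.random.default_rng(0)
cloud = rng.normal(size=(128, 3))                 # stand-in depth point cloud

# Random rotation matrix via QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
Q *= np.sign(np.linalg.det(Q))                    # ensure det(Q) = +1

lhs = toy_equivariant_pose(cloud @ Q.T)           # rotate, then estimate
rhs = toy_equivariant_pose(cloud) @ Q.T           # estimate, then rotate
assert np.allclose(lhs, rhs)                      # equivariance holds
```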
Related papers
- GenDepth: Generalizing Monocular Depth Estimation for Arbitrary Camera Parameters via Ground Plane Embedding [8.289857214449372]
GenDepth is a novel model capable of performing metric depth estimation for arbitrary vehicle-camera setups.
We propose an embedding of camera parameters as the ground-plane depth, together with a novel architecture that integrates these embeddings with adversarial domain alignment.
We validate GenDepth on several autonomous driving datasets, demonstrating its state-of-the-art generalization capability for different vehicle-camera systems.
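As a hedged sketch of the ground-plane idea (reconstructed from the summary above; the function name and conventions are assumptions, not GenDepth's code): for a pinhole camera at known height and pitch over a flat road, every pixel below the horizon has a closed-form ground-plane depth that can serve as a camera-aware embedding.

```python
# Sketch: per-pixel depth of a flat ground plane for a pinhole camera.
import numpy as np

def ground_plane_depth(K, cam_height, pitch, H, W):
    """Convention: camera axes x-right, y-down, z-forward; `pitch` > 0
    tilts the camera down toward the road. Pixels above the horizon
    get +inf."""
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    rays = np.linalg.inv(K) @ pix                 # rays in camera frame, z = 1
    # Rotate rays into a gravity-aligned frame (rotation about the x-axis).
    c, s = np.cos(pitch), np.sin(pitch)
    R = np.array([[1.0, 0, 0], [0, c, s], [0, -s, c]])
    rays_w = R @ rays
    # Ground plane lies `cam_height` below the camera (y-down: y = +h).
    lam = np.full(rays_w.shape[1], np.inf)
    below = rays_w[1] > 1e-9                      # pixels below the horizon
    lam[below] = cam_height / rays_w[1][below]    # scale along each ray
    return (lam * rays[2]).reshape(H, W)          # z-depth (rays[2] == 1)

K = np.array([[720.0, 0, 320], [0, 720.0, 240], [0, 0, 1]])
emb = ground_plane_depth(K, cam_height=1.5, pitch=np.deg2rad(5), H=480, W=640)
```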
arXiv Detail & Related papers (2023-12-10T22:28:34Z)
- Weakly-supervised 3D Pose Transfer with Keypoints [57.66991032263699]
The main challenges of 3D pose transfer are: 1) the lack of paired training data in which different characters perform the same pose; 2) disentangling pose and shape information from the target mesh; and 3) the difficulty of applying the transfer to meshes with different topologies.
We propose a novel weakly-supervised keypoint-based framework to overcome these difficulties.
arXiv Detail & Related papers (2023-07-25T12:40:24Z)
- Improving 2D Human Pose Estimation in Rare Camera Views with Synthetic Data [24.63316659365843]
We introduce RePoGen, an SMPL-based method for generating synthetic humans with comprehensive control over pose and view.
Experiments on top-view datasets and a new dataset of real images with diverse poses show that training with RePoGen data added to the COCO dataset outperforms previous approaches.
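A small illustrative sketch of the viewpoint control such a generator needs (a toy of our own construction, not RePoGen itself): sample camera positions on a sphere around the subject, including the rare near-top-down views, and build a look-at extrinsic matrix for the renderer.

```python
# Sketch: sampling camera views on a sphere, including steep elevations.
import numpy as np

def look_at(cam_pos, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """World-to-camera rotation/translation for a camera looking at `target`
    (camera convention: x-right, y-down, z-forward)."""
    fwd = target - cam_pos
    fwd /= np.linalg.norm(fwd)
    right = np.cross(fwd, up); right /= np.linalg.norm(right)
    down = np.cross(fwd, right)
    R = np.stack([right, down, fwd])          # rows: camera axes in world frame
    return R, -R @ cam_pos

rng = np.random.default_rng(0)
elev = np.deg2rad(rng.uniform(0, 89))         # near 90 deg = top-down view
azim = rng.uniform(0, 2 * np.pi)              # (kept off the pole so the
radius = rng.uniform(2.0, 5.0)                #  up vector stays well-defined)
cam = radius * np.array([np.cos(elev) * np.cos(azim),
                         np.cos(elev) * np.sin(azim),
                         np.sin(elev)])
R, t = look_at(cam)
```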
arXiv Detail & Related papers (2023-07-13T13:17:50Z)
- EllSeg-Gen, towards Domain Generalization for head-mounted eyetracking [19.913297057204357]
We show that convolutional networks excel at extracting gaze features despite the presence of such artifacts.
We compare the performance of a single model trained with multiple datasets against a pool of models trained on individual datasets.
Results indicate that models tested on datasets in which eye images exhibit higher appearance variability benefit from multiset training.
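A minimal sketch of the multiset setup being compared (datasets and model here are random stand-ins, not the paper's): one model trained on the pooled union of several datasets, as opposed to one model per dataset.

```python
# Sketch: single model trained on the concatenation of multiple datasets.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Stand-ins for eye-image datasets with differing appearance statistics.
sets = [TensorDataset(torch.randn(64, 1, 32, 32), torch.randn(64, 2))
        for _ in range(3)]

multiset_loader = DataLoader(ConcatDataset(sets), batch_size=16, shuffle=True)

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(32 * 32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for imgs, gaze in multiset_loader:            # one model, pooled data
    loss = torch.nn.functional.mse_loss(model(imgs), gaze)
    opt.zero_grad(); loss.backward(); opt.step()
```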
arXiv Detail & Related papers (2022-05-04T08:35:52Z)
- Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net, which constitutes a common deep network backbone with two output heads following two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
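A hedged sketch of the two-head idea (layer sizes and names are assumptions, not MRP-Net's): a shared backbone feeds two differently configured heads, and their per-joint disagreement serves as a simple uncertainty proxy at both the joint and pose level.

```python
# Sketch: shared backbone, two heads, disagreement as uncertainty.
import torch
import torch.nn as nn

class TwoHeadPoseNet(nn.Module):
    def __init__(self, in_dim=512, feat_dim=256, n_joints=17):
        super().__init__()
        self.n_joints = n_joints
        self.backbone = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        # Two heads with deliberately different configurations.
        self.head_a = nn.Linear(feat_dim, n_joints * 3)
        self.head_b = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                    nn.Linear(128, n_joints * 3))

    def forward(self, x):
        f = self.backbone(x)
        pa = self.head_a(f).view(-1, self.n_joints, 3)
        pb = self.head_b(f).view(-1, self.n_joints, 3)
        joint_unc = (pa - pb).norm(dim=-1)        # per-joint disagreement
        pose = 0.5 * (pa + pb)                    # fused prediction
        return pose, joint_unc, joint_unc.mean(dim=-1)

net = TwoHeadPoseNet()
pose, joint_unc, pose_unc = net(torch.randn(4, 512))   # (4,17,3), (4,17), (4,)
```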
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
- Unsupervised View-Invariant Human Posture Representation [28.840986167408037]
We present a novel unsupervised approach that learns to extract a view-invariant 3D human pose representation from a 2D image.
Our model is trained by exploiting the intrinsic view-invariant properties of human pose between simultaneous frames.
We show improvements on the state-of-the-art unsupervised cross-view action classification accuracy on RGB and depth images.
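A simplified sketch of that training signal (an InfoNCE-style stand-in, not necessarily the paper's exact loss): two synchronized cameras observe the same pose, so embeddings of simultaneous frames are pulled together while frames from other time steps act as negatives.

```python
# Sketch: contrastive loss over simultaneous frames from two views.
import torch
import torch.nn.functional as F

def simultaneous_frame_loss(emb_view1, emb_view2, temperature=0.1):
    """InfoNCE-style loss: matching rows of the two (B, D) embedding
    batches are simultaneous frames; everything else is a negative."""
    z1 = F.normalize(emb_view1, dim=1)
    z2 = F.normalize(emb_view2, dim=1)
    logits = z1 @ z2.t() / temperature        # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))        # diagonal = positive pairs
    return F.cross_entropy(logits, targets)

loss = simultaneous_frame_loss(torch.randn(8, 128), torch.randn(8, 128))
```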
arXiv Detail & Related papers (2021-09-17T19:23:31Z)
- Salient Objects in Clutter [130.63976772770368]
This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
We propose a new high-quality dataset and update the previous saliency benchmark.
arXiv Detail & Related papers (2021-05-07T03:49:26Z)
- AdaFuse: Adaptive Multiview Fusion for Accurate Human Pose Estimation in the Wild [77.43884383743872]
We present AdaFuse, an adaptive multiview fusion method to enhance the features in occluded views.
We extensively evaluate the approach on three public datasets including Human3.6M, Total Capture and CMU Panoptic.
We also create a large scale synthetic dataset Occlusion-Person, which allows us to perform numerical evaluation on the occluded joints.
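A much-simplified sketch of adaptive multiview fusion (assuming per-view heatmaps have already been warped into a common frame; AdaFuse itself fuses along epipolar lines): learned per-view quality scores down-weight occluded views before the heatmaps are combined.

```python
# Sketch: quality-weighted fusion of per-view joint heatmaps.
import torch

def adaptive_fuse(heatmaps, view_scores):
    """heatmaps: (V, J, H, W) per-view joint heatmaps in a shared frame.
    view_scores: (V,) learned quality logits (low for occluded views)."""
    w = torch.softmax(view_scores, dim=0)               # adaptive view weights
    return (w[:, None, None, None] * heatmaps).sum(0)   # fused (J, H, W)

fused = adaptive_fuse(torch.rand(4, 17, 64, 64),
                      torch.tensor([2.0, -1.0, 0.5, 0.0]))
```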
arXiv Detail & Related papers (2020-10-26T03:19:46Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.