Predicting Camera Viewpoint Improves Cross-dataset Generalization for 3D
Human Pose Estimation
- URL: http://arxiv.org/abs/2004.03143v1
- Date: Tue, 7 Apr 2020 06:06:20 GMT
- Title: Predicting Camera Viewpoint Improves Cross-dataset Generalization for 3D
Human Pose Estimation
- Authors: Zhe Wang, Daeyun Shin, Charless C. Fowlkes
- Abstract summary: We study the diversity and biases present in specific datasets and their effect on cross-dataset generalization.
We find that models trained to jointly predict viewpoint and pose systematically show significantly improved cross-dataset generalization.
- Score: 32.6329300863371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular estimation of 3d human pose has attracted increased attention with
the availability of large ground-truth motion capture datasets. However, the
diversity of training data available is limited and it is not clear to what
extent methods generalize outside the specific datasets they are trained on. In
this work we carry out a systematic study of the diversity and biases present
in specific datasets and their effect on cross-dataset generalization across a
compendium of 5 pose datasets. We specifically focus on systematic differences
in the distribution of camera viewpoints relative to a body-centered coordinate
frame. Based on this observation, we propose an auxiliary task of predicting
the camera viewpoint in addition to pose. We find that models trained to
jointly predict viewpoint and pose systematically show significantly improved
cross-dataset generalization.
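The auxiliary task described above amounts to multi-task learning: a shared backbone produces features from which one head predicts 3D pose and a second head predicts camera viewpoint, trained with a weighted sum of per-task losses. A minimal numpy sketch under illustrative assumptions (the dimensions, the quaternion viewpoint encoding, and the loss weight `lam` are hypothetical choices, not the paper's exact architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical): flattened 2D keypoints in, 3D pose + viewpoint out.
N_JOINTS = 17
IN_DIM = N_JOINTS * 2          # flattened 2D keypoints
HID_DIM = 64
POSE_DIM = N_JOINTS * 3        # flattened 3D pose
VIEW_DIM = 4                   # viewpoint encoded as a unit quaternion

# Shared backbone plus two task-specific heads (random weights for illustration).
W_shared = rng.normal(scale=0.1, size=(IN_DIM, HID_DIM))
W_pose = rng.normal(scale=0.1, size=(HID_DIM, POSE_DIM))
W_view = rng.normal(scale=0.1, size=(HID_DIM, VIEW_DIM))

def forward(x):
    """Shared features feed both the pose head and the viewpoint head."""
    h = np.maximum(0.0, x @ W_shared)                           # ReLU backbone
    pose = h @ W_pose                                           # 3D pose prediction
    view = h @ W_view
    view = view / np.linalg.norm(view, axis=-1, keepdims=True)  # unit quaternion
    return pose, view

def joint_loss(pose_pred, pose_gt, view_pred, view_gt, lam=0.1):
    """Multi-task objective: pose MSE plus a weighted viewpoint term."""
    pose_loss = np.mean((pose_pred - pose_gt) ** 2)
    # Quaternion distance is sign-invariant: q and -q encode the same rotation.
    view_loss = np.mean(1.0 - np.abs(np.sum(view_pred * view_gt, axis=-1)))
    return pose_loss + lam * view_loss

# One toy batch with random ground truth.
x = rng.normal(size=(8, IN_DIM))
pose_gt = rng.normal(size=(8, POSE_DIM))
view_gt = rng.normal(size=(8, VIEW_DIM))
view_gt /= np.linalg.norm(view_gt, axis=-1, keepdims=True)

pose_pred, view_pred = forward(x)
loss = joint_loss(pose_pred, pose_gt, view_pred, view_gt)
print(float(loss))
```

Because the viewpoint head shares the backbone, its gradient shapes the shared features toward viewpoint-aware representations, which is the mechanism the abstract credits for the improved cross-dataset generalization.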
Related papers
- SCENES: Subpixel Correspondence Estimation With Epipolar Supervision [18.648772607057175]
Extracting point correspondences from two or more views of a scene is a fundamental computer vision problem.
Existing local feature matching approaches, trained with correspondence supervision on large-scale datasets, obtain highly-accurate matches on the test sets.
We relax this supervision requirement by removing the need for 3D structure, e.g., depth maps or point clouds, and instead require only camera pose information, which can be obtained from odometry.
arXiv Detail & Related papers (2024-01-19T18:57:46Z)
- Weakly-supervised 3D Pose Transfer with Keypoints [57.66991032263699]
The main challenges of 3D pose transfer are: 1) the lack of paired training data with different characters performing the same pose; 2) disentangling pose and shape information from the target mesh; and 3) the difficulty of applying the transfer to meshes with different topologies.
We propose a novel weakly-supervised keypoint-based framework to overcome these difficulties.
arXiv Detail & Related papers (2023-07-25T12:40:24Z)
- Estimating Egocentric 3D Human Pose in the Wild with External Weak Supervision [72.36132924512299]
We present a new egocentric pose estimation method, which can be trained on a large-scale in-the-wild egocentric dataset.
We propose a novel learning strategy to supervise the egocentric features with the high-quality features extracted by a pretrained external-view pose estimation model.
Experiments show that our method predicts accurate 3D poses from a single in-the-wild egocentric image and outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2022-01-20T00:45:13Z)
- Enhancing Egocentric 3D Pose Estimation with Third Person Views [37.9683439632693]
We propose a novel approach to enhance the 3D body pose estimation of a person computed from videos captured from a single wearable camera.
We introduce First2Third-Pose, a new paired synchronized dataset of nearly 2,000 videos depicting human activities captured from both first- and third-view perspectives.
Experimental results demonstrate that the joint multi-view embedded space learned with our dataset is useful to extract discriminatory features from arbitrary single-view egocentric videos.
arXiv Detail & Related papers (2022-01-06T11:42:01Z)
- Occlusion-Invariant Rotation-Equivariant Semi-Supervised Depth Based Cross-View Gait Pose Estimation [40.50555832966361]
We propose a novel approach for cross-view generalization with an occlusion-invariant semi-supervised learning framework.
Our model was trained with real-world data from a single view and unlabelled synthetic data from multiple views.
It can generalize well on the real-world data from all the other unseen views.
arXiv Detail & Related papers (2021-09-03T09:39:05Z)
- Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking [98.91894395941766]
We propose a novel online approach to learning the pose dynamics, which are independent of pose detections in the current frame.
Specifically, we derive this prediction of dynamics through a graph neural network (GNN) that explicitly accounts for both spatial-temporal and visual information.
Experiments on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed method achieves results superior to the state of the art on both human pose estimation and tracking tasks.
arXiv Detail & Related papers (2021-06-07T16:36:50Z)
- Self-supervised Human Detection and Segmentation via Multi-view Consensus [116.92405645348185]
We propose a multi-camera framework in which geometric constraints are embedded in the form of multi-view consistency during training.
We show that our approach outperforms state-of-the-art self-supervised person detection and segmentation techniques on images that visually depart from those of standard benchmarks.
arXiv Detail & Related papers (2020-12-09T15:47:21Z)
- Self-Supervised Multi-View Synchronization Learning for 3D Pose Estimation [39.334995719523]
Current methods cast monocular 3D human pose estimation as a learning problem by training neural networks on large data sets of images and corresponding skeleton poses.
We propose an approach that can exploit small annotated data sets by fine-tuning networks pre-trained via self-supervised learning on (large) unlabeled data sets.
We demonstrate the effectiveness of the synchronization task on the Human3.6M data set and achieve state-of-the-art results in 3D human pose estimation.
arXiv Detail & Related papers (2020-10-13T08:01:24Z)
- Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations named forward-kinematics, camera-projection, and spatial-map transformation.
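The camera-projection step named above is, in standard formulations, a differentiable pinhole projection from camera-frame 3D joints to image coordinates; the summary does not spell out its exact form, so the following is a generic sketch with hypothetical intrinsics (`f`, `cx`, `cy` are made-up values for illustration):

```python
import numpy as np

def project_points(points_3d, f=1000.0, cx=500.0, cy=500.0):
    """Pinhole camera projection: camera-frame (X, Y, Z) points -> pixel (u, v)."""
    X, Y, Z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    u = f * X / Z + cx  # perspective divide by depth, then shift by principal point
    v = f * Y / Z + cy
    return np.stack([u, v], axis=-1)

# One point on the optical axis and one off-axis point, both at depth 2.
pts = np.array([[0.0, 0.0, 2.0], [0.5, -0.25, 2.0]])
uv = project_points(pts)
print(uv)  # the on-axis point projects to the principal point (500, 500)
```

Every operation here is differentiable in the 3D coordinates, which is what allows such a projection to sit inside an end-to-end trained pipeline.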
arXiv Detail & Related papers (2020-06-24T23:56:33Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.