Improving 2D Human Pose Estimation in Rare Camera Views with Synthetic Data
- URL: http://arxiv.org/abs/2307.06737v2
- Date: Sat, 20 Apr 2024 11:53:13 GMT
- Title: Improving 2D Human Pose Estimation in Rare Camera Views with Synthetic Data
- Authors: Miroslav Purkrabek, Jiri Matas
- Abstract summary: We introduce RePoGen, an SMPL-based method for generating synthetic humans with comprehensive control over pose and view.
Experiments on top-view datasets and a new dataset of real images with diverse poses show that adding the RePoGen data to the COCO dataset outperforms previous approaches.
- Score: 24.63316659365843
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Methods and datasets for human pose estimation focus predominantly on side- and front-view scenarios. We overcome this limitation by leveraging synthetic data and introduce RePoGen (RarE POses GENerator), an SMPL-based method for generating synthetic humans with comprehensive control over pose and view. Experiments on top-view datasets and a new dataset of real images with diverse poses show that adding the RePoGen data to the COCO dataset outperforms previous approaches to top- and bottom-view pose estimation without harming performance on common views. An ablation study shows that anatomical plausibility, a property prior research focused on, is not a prerequisite for effective performance. The introduced dataset and the corresponding code are available at https://mirapurkrabek.github.io/RePoGen-paper/.
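The "control over view" described above amounts to choosing camera extrinsics before rendering the synthetic body. As a minimal sketch of the idea (not the paper's implementation), the snippet below projects a toy set of 3D joints through a pinhole camera placed directly overhead, the rare top-view configuration the paper targets. The joint coordinates and camera parameters are illustrative assumptions.

```python
# Sketch: projecting 3D joints into an overhead (top-view) camera.
# All numeric values here are illustrative assumptions, not from the paper.
import numpy as np

def project_points(joints_3d, R, t, f=1000.0, c=(512.0, 512.0)):
    """Pinhole projection: world frame -> camera frame -> image plane."""
    cam = joints_3d @ R.T + t          # rotate/translate into the camera frame
    uv = cam[:, :2] / cam[:, 2:3]      # perspective divide by depth
    return uv * f + np.asarray(c)      # apply focal length and principal point

# A toy standing "skeleton": pelvis, head, two hands (world frame, metres, z up).
joints = np.array([[0.0, 0.0, 1.0],
                   [0.0, 0.0, 1.7],
                   [-0.4, 0.0, 1.2],
                   [0.4, 0.0, 1.2]])

# Top-view camera 4 m above the origin looking straight down:
# world z maps to camera -z, so depth grows toward the floor.
R_top = np.array([[1.0,  0.0,  0.0],
                  [0.0, -1.0,  0.0],
                  [0.0,  0.0, -1.0]])
t_top = np.array([0.0, 0.0, 4.0])

uv = project_points(joints, R_top, t_top)
print(uv.round(1))
```

Sampling many such rotations and translations (rather than only the overhead one) is what lets a synthetic pipeline cover viewpoints that are rare in datasets like COCO.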
Related papers
- Generalizable Human Gaussians from Single-View Image [52.100234836129786]
We introduce a single-view generalizable Human Gaussian Model (HGM).
Our approach uses a ControlNet to refine rendered back-view images from coarse predicted human Gaussians.
To mitigate the potential generation of unrealistic human poses and shapes, we incorporate human priors from the SMPL-X model as a dual branch.
arXiv Detail & Related papers (2024-06-10T06:38:11Z) - WheelPose: Data Synthesis Techniques to Improve Pose Estimation Performance on Wheelchair Users [5.057643544417776]
Existing pose estimation models perform poorly on wheelchair users due to a lack of representation in training data.
We present a data synthesis pipeline to address this disparity in data collection.
Our pipeline generates synthetic data of wheelchair users using motion capture data and motion generation outputs simulated in the Unity game engine.
arXiv Detail & Related papers (2024-04-25T22:17:32Z) - Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
arXiv Detail & Related papers (2024-03-23T22:32:06Z) - LiCamPose: Combining Multi-View LiDAR and RGB Cameras for Robust Single-frame 3D Human Pose Estimation [31.651300414497822]
LiCamPose is a pipeline that integrates multi-view RGB and sparse point cloud information to estimate robust 3D human poses from a single frame.
LiCamPose is evaluated on four datasets, including two public datasets, one synthetic dataset, and one challenging self-collected dataset.
arXiv Detail & Related papers (2023-12-11T14:30:11Z) - Learning Human Action Recognition Representations Without Real Humans [66.61527869763819]
We present a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model.
We then evaluate the transferability of the representation learned on this data to a diverse set of downstream action recognition benchmarks.
Our approach outperforms previous baselines by up to 5%.
arXiv Detail & Related papers (2023-11-10T18:38:14Z) - Domain-Adaptive Full-Face Gaze Estimation via Novel-View-Synthesis and Feature Disentanglement [12.857137513211866]
We propose an effective model training pipeline consisting of a training data synthesis and a gaze estimation model for unsupervised domain adaptation.
The proposed data synthesis leverages the single-image 3D reconstruction to expand the range of the head poses from the source domain without requiring a 3D facial shape dataset.
We propose a disentangling autoencoder network to separate gaze-related features and introduce background augmentation consistency loss to utilize the characteristics of the synthetic source domain.
arXiv Detail & Related papers (2023-05-25T15:15:03Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only for the source dataset and unavailable for the target dataset during training.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets [83.749895930242]
We propose two techniques for producing high-quality naturalistic synthetic occluded faces.
We empirically show the effectiveness and robustness of both methods, even for unseen occlusions.
We present two high-resolution real-world occluded face datasets with fine-grained annotations, RealOcc and RealOcc-Wild.
arXiv Detail & Related papers (2022-05-12T17:03:57Z) - Learning-by-Novel-View-Synthesis for Full-Face Appearance-based 3D Gaze Estimation [8.929311633814411]
This work examines a novel approach for synthesizing gaze estimation training data based on monocular 3D face reconstruction.
Unlike prior works using multi-view reconstruction, photo-realistic CG models, or generative neural networks, our approach can manipulate and extend the head pose range of existing training data.
arXiv Detail & Related papers (2022-01-20T00:29:45Z) - Occlusion-Invariant Rotation-Equivariant Semi-Supervised Depth Based Cross-View Gait Pose Estimation [40.50555832966361]
We propose a novel approach for cross-view generalization with an occlusion-invariant semi-supervised learning framework.
Our model was trained with real-world data from a single view and unlabelled synthetic data from multiple views.
It generalizes well to real-world data from all other unseen views.
arXiv Detail & Related papers (2021-09-03T09:39:05Z) - AdaFuse: Adaptive Multiview Fusion for Accurate Human Pose Estimation in the Wild [77.43884383743872]
We present AdaFuse, an adaptive multiview fusion method to enhance the features in occluded views.
We extensively evaluate the approach on three public datasets including Human3.6M, Total Capture and CMU Panoptic.
We also create a large scale synthetic dataset Occlusion-Person, which allows us to perform numerical evaluation on the occluded joints.
arXiv Detail & Related papers (2020-10-26T03:19:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.