CLERF: Contrastive LEaRning for Full Range Head Pose Estimation
- URL: http://arxiv.org/abs/2412.02066v1
- Date: Tue, 03 Dec 2024 01:08:03 GMT
- Title: CLERF: Contrastive LEaRning for Full Range Head Pose Estimation
- Authors: Ting-Ruen Wei, Haowei Liu, Huei-Chung Hu, Xuyang Wu, Yi Fang, Hsin-Tai Wu
- Abstract summary: We introduce a novel framework for representation learning in head pose estimation (HPE).
Recent progress in 3D-aware generative adversarial networks (GANs) has opened the door to easily sampling triplets (anchor, positive, negative).
- Score: 8.938918988246128
- License:
- Abstract: We introduce a novel framework for representation learning in head pose estimation (HPE). Previously, such a scheme was difficult due to head pose data sparsity, which made triplet sampling infeasible. Recent progress in 3D-aware generative adversarial networks (GANs) has opened the door to easily sampling triplets (anchor, positive, negative). We perform contrastive learning on extensively augmented data, including geometric transformations, and demonstrate that contrastive learning allows networks to learn the genuine features that contribute to accurate HPE. On the other hand, we observe that existing HPE works struggle to predict head poses accurately when the rotation matrices of test images fall slightly outside the training dataset distribution. Experiments show that our method performs on par with state-of-the-art models on standard test datasets and outperforms them when images are slightly rotated or flipped, or when head poses span the full range. To the best of our knowledge, we are the first to deliver a true full-range HPE model capable of accurately predicting any head pose, including upside-down poses. Furthermore, we compare with existing full-yaw-range models and demonstrate superior results.
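The abstract describes the recipe only at a high level, so the following is a minimal sketch of triplet-based contrastive learning over head poses, assuming a PyTorch setting. The network name `embed_net`, the `geodesic_distance` helper, and the margin value are illustrative assumptions, not details confirmed by the paper.

```python
import torch
import torch.nn.functional as F

def geodesic_distance(R1, R2):
    """Geodesic distance between batches of 3x3 rotation matrices,
    i.e. the angle (in radians) of the relative rotation R1^T R2."""
    R = torch.matmul(R1.transpose(-2, -1), R2)
    trace = R.diagonal(dim1=-2, dim2=-1).sum(-1)
    # Clamp for numerical stability before arccos.
    cos = ((trace - 1.0) / 2.0).clamp(-1.0 + 1e-7, 1.0 - 1e-7)
    return torch.acos(cos)

def triplet_contrastive_loss(embed_net, anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull the positive (similar head pose)
    toward the anchor in embedding space and push the negative away."""
    za = F.normalize(embed_net(anchor), dim=-1)
    zp = F.normalize(embed_net(positive), dim=-1)
    zn = F.normalize(embed_net(negative), dim=-1)
    d_pos = (za - zp).pow(2).sum(-1)
    d_neg = (za - zn).pow(2).sum(-1)
    return F.relu(d_pos - d_neg + margin).mean()
```

In the setting the abstract describes, anchors, positives, and negatives would come from a 3D-aware GAN rendering heads at nearby versus distant rotations; a geodesic distance between rotation matrices, as sketched above, is one common way to decide which rendered pose counts as positive and which as negative.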
Related papers
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- Learning 3D-Aware GANs from Unposed Images with Template Feature Field [33.32761749864555]
This work targets learning 3D-aware GANs from unposed images.
We propose to perform on-the-fly pose estimation of training images with a learned template feature field (TeFF).
arXiv Detail & Related papers (2024-04-08T17:42:08Z)
- Towards Robust 3D Pose Transfer with Adversarial Learning [36.351835328908116]
3D pose transfer, which aims to transfer a desired pose to a target mesh, is one of the most challenging 3D generation tasks.
Previous attempts rely on well-defined parametric human models or skeletal joints as driving pose sources.
We propose 3D-PoseMAE, a customized MAE that effectively learns 3D extrinsic representations (i.e., pose).
arXiv Detail & Related papers (2024-04-02T19:03:39Z)
- Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation [30.79358827005448]
Scene Graph Generation (SGG) aims to structurally and comprehensively represent objects and their connections in images.
Existing SGG models often struggle to solve the long-tailed problem caused by biased datasets.
We propose a Text-Image-joint Scene Graph Generation (TISGG) model to resolve the unseen triples and improve the generalisation capability of the SGG models.
arXiv Detail & Related papers (2023-06-23T10:17:56Z)
- Instant Multi-View Head Capture through Learnable Registration [62.70443641907766]
Existing methods for capturing datasets of 3D heads in dense semantic correspondence are slow.
We introduce TEMPEH to directly infer 3D heads in dense correspondence from calibrated multi-view images.
Predicting one head takes about 0.3 seconds with a median reconstruction error of 0.26 mm, 64% lower than the current state-of-the-art.
arXiv Detail & Related papers (2023-06-12T21:45:18Z)
- Learning 3D-aware Image Synthesis with Unknown Pose Distribution [68.62476998646866]
Existing methods for 3D-aware image synthesis largely depend on the 3D pose distribution pre-estimated on the training set.
This work proposes PoF3D that frees generative radiance fields from the requirements of 3D pose priors.
arXiv Detail & Related papers (2023-01-18T18:47:46Z)
- Leveraging Equivariant Features for Absolute Pose Regression [9.30597356471664]
We show that a translation and rotation equivariant Convolutional Neural Network directly induces representations of camera motions into the feature space.
We then show that this geometric property allows for implicitly augmenting the training data under a whole group of image plane-preserving transformations.
arXiv Detail & Related papers (2022-04-05T12:44:20Z)
- PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision [102.48681650013698]
Existing self-supervised 3D human pose estimation schemes have largely relied on weak supervisions to guide the learning.
We propose a novel self-supervised approach that allows us to explicitly generate 2D-3D pose pairs for augmenting supervision.
This is made possible via introducing a reinforcement-learning-based imitator, which is learned jointly with a pose estimator alongside a pose hallucinator.
arXiv Detail & Related papers (2022-03-29T14:45:53Z)
- Self-Supervised 3D Hand Pose Estimation from monocular RGB via Contrastive Learning [50.007445752513625]
We propose a new self-supervised method for the structured regression task of 3D hand pose estimation.
We experimentally investigate the impact of invariant and equivariant contrastive objectives (a toy illustration of this distinction appears after this list).
We show that a standard ResNet-152, trained on additional unlabeled data, attains an improvement of 7.6% in PA-EPE on FreiHAND.
arXiv Detail & Related papers (2021-06-10T17:48:57Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
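The hand pose entry above distinguishes invariant from equivariant contrastive objectives. The following is a toy sketch of that distinction for in-plane image rotations, assuming PyTorch embeddings; the helper names and the choice to rotate only the first two embedding dimensions are illustrative assumptions, not the cited paper's formulation.

```python
import torch
import torch.nn.functional as F

def rotation_matrix_2d(theta):
    """2x2 in-plane rotation matrices for a batch of angles (radians)."""
    c, s = torch.cos(theta), torch.sin(theta)
    return torch.stack([torch.stack([c, -s], -1),
                        torch.stack([s, c], -1)], -2)

def invariant_loss(z, z_rot):
    """Invariant objective: the embedding should NOT change when the
    input is rotated, so the two embeddings are pulled together."""
    return 1 - F.cosine_similarity(z, z_rot, dim=-1).mean()

def equivariant_loss(z, z_rot, theta):
    """Equivariant objective: rotating the input should rotate the
    embedding predictably. Here the first two embedding dimensions are
    treated as a 2D vector that must rotate by the same angle theta."""
    R = rotation_matrix_2d(theta)                 # (B, 2, 2)
    z2 = torch.einsum('bij,bj->bi', R, z[:, :2])  # rotate the embedding
    return 1 - F.cosine_similarity(z2, z_rot[:, :2], dim=-1).mean()
```

For a pose task, the equivariant form is usually the relevant one: an invariant embedding would, by construction, discard exactly the rotation information a pose estimator needs to predict.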
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.