Unsupervised 3D Keypoint Discovery with Multi-View Geometry
- URL: http://arxiv.org/abs/2211.12829v2
- Date: Thu, 8 Feb 2024 03:02:22 GMT
- Title: Unsupervised 3D Keypoint Discovery with Multi-View Geometry
- Authors: Sina Honari, Chen Zhao, Mathieu Salzmann, Pascal Fua
- Abstract summary: We propose an algorithm that learns to discover 3D keypoints on human bodies from multiple-view images without supervision or labels.
Our approach discovers more interpretable and accurate 3D keypoints compared to other state-of-the-art unsupervised approaches.
- Score: 104.76006413355485
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Analyzing and training 3D body posture models depend heavily on the
availability of joint labels that are commonly acquired through laborious
manual annotation of body joints or via marker-based joint localization using
carefully curated markers and capturing systems. However, such annotations are
not always available, especially for people performing unusual activities. In
this paper, we propose an algorithm that learns to discover 3D keypoints on
human bodies from multiple-view images without any supervision or labels other
than the constraints multiple-view geometry provides. To ensure that the
discovered 3D keypoints are meaningful, they are re-projected to each view to
estimate the person's mask that the model itself has initially estimated
without supervision. Our approach discovers more interpretable and accurate 3D
keypoints compared to other state-of-the-art unsupervised approaches on
Human3.6M and MPI-INF-3DHP benchmark datasets.
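The abstract relies on two multi-view operations: recovering a 3D point from several views and re-projecting it to each view. The sketch below illustrates both under assumptions that are not from the paper: calibrated pinhole cameras given as 3x4 projection matrices, and linear (DLT) triangulation. The camera values and the function names `project` and `triangulate` are hypothetical. The mask-estimation step the abstract describes is not modeled here.

```python
import numpy as np

def project(P, X):
    """Project N 3D points X (N, 3) to pixels (N, 2) with a 3x4 camera matrix P."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])  # homogeneous coordinates, (N, 4)
    x = Xh @ P.T                                   # homogeneous pixels, (N, 3)
    return x[:, :2] / x[:, 2:3]                    # perspective divide

def triangulate(Ps, xs):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    Ps: list of 3x4 camera matrices; xs: list of (u, v) pixel observations.
    """
    rows = []
    for P, (u, v) in zip(Ps, xs):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    Xh = Vt[-1]
    return Xh[:3] / Xh[3]

# Hypothetical two-camera rig (assumed values, for illustration only).
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 500.0],
              [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # shifted along x

keypoints_3d = np.array([[0.0, 0.0, 5.0],
                         [0.2, -0.5, 4.0]])

# Re-project the 3D keypoints to each view, then recover them from the 2D points.
uv1 = project(P1, keypoints_3d)
uv2 = project(P2, keypoints_3d)
recovered = np.array([triangulate([P1, P2], [a, b]) for a, b in zip(uv1, uv2)])
```

With noise-free observations, `recovered` matches `keypoints_3d` up to numerical precision. With noisy 2D detections the two would differ, and that discrepancy is the kind of signal a multi-view consistency constraint can exploit.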
Related papers
- X as Supervision: Contending with Depth Ambiguity in Unsupervised Monocular 3D Pose Estimation [12.765995624408557]
We propose an unsupervised framework featuring a multi-hypothesis detector and multiple tailored pretext tasks.
The detector extracts multiple hypotheses from a heatmap within a local window, effectively managing the multi-solution problem.
The pretext tasks harness 3D human priors from the SMPL model to regularize the solution space of pose estimation, aligning it with the empirical distribution of 3D human structures.
arXiv Detail & Related papers (2024-11-20T04:18:11Z)
- Geometry-Biased Transformer for Robust Multi-View 3D Human Pose Reconstruction [3.069335774032178]
We propose a novel encoder-decoder Transformer architecture to estimate 3D poses from multi-view 2D pose sequences.
We conduct experiments on three benchmark public datasets, Human3.6M, CMU Panoptic and Occlusion-Persons.
arXiv Detail & Related papers (2023-12-28T16:30:05Z)
- BKinD-3D: Self-Supervised 3D Keypoint Discovery from Multi-View Videos [38.16427363571254]
We propose a new method to perform self-supervised keypoint discovery in 3D from multi-view videos of behaving agents.
Our method, BKinD-3D, uses an encoder-decoder architecture with a 3D volumetric heatmap, trained to reconstruct differences across multiple views.
arXiv Detail & Related papers (2022-12-14T18:34:29Z)
- On Triangulation as a Form of Self-Supervision for 3D Human Pose Estimation [57.766049538913926]
Supervised approaches to 3D pose estimation from single images are remarkably effective when labeled data is abundant.
Much of the recent attention has shifted towards semi- and/or weakly supervised learning.
We propose to impose multi-view geometrical constraints by means of a differentiable triangulation and to use it as a form of self-supervision during training when no labels are available.
arXiv Detail & Related papers (2022-03-29T19:11:54Z)
- Learning Temporal 3D Human Pose Estimation with Pseudo-Labels [3.0954251281114513]
We present a simple, yet effective, approach for self-supervised 3D human pose estimation.
We rely on triangulating 2D body pose estimates of a multiple-view camera system.
Our method achieves state-of-the-art performance in the Human3.6M and MPI-INF-3DHP benchmarks.
arXiv Detail & Related papers (2021-10-14T17:40:45Z)
- Encoder-decoder with Multi-level Attention for 3D Human Shape and Pose Estimation [61.98690211671168]
We propose a Multi-level Attention-Decoder Network (MAED) to model multi-level attentions in a unified framework.
With the training set of 3DPW, MAED outperforms previous state-of-the-art methods by 6.2, 7.2, and 2.4 mm of PA-MPJPE.
arXiv Detail & Related papers (2021-09-06T09:06:17Z)
- VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild [98.69191256693703]
We present VoxelTrack for multi-person 3D pose estimation and tracking from a few cameras which are separated by wide baselines.
It employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment.
It outperforms the state-of-the-art methods by a large margin on three public datasets including Shelf, Campus and CMU Panoptic.
arXiv Detail & Related papers (2021-08-05T08:35:44Z)
- Unsupervised Learning of Visual 3D Keypoints for Control [104.92063943162896]
Learning sensorimotor control policies from high-dimensional images crucially relies on the quality of the underlying visual representations.
We propose a framework to learn such a 3D geometric structure directly from images in an end-to-end unsupervised manner.
These discovered 3D keypoints tend to meaningfully capture robot joints as well as object movements in a consistent manner across both time and 3D space.
arXiv Detail & Related papers (2021-06-14T17:59:59Z)
- KAMA: 3D Keypoint Aware Body Mesh Articulation [79.04090630502782]
We propose an analytical solution to articulate a parametric body model, SMPL, via a set of straightforward geometric transformations.
Our approach offers significantly better alignment to image content when compared to state-of-the-art approaches.
Results on the challenging 3DPW and Human3.6M demonstrate that our approach yields state-of-the-art body mesh fittings.
arXiv Detail & Related papers (2021-04-27T23:01:03Z)