Markerless Multi-view 3D Human Pose Estimation: a survey
- URL: http://arxiv.org/abs/2407.03817v2
- Date: Mon, 09 Jun 2025 22:57:32 GMT
- Title: Markerless Multi-view 3D Human Pose Estimation: a survey
- Authors: Ana Filipa Rodrigues Nogueira, Hélder P. Oliveira, Luís F. Teixeira
- Abstract summary: 3D human pose estimation involves reconstructing the human skeleton by detecting the body joints. Accurate and efficient solutions are required for several real-world applications including animation, human-robot interaction, surveillance, and sports. However, challenges such as occlusions, 2D pose mismatches, random camera perspectives, and limited 3D labelled data have been hampering the models' performance.
- Score: 0.49157446832511503
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D human pose estimation involves reconstructing the human skeleton by detecting the body joints. Accurate and efficient solutions are required for several real-world applications including animation, human-robot interaction, surveillance, and sports. However, challenges such as occlusions, 2D pose mismatches, random camera perspectives, and limited 3D labelled data have been hampering the models' performance and limiting their deployment in real-world scenarios. The higher availability of cameras has led researchers to explore multi-view solutions to take advantage of the different perspectives to reconstruct the pose. Most existing reviews have mainly focused on monocular 3D human pose estimation, so a comprehensive survey on multi-view approaches has been missing since 2012. According to the reviewed articles, the majority of the existing methods are fully-supervised approaches based on geometric constraints, which are often limited by 2D pose mismatches. To mitigate this, researchers have proposed incorporating temporal consistency or depth information. Alternatively, working directly with 3D features has been shown to completely overcome this issue, albeit at the cost of increased computational complexity. Additionally, models with lower levels of supervision have been identified to help address challenges such as annotated data scarcity and generalisation to new setups. Therefore, no method currently addresses all challenges associated with 3D pose reconstruction, and a trade-off between complexity and performance exists. Further research is needed to develop approaches capable of quickly inferring a highly accurate 3D pose with bearable computation cost. Techniques such as active learning, low-supervision methods, temporal consistency, view selection, depth information estimation, and multi-modal approaches are strategies to consider when developing a new method for this task.
Related papers
- DeProPose: Deficiency-Proof 3D Human Pose Estimation via Adaptive Multi-View Fusion [57.83515140886807]
We introduce the task of Deficiency-Aware 3D Pose Estimation.
DeProPose is a flexible method that simplifies the network architecture to reduce training complexity.
We have developed a novel 3D human pose estimation dataset.
arXiv Detail & Related papers (2025-02-23T03:22:54Z)
- Multi-view Pose Fusion for Occlusion-Aware 3D Human Pose Estimation [3.442372522693843]
We present a novel approach for robust 3D human pose estimation in the context of human-robot collaboration.
Our approach outperforms state-of-the-art multi-view human pose estimation techniques.
arXiv Detail & Related papers (2024-08-28T14:10:57Z)
- Deep learning for 3D human pose estimation and mesh recovery: A survey [6.535833206786788]
We present a review of recent progress over the past five years in deep learning methods for 3D human pose estimation.
To the best of our knowledge, this survey is arguably the first to comprehensively cover deep learning methods for 3D human pose estimation.
arXiv Detail & Related papers (2024-02-29T04:30:39Z)
- 3DHR-Co: A Collaborative Test-time Refinement Framework for In-the-Wild 3D Human-Body Reconstruction Task [63.85458454137262]
We propose a strategy that complements 3DHR test-time refinement work under a collaborative approach.
We show that our approach can significantly enhance the scores of common classic 3DHR backbones up to -34 mm pose error suppression.
arXiv Detail & Related papers (2023-10-02T15:46:25Z)
- Markerless 3D human pose tracking through multiple cameras and AI: Enabling high accuracy, robustness, and real-time performance [0.0]
Tracking 3D human motion in real-time is crucial for numerous applications across many fields.
Recent advances in Artificial Intelligence have allowed for markerless solutions.
We propose a markerless framework that combines multi-camera views and 2D AI-based pose estimation methods to track 3D human motion.
arXiv Detail & Related papers (2023-03-31T15:06:50Z)
- DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion Probabilistic Model [25.223801390996435]
This paper focuses on reconstructing a 3D pose from a single 2D keypoint detection.
We build a novel diffusion-based framework to effectively sample diverse 3D poses from an off-the-shelf 2D detector.
We evaluate our method on the widely adopted Human3.6M and HumanEva-I datasets.
arXiv Detail & Related papers (2022-12-06T07:22:20Z)
- Towards unconstrained joint hand-object reconstruction from RGB videos [81.97694449736414]
Reconstructing hand-object manipulations holds a great potential for robotics and learning from human demonstrations.
We first propose a learning-free fitting approach for hand-object reconstruction which can seamlessly handle two-hand object interactions.
arXiv Detail & Related papers (2021-08-16T12:26:34Z)
- Recent Advances in Monocular 2D and 3D Human Pose Estimation: A Deep Learning Perspective [69.44384540002358]
We provide a comprehensive and holistic 2D-to-3D perspective to tackle this problem.
We categorize the mainstream and milestone approaches since the year 2014 under unified frameworks.
We also summarize the pose representation styles, benchmarks, evaluation metrics, and the quantitative performance of popular approaches.
arXiv Detail & Related papers (2021-04-23T11:07:07Z)
- 3D Human Pose, Shape and Texture from Low-Resolution Images and Videos [107.36352212367179]
We propose RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme.
The proposed method is able to learn 3D body pose and shape across different resolutions with one single model.
We extend the RSC-Net to handle low-resolution videos and apply it to reconstruct textured 3D pedestrians from low-resolution input.
arXiv Detail & Related papers (2021-03-11T06:52:12Z)
- HMOR: Hierarchical Multi-Person Ordinal Relations for Monocular Multi-Person 3D Pose Estimation [54.23770284299979]
This paper introduces a novel form of supervision - Hierarchical Multi-person Ordinal Relations (HMOR)
HMOR encodes interaction information as the ordinal relations of depths and angles hierarchically.
An integrated top-down model is designed to leverage these ordinal relations in the learning process.
The proposed method significantly outperforms state-of-the-art methods on publicly available multi-person 3D pose datasets.
arXiv Detail & Related papers (2020-08-01T07:53:27Z)
- 3D Human Shape and Pose from a Single Low-Resolution Image with Self-Supervised Learning [105.49950571267715]
Existing deep learning methods for 3D human shape and pose estimation rely on relatively high-resolution input images.
We propose RSC-Net, which consists of a Resolution-aware network, a Self-supervision loss, and a Contrastive learning scheme.
We show that both these new training losses provide robustness when learning 3D shape and pose in a weakly-supervised manner.
arXiv Detail & Related papers (2020-07-27T16:19:52Z)
- Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations named as forward-kinematics, camera-projection and spatial-map transformation.
arXiv Detail & Related papers (2020-06-24T23:56:33Z)
- Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the Wild [101.70320427145388]
We propose a weakly-supervised approach that does not require 3D annotations and learns to estimate 3D poses from unlabeled multi-view data.
We evaluate our proposed approach on two large scale datasets.
arXiv Detail & Related papers (2020-03-17T08:47:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.