Markerless 3D human pose tracking through multiple cameras and AI:
Enabling high accuracy, robustness, and real-time performance
- URL: http://arxiv.org/abs/2303.18119v1
- Date: Fri, 31 Mar 2023 15:06:50 GMT
- Title: Markerless 3D human pose tracking through multiple cameras and AI:
Enabling high accuracy, robustness, and real-time performance
- Authors: Luca Fortini (1,2), Mattia Leonori (1), Juan M. Gandarias (1), Elena
de Momi (2), Arash Ajoudani (1) ((1) Human-Robot Interfaces and Interaction,
Istituto Italiano di Tecnologia, Genoa, Italy (2) Department of Electronics,
Information and Bioengineering, Politecnico di Milano, Milan, Italy)
- Abstract summary: Tracking 3D human motion in real-time is crucial for numerous applications across many fields.
Recent advances in Artificial Intelligence have allowed for markerless solutions.
We propose a markerless framework that combines multi-camera views and 2D AI-based pose estimation methods to track 3D human motion.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tracking 3D human motion in real-time is crucial for numerous applications
across many fields. Traditional approaches involve attaching artificial
fiducial objects or sensors to the body, limiting usability and comfort and
consequently narrowing the fields of application. Recent
advances in Artificial Intelligence (AI) have allowed for markerless solutions.
However, most of these methods operate in 2D, while those providing 3D
solutions compromise accuracy and real-time performance. To address this
challenge and unlock the potential of visual pose estimation methods in
real-world scenarios, we propose a markerless framework that combines
multi-camera views and 2D AI-based pose estimation methods to track 3D human
motion. Our approach integrates a Weighted Least Squares (WLS) algorithm that
computes 3D human motion from multiple 2D pose estimates provided by an
AI-driven method. The method is integrated within the Open-VICO framework,
which allows both simulation and real-world execution. Several experiments have been
conducted, which have shown high accuracy and real-time performance,
demonstrating the high level of readiness for real-world applications and the
potential to revolutionize human motion capture.
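The core computation the abstract describes, fusing per-camera 2D keypoints into a 3D joint position by weighted least squares, can be illustrated with a confidence-weighted Direct Linear Transform (DLT) triangulation. This is a minimal sketch, not the authors' exact formulation: the function name, camera setup, and use of detector confidences as weights are illustrative assumptions.

```python
import numpy as np

def triangulate_wls(projections, keypoints, confidences):
    """Triangulate one 3D joint from N camera views via weighted DLT.

    projections: list of N (3, 4) camera projection matrices.
    keypoints:   (N, 2) array of 2D detections (u, v), one per view.
    confidences: (N,) per-view detection confidences, used as weights.
    Returns the estimated 3D point as a length-3 array.
    """
    rows = []
    for P, (u, v), w in zip(projections, keypoints, confidences):
        # Each view contributes two homogeneous linear constraints on X,
        # scaled by the 2D detector's confidence for that view.
        rows.append(w * (u * P[2] - P[0]))
        rows.append(w * (v * P[2] - P[1]))
    A = np.stack(rows)
    # Least-squares solution of A @ X = 0 with ||X|| = 1: the right
    # singular vector associated with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

Running this per joint and per frame yields the 3D skeleton; because each view's constraints are scaled by the 2D detector's confidence, occluded or poorly detected views contribute less to the solve.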
Related papers
- Markerless Multi-view 3D Human Pose Estimation: a survey (arXiv, 2024-07-04)
  3D human pose estimation aims to reconstruct the human skeleton of all the individuals in a scene by detecting several body joints. No method is yet capable of solving all the challenges associated with reconstructing the 3D pose, and further research is required to develop an approach that can quickly infer a highly accurate 3D pose at a bearable computational cost.
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues (arXiv, 2024-04-23)
  UPose3D is a novel approach for multi-view 3D human pose estimation that improves robustness and flexibility without requiring direct 3D annotations.
- SpatialTracker: Tracking Any 2D Pixels in 3D Space (arXiv, 2024-04-05)
  SpatialTracker estimates point trajectories in 3D space to mitigate the issues caused by image projection, lifting 2D pixels to 3D with monocular depth estimators. Tracking in 3D allows the use of as-rigid-as-possible (ARAP) constraints while simultaneously learning a rigidity embedding that clusters pixels into different rigid parts.
- DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos (arXiv, 2024-03-09)
  A self-supervised method that jointly learns 3D motion and depth from monocular videos. The system contains a depth estimation module and a new decomposed object-wise 3D motion (DO3D) estimation module that predicts ego-motion and 3D object motion, delivering superior performance in all evaluated settings.
- WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion (arXiv, 2023-12-12)
  WHAM (World-grounded Humans with Accurate Motion) reconstructs 3D human motion in a global coordinate system from video. It uses camera angular velocity estimated by a SLAM method, together with human motion, to estimate the body's global trajectory, and outperforms existing 3D human motion recovery methods across multiple in-the-wild benchmarks.
- HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving (arXiv, 2022-12-15)
  3D human pose estimation is an emerging technology that can enable autonomous vehicles to perceive and understand the subtle and complex behaviors of pedestrians. The method embeds LiDAR points into pixel-aligned multi-modal features, passes them through a sequence of Transformer refinement stages, and exploits these complementary signals in a semi-supervised fashion, outperforming existing methods by a large margin.
- DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion Probabilistic Model (arXiv, 2022-12-06)
  This paper focuses on reconstructing a 3D pose from a single 2D keypoint detection, building a novel diffusion-based framework to sample diverse 3D poses from an off-the-shelf 2D detector. The method is evaluated on the widely adopted Human3.6M and HumanEva-I datasets.
- HULC: 3D Human Motion Capture with Pose Manifold Sampling and Dense Contact Guidance (arXiv, 2022-05-11)
  Markerless monocular 3D human motion capture (MoCap) with scene interactions is a challenging research topic relevant to extended reality, robotics, and virtual avatar generation. HULC is a new approach for 3D human MoCap that is aware of the scene geometry.
- Seeing by haptic glance: reinforcement learning-based 3D object Recognition (arXiv, 2021-02-15)
  Humans can recognize 3D objects through a limited number of haptic contacts between the target object and their fingers, without seeing the object; this capability is termed a 'haptic glance' in cognitive neuroscience. Most existing 3D recognition models were developed on dense 3D data, whereas in many real-life use cases, where robots collect 3D data by haptic exploration, only a limited number of 3D points can be gathered. A novel reinforcement learning-based framework is proposed in which the haptic exploration procedure is optimized jointly with 3D recognition on the actively collected 3D points.
- PLUME: Efficient 3D Object Detection from Stereo Images (arXiv, 2021-01-17)
  Existing methods tackle the problem in two steps: first, depth estimation is performed and a pseudo-LiDAR point cloud representation is computed from the depth estimates; then, object detection is performed in 3D space. PLUME unifies these two tasks in the same metric space and achieves state-of-the-art performance on the challenging KITTI benchmark with significantly reduced inference time compared with existing methods.
- Exploring Severe Occlusion: Multi-Person 3D Pose Estimation with Gated Convolution (arXiv, 2020-10-31)
  A temporal regression network with a gated convolution module transforms 2D joints to 3D, and a simple yet effective localization approach transforms the normalized pose to the global trajectory. The proposed method outperforms most state-of-the-art 2D-to-3D pose estimation methods.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.