Markerless 3D human pose tracking through multiple cameras and AI:
Enabling high accuracy, robustness, and real-time performance
- URL: http://arxiv.org/abs/2303.18119v1
- Date: Fri, 31 Mar 2023 15:06:50 GMT
- Title: Markerless 3D human pose tracking through multiple cameras and AI:
Enabling high accuracy, robustness, and real-time performance
- Authors: Luca Fortini (1,2), Mattia Leonori (1), Juan M. Gandarias (1), Elena
de Momi (2), Arash Ajoudani (1) ((1) Human-Robot Interfaces and Interaction,
Istituto Italiano di Tecnologia, Genoa, Italy (2) Department of Electronics,
Information and Bioengineering, Politecnico di Milano, Milan, Italy)
- Abstract summary: Tracking 3D human motion in real-time is crucial for numerous applications across many fields.
Recent advances in Artificial Intelligence have allowed for markerless solutions.
We propose a markerless framework that combines multi-camera views and 2D AI-based pose estimation methods to track 3D human motion.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tracking 3D human motion in real-time is crucial for numerous applications
across many fields. Traditional approaches involve attaching artificial
fiducial objects or sensors to the body, limiting usability and comfort and
consequently narrowing the fields of application. Recent
advances in Artificial Intelligence (AI) have allowed for markerless solutions.
However, most of these methods operate in 2D, while those providing 3D
solutions compromise accuracy and real-time performance. To address this
challenge and unlock the potential of visual pose estimation methods in
real-world scenarios, we propose a markerless framework that combines
multi-camera views and 2D AI-based pose estimation methods to track 3D human
motion. Our approach integrates a Weighted Least Squares (WLS) algorithm that
computes 3D human motion from multiple 2D pose estimates provided by an
AI-driven method. The method is integrated within the Open-VICO framework,
which allows both simulation and real-world execution. Several experiments have been
conducted, which have shown high accuracy and real-time performance,
demonstrating the high level of readiness for real-world applications and the
potential to revolutionize human motion capture.
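The core computation the abstract describes, fusing per-camera 2D keypoints into a 3D joint position by weighted least squares, can be illustrated with a confidence-weighted Direct Linear Transform (DLT) triangulation. This is a minimal sketch, not the authors' exact formulation: the function name, camera setup, and use of detector confidences as weights are illustrative assumptions.

```python
import numpy as np

def triangulate_wls(projections, keypoints, confidences):
    """Triangulate one 3D joint from N camera views via weighted DLT.

    projections: list of N (3, 4) camera projection matrices.
    keypoints:   (N, 2) array of 2D detections (u, v), one per view.
    confidences: (N,) per-view detection confidences, used as weights.
    Returns the estimated 3D point as a length-3 array.
    """
    rows = []
    for P, (u, v), w in zip(projections, keypoints, confidences):
        # Each view contributes two homogeneous linear constraints on X,
        # scaled by the 2D detector's confidence for that view.
        rows.append(w * (u * P[2] - P[0]))
        rows.append(w * (v * P[2] - P[1]))
    A = np.stack(rows)
    # Least-squares solution of A @ X = 0 with ||X|| = 1: the right
    # singular vector associated with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize
```

Running this per joint and per frame yields the 3D skeleton; because each view's constraints are scaled by the 2D detector's confidence, occluded or poorly detected views contribute less to the solve.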
Related papers
- Markerless Multi-view 3D Human Pose Estimation: a survey (arXiv, 2024-07-04)
  3D human pose estimation aims to reconstruct the human skeleton of all the individuals in a scene by detecting several body joints. No method is yet capable of solving all the challenges associated with reconstructing the 3D pose, and further research is required to develop an approach that can quickly infer a highly accurate 3D pose at a bearable computational cost.
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues (arXiv, 2024-04-23)
  UPose3D is a novel approach for multi-view 3D human pose estimation that improves robustness and flexibility without requiring direct 3D annotations.
- SpatialTracker: Tracking Any 2D Pixels in 3D Space (arXiv, 2024-04-05)
  SpatialTracker estimates point trajectories in 3D space to mitigate the issues caused by image projection, lifting 2D pixels to 3D with monocular depth estimators. Tracking in 3D allows the use of as-rigid-as-possible (ARAP) constraints while simultaneously learning a rigidity embedding that clusters pixels into different rigid parts.
- DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos (arXiv, 2024-03-09)
  A self-supervised method that jointly learns 3D motion and depth from monocular videos. The system contains a depth estimation module and a new decomposed object-wise 3D motion (DO3D) estimation module that predicts ego-motion and 3D object motion, delivering superior performance in all evaluated settings.
- WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion (arXiv, 2023-12-12)
  WHAM (World-grounded Humans with Accurate Motion) reconstructs 3D human motion in a global coordinate system from video. It uses camera angular velocity estimated by a SLAM method, together with human motion, to estimate the body's global trajectory, and outperforms existing 3D human motion recovery methods across multiple in-the-wild benchmarks.
- HUM3DIL: Semi-supervised Multi-modal 3D Human Pose Estimation for Autonomous Driving (arXiv, 2022-12-15)
  3D human pose estimation is an emerging technology that can enable autonomous vehicles to perceive and understand the subtle and complex behaviors of pedestrians. The method embeds LiDAR points into pixel-aligned multi-modal features, passes them through a sequence of Transformer refinement stages, and exploits these complementary signals in a semi-supervised fashion, outperforming existing methods by a large margin.
- DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion Probabilistic Model (arXiv, 2022-12-06)
  This paper focuses on reconstructing a 3D pose from a single 2D keypoint detection, building a novel diffusion-based framework to sample diverse 3D poses from an off-the-shelf 2D detector. The method is evaluated on the widely adopted Human3.6M and HumanEva-I datasets.
- HULC: 3D Human Motion Capture with Pose Manifold Sampling and Dense Contact Guidance (arXiv, 2022-05-11)
  Markerless monocular 3D human motion capture (MoCap) with scene interactions is a challenging research topic relevant to extended reality, robotics, and virtual avatar generation. HULC is a new approach for 3D human MoCap that is aware of the scene geometry.
- Seeing by haptic glance: reinforcement learning-based 3D object Recognition (arXiv, 2021-02-15)
  Humans can recognize 3D objects through a limited number of haptic contacts between the target object and their fingers, without seeing the object; this capability is termed a 'haptic glance' in cognitive neuroscience. Most existing 3D recognition models were developed on dense 3D data, whereas in many real-life use cases, where robots collect 3D data by haptic exploration, only a limited number of 3D points can be gathered. A novel reinforcement learning-based framework is proposed in which the haptic exploration procedure is optimized jointly with 3D recognition on the actively collected 3D points.
- PLUME: Efficient 3D Object Detection from Stereo Images (arXiv, 2021-01-17)
  Existing methods tackle the problem in two steps: first, depth estimation is performed and a pseudo-LiDAR point cloud representation is computed from the depth estimates; then, object detection is performed in 3D space. PLUME unifies these two tasks in the same metric space and achieves state-of-the-art performance on the challenging KITTI benchmark with significantly reduced inference time compared with existing methods.
- Exploring Severe Occlusion: Multi-Person 3D Pose Estimation with Gated Convolution (arXiv, 2020-10-31)
  A temporal regression network with a gated convolution module transforms 2D joints to 3D, and a simple yet effective localization approach transforms the normalized pose to the global trajectory. The proposed method outperforms most state-of-the-art 2D-to-3D pose estimation methods.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.