Improving Real-Time Omnidirectional 3D Multi-Person Human Pose Estimation with People Matching and Unsupervised 2D-3D Lifting
- URL: http://arxiv.org/abs/2403.09437v1
- Date: Thu, 14 Mar 2024 14:30:31 GMT
- Title: Improving Real-Time Omnidirectional 3D Multi-Person Human Pose Estimation with People Matching and Unsupervised 2D-3D Lifting
- Authors: Pawel Knap, Peter Hardy, Alberto Tamajo, Hwasup Lim, Hansung Kim
- Abstract summary: Current human pose estimation systems focus on retrieving an accurate 3D global estimate of a single person.
This paper presents one of the first 3D multi-person human pose estimation systems that is able to work in real-time.
- Score: 3.231937990387248
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Current human pose estimation systems focus on retrieving an accurate 3D global estimate of a single person. This paper therefore presents one of the first 3D multi-person human pose estimation systems that works in real-time and handles basic forms of occlusion. First, we adapt an off-the-shelf 2D detector and an unsupervised 2D-3D lifting model for use with a 360$^\circ$ panoramic camera and mmWave radar sensors. We then introduce several contributions, including camera and radar calibration and improved matching of people between the image and radar space. The system addresses both the depth and scale ambiguity problems by employing a lightweight 2D-3D pose lifting algorithm that runs in real-time while remaining accurate in both indoor and outdoor environments, offering an affordable and scalable solution. Notably, the system's time complexity remains nearly constant irrespective of the number of detected individuals, achieving a frame rate of approximately 7-8 fps on a laptop with a commercial-grade GPU.
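The abstract does not spell out the matching step, but it can be illustrated with a hedged sketch: assuming the calibrated radar yields an azimuth and range per target, and that the horizontal pixel coordinate in an equirectangular 360$^\circ$ image maps roughly linearly to azimuth, people detected in the image can be matched to radar targets with the Hungarian algorithm and the matched range used as their depth. All names and thresholds below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): match people detected in a 360-degree
# equirectangular image to mmWave radar targets, then attach the radar range as depth.
import numpy as np
from scipy.optimize import linear_sum_assignment

def detection_azimuth(x_center: float, image_width: int) -> float:
    """In an equirectangular panorama the horizontal pixel coordinate maps
    (approximately) linearly to azimuth in [-pi, pi)."""
    return (x_center / image_width) * 2.0 * np.pi - np.pi

def match_people_to_radar(det_x_centers, image_width, radar_azimuths, radar_ranges,
                          max_angle_diff=np.deg2rad(10)):
    """Return {detection_index: range_in_metres} using Hungarian matching on
    the wrapped azimuth difference between detections and radar targets."""
    det_az = np.array([detection_azimuth(x, image_width) for x in det_x_centers])
    rad_az = np.asarray(radar_azimuths)
    # Pairwise angular distance, wrapped to [-pi, pi]: detections are rows, radar targets are columns.
    diff = det_az[:, None] - rad_az[None, :]
    cost = np.abs((diff + np.pi) % (2.0 * np.pi) - np.pi)
    rows, cols = linear_sum_assignment(cost)
    depths = {}
    for r, c in zip(rows, cols):
        if cost[r, c] <= max_angle_diff:   # reject implausible pairings
            depths[r] = float(radar_ranges[c])
    return depths
```

Once each detection has a metric range, a root-relative 3D pose produced by the lifting network can be translated to an absolute position, which is the usual way the depth and scale ambiguities of monocular lifting are resolved.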
Related papers
- MPL: Lifting 3D Human Pose from Multi-view 2D Poses [75.26416079541723]
We propose combining 2D pose estimation, for which large and rich training datasets exist, and 2D-to-3D pose lifting, using a transformer-based network.
Our experiments demonstrate decreases of up to 45% in MPJPE compared to the 3D pose obtained by triangulating the 2D poses.
arXiv Detail & Related papers (2024-08-20T12:55:14Z)
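The triangulation baseline that the MPL entry compares against can be sketched with a generic direct linear transform (DLT); the camera projection matrices and function names below are assumed for illustration, not taken from the paper.

```python
# Generic DLT triangulation of joints seen in several calibrated views; illustrative only.
import numpy as np

def triangulate_joint(points_2d, projection_matrices):
    """points_2d: list of (u, v) pixel coordinates, one per view.
    projection_matrices: list of 3x4 camera matrices P = K [R | t].
    Returns the 3D point minimising the algebraic error."""
    A = []
    for (u, v), P in zip(points_2d, projection_matrices):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    A = np.stack(A)
    _, _, vh = np.linalg.svd(A)
    X = vh[-1]
    return X[:3] / X[3]   # dehomogenise

def triangulate_pose(poses_2d, projection_matrices):
    """poses_2d: array of shape (num_views, num_joints, 2)."""
    num_joints = poses_2d.shape[1]
    return np.stack([triangulate_joint(poses_2d[:, j], projection_matrices)
                     for j in range(num_joints)])
```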
- Unsupervised Multi-Person 3D Human Pose Estimation From 2D Poses Alone [4.648549457266638]
We present one of the first studies investigating the feasibility of unsupervised multi-person 2D-3D pose estimation.
Our method involves independently lifting each subject's 2D pose to 3D, before combining them in a shared 3D coordinate system.
This by itself enables us to retrieve an accurate 3D reconstruction of their poses.
arXiv Detail & Related papers (2023-09-26T11:42:56Z)
- Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
arXiv Detail & Related papers (2023-01-12T18:01:28Z)
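The scale and shift recovery described in the entry above can be illustrated with a generic least-squares alignment of a normalized disparity map to metric depth cues at the 2D joint locations; this is a sketch under assumed inputs, not the paper's implementation.

```python
# Illustrative sketch: fit a global scale s and shift t so that
# s * disparity + t ~ 1 / metric_depth at a set of joint pixels,
# then convert the whole normalized disparity map to metric depth.
import numpy as np

def align_disparity_to_depth(disparity_map, joint_pixels, joint_metric_depths):
    """disparity_map: HxW normalized disparity (known only up to scale and shift).
    joint_pixels: (N, 2) integer (row, col) coordinates of 2D joints.
    joint_metric_depths: (N,) metric depths at those joints (e.g. inferred
    from body proportions and joint angles)."""
    d = np.array([disparity_map[r, c] for r, c in joint_pixels])
    target = 1.0 / np.asarray(joint_metric_depths)        # disparity ~ 1 / depth
    A = np.stack([d, np.ones_like(d)], axis=1)            # columns: [disparity, 1]
    (s, t), *_ = np.linalg.lstsq(A, target, rcond=None)   # least-squares scale and shift
    metric_depth_map = 1.0 / np.clip(s * disparity_map + t, 1e-6, None)
    return metric_depth_map, s, t
```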
- DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion Probabilistic Model [25.223801390996435]
This paper focuses on reconstructing a 3D pose from a single 2D keypoint detection.
We build a novel diffusion-based framework to effectively sample diverse 3D poses from an off-the-shelf 2D detector.
We evaluate our method on the widely adopted Human3.6M and HumanEva-I datasets.
arXiv Detail & Related papers (2022-12-06T07:22:20Z)
- Weakly Supervised 3D Multi-person Pose Estimation for Large-scale Scenes based on Monocular Camera and Single LiDAR [41.39277657279448]
We propose a monocular camera and single LiDAR-based method for 3D multi-person pose estimation in large-scale scenes.
Specifically, we design an effective fusion strategy to take advantage of multi-modal input data, including images and point cloud.
Our method exploits the inherent geometry constraints of point cloud for self-supervision and utilizes 2D keypoints on images for weak supervision.
arXiv Detail & Related papers (2022-11-30T12:50:40Z)
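The 2D weak supervision mentioned in the entry above typically amounts to a reprojection loss: project the predicted 3D joints through the camera and penalise their distance to the detected 2D keypoints. The snippet below is a generic version of that idea with assumed names, not the paper's code.

```python
# Generic 2D reprojection (weak supervision) loss: project predicted 3D joints
# with the pinhole intrinsics and compare against detected 2D keypoints.
import numpy as np

def reprojection_loss(joints_3d, keypoints_2d, K, confidences=None):
    """joints_3d: (J, 3) predicted joints in the camera frame (metres).
    keypoints_2d: (J, 2) detected pixel coordinates.
    K: 3x3 camera intrinsic matrix.
    confidences: optional (J,) detector confidences used as weights."""
    proj = (K @ joints_3d.T).T              # (J, 3) homogeneous pixel coordinates
    proj = proj[:, :2] / proj[:, 2:3]       # perspective divide
    err = np.linalg.norm(proj - keypoints_2d, axis=1)
    if confidences is not None:
        err = err * confidences
    return float(err.mean())
```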
- Multi-modal 3D Human Pose Estimation with 2D Weak Supervision in Autonomous Driving [74.74519047735916]
3D human pose estimation (HPE) in autonomous vehicles (AV) differs from other use cases in many factors.
Data collected for other use cases (such as virtual reality, gaming, and animation) may not be usable for AV applications.
We propose one of the first approaches to alleviate this problem in the AV setting.
arXiv Detail & Related papers (2021-12-22T18:57:16Z)
- MetaPose: Fast 3D Pose from Multiple Views without 3D Supervision [72.5863451123577]
We show how to train a neural model that can perform accurate 3D pose and camera estimation.
Our method outperforms both classical bundle adjustment and weakly-supervised monocular 3D baselines.
arXiv Detail & Related papers (2021-08-10T18:39:56Z)
- Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-localization in Large Scenes from Body-Mounted Sensors [71.29186299435423]
We introduce the Human POSEitioning System (HPS), a method to recover the full 3D pose of a human registered with a 3D scan of the surrounding environment.
We show that our optimization-based integration exploits the benefits of the two, resulting in pose accuracy free of drift.
HPS could be used for VR/AR applications where humans interact with the scene without requiring direct line of sight with an external camera.
arXiv Detail & Related papers (2021-03-31T17:58:31Z)
- Iterative Greedy Matching for 3D Human Pose Tracking from Multiple Views [22.86745487695168]
We propose an approach for estimating 3D human poses of multiple people from a set of calibrated cameras.
Our approach builds upon a real-time 2D multi-person pose estimation system and greedily solves the association problem between multiple views.
arXiv Detail & Related papers (2021-01-24T16:28:10Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
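The depth-to-scale idea in the entry above can be read, in generic pinhole terms, as each joint projecting with its own scale f/Z_j, so that depth differences between joints change their projected scale. The sketch below shows only that generic reading; it is not the paper's exact D2S formulation.

```python
# Generic pinhole projection exhibiting "per-joint scale variants": each joint's
# projected scale is focal / Z_j, so a joint's depth offset changes its scale.
# Illustrative only, not the paper's D2S formula.
import numpy as np

def project_perspective(joints_3d_camera, focal, principal_point):
    """joints_3d_camera: (J, 3) joints in the camera frame (metres).
    Returns (J, 2) pixel coordinates and the (J,) per-joint scale factors."""
    Z = joints_3d_camera[:, 2]
    scales = focal / Z                                   # scale varies with each joint's depth
    uv = joints_3d_camera[:, :2] * scales[:, None] + np.asarray(principal_point)
    return uv, scales
```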
- SMAP: Single-Shot Multi-Person Absolute 3D Pose Estimation [46.85865451812981]
We propose a novel system that first regresses a set of 2.5D representations of body parts and then reconstructs the 3D absolute poses based on these 2.5D representations with a depth-aware part association algorithm.
Such a single-shot bottom-up scheme allows the system to better learn and reason about the inter-person depth relationship, improving both 3D and 2D pose estimation.
arXiv Detail & Related papers (2020-08-26T09:56:07Z)
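Recovering an absolute 3D pose from a 2.5D representation (2D keypoints plus a root depth and per-joint relative depths), as in the SMAP entry above, is typically a pinhole back-projection; the sketch below shows that generic step with assumed inputs, not the SMAP implementation.

```python
# Generic back-projection from a 2.5D representation (2D keypoints, root depth,
# per-joint depth offsets) to absolute 3D joints in the camera frame.
import numpy as np

def backproject_25d(keypoints_2d, root_depth, relative_depths, K):
    """keypoints_2d: (J, 2) pixel coordinates.
    root_depth: metric depth of the root joint.
    relative_depths: (J,) per-joint depth offsets w.r.t. the root.
    K: 3x3 camera intrinsic matrix."""
    Z = root_depth + np.asarray(relative_depths)                      # absolute depth per joint
    ones = np.ones((keypoints_2d.shape[0], 1))
    rays = (np.linalg.inv(K) @ np.hstack([keypoints_2d, ones]).T).T   # unit-depth viewing rays
    return rays * Z[:, None]                                          # scale each ray by its depth
```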
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.