AirPose: Multi-View Fusion Network for Aerial 3D Human Pose and Shape
Estimation
- URL: http://arxiv.org/abs/2201.08093v1
- Date: Thu, 20 Jan 2022 09:46:20 GMT
- Authors: Nitin Saini, Elia Bonetto, Eric Price, Aamir Ahmad and Michael J.
Black
- Abstract summary: We present a novel markerless 3D human motion capture (MoCap) system for unstructured, outdoor environments.
AirPose estimates human pose and shape using images captured by multiple uncalibrated flying cameras.
AirPose itself calibrates the cameras relative to the person instead of relying on any pre-calibration.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this letter, we present a novel markerless 3D human motion capture (MoCap)
system for unstructured, outdoor environments that uses a team of autonomous
unmanned aerial vehicles (UAVs) with on-board RGB cameras and computation.
Existing methods are limited by calibrated cameras and off-line processing.
Thus, we present the first method (AirPose) to estimate human pose and shape
using images captured by multiple extrinsically uncalibrated flying cameras.
AirPose itself calibrates the cameras relative to the person instead of relying
on any pre-calibration. It uses distributed neural networks running on each UAV
that communicate viewpoint-independent information with each other about the
person (i.e., their 3D shape and articulated pose). The person's shape and pose
are parameterized using the SMPL-X body model, resulting in a compact
representation that minimizes communication between the UAVs. The network is
trained using synthetic images of realistic virtual environments, and
fine-tuned on a small set of real images. We also introduce an
optimization-based post-processing method (AirPose$^{+}$) for offline
applications that require higher MoCap quality. We make our method's code and
data available for research at
https://github.com/robot-perception-group/AirPose. A video describing the
approach and results is available at https://youtu.be/xLYe1TNHsfs.
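To illustrate why the SMPL-X parameterization keeps inter-UAV communication small, the per-frame person state can be sketched as a short float vector. The parameter counts below follow the standard SMPL-X body parameters; the exact subset AirPose transmits is an assumption:

```python
import numpy as np

# Hypothetical per-frame "person state" message exchanged between UAVs,
# using standard SMPL-X body parameter sizes (not necessarily the exact
# subset AirPose transmits).
betas = np.zeros(10)          # body shape coefficients
body_pose = np.zeros(21 * 3)  # 21 body joints, axis-angle
global_orient = np.zeros(3)   # root orientation, axis-angle
transl = np.zeros(3)          # root translation

message = np.concatenate([betas, body_pose, global_orient, transl])
print(message.size)           # 79 floats, about 316 bytes as float32,
                              # versus megabytes for a raw RGB image
```

Exchanging this viewpoint-independent vector instead of images or features is what makes the distributed, on-board design practical.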
Related papers
- Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot [22.848563931757962]
We present Multi-HMR, a strong single-shot model for multi-person 3D human mesh recovery from a single RGB image.
Predictions encompass the whole body, including hands and facial expressions, using the SMPL-X parametric model.
We show that incorporating it into the training data further enhances predictions, particularly for hands.
arXiv Detail & Related papers (2024-02-22T16:05:13Z) - FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses
via Pixel-Aligned Scene Flow [26.528667940013598]
Reconstruction of 3D neural fields from posed images has emerged as a promising method for self-supervised representation learning.
A key challenge preventing the deployment of these 3D scene learners on large-scale video data is their dependence on precise camera poses from structure-from-motion.
We propose a method that jointly reconstructs camera poses and 3D neural scene representations online and in a single forward pass.
arXiv Detail & Related papers (2023-05-31T20:58:46Z) - Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
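The link between person scale and scene depth can be illustrated with the basic pinhole relation: once a person's metric height is (approximately) known, their distance follows from the height of the 2D skeleton in pixels. This is a simplified sketch of the geometric intuition, not the paper's actual estimator:

```python
def depth_from_person_height(focal_px: float, height_m: float, height_px: float) -> float:
    """Pinhole camera: projected size shrinks linearly with distance,
    so z = f * H / h, with f and h in pixels and H in metres."""
    return focal_px * height_m / height_px

# A 1.75 m person imaged 350 px tall by a camera with f = 1000 px
# stands roughly 5 m away.
z = depth_from_person_height(1000.0, 1.75, 350.0)
print(z)  # 5.0
```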
arXiv Detail & Related papers (2023-01-12T18:01:28Z) - SmartMocap: Joint Estimation of Human and Camera Motion using
Uncalibrated RGB Cameras [49.110201064166915]
Markerless human motion capture (mocap) from multiple RGB cameras is a widely studied problem.
Existing methods either need calibrated cameras or calibrate them relative to a static camera, which acts as the reference frame for the mocap system.
We propose a mocap method which uses multiple static and moving extrinsically uncalibrated RGB cameras.
arXiv Detail & Related papers (2022-09-28T08:21:04Z) - VirtualPose: Learning Generalizable 3D Human Pose Models from Virtual
Data [69.64723752430244]
We introduce VirtualPose, a two-stage learning framework that exploits the hidden "free lunch" specific to 3D human pose estimation.
The first stage transforms images to abstract geometry representations (AGR), and then the second maps them to 3D poses.
It addresses the generalization issue from two aspects: (1) the first stage can be trained on diverse 2D datasets to reduce the risk of over-fitting to limited appearance; (2) the second stage can be trained on diverse AGR synthesized from a large number of virtual cameras and poses.
arXiv Detail & Related papers (2022-07-20T14:47:28Z) - Newton-PnP: Real-time Visual Navigation for Autonomous Toy-Drones [15.075691719756877]
The Perspective-n-Point (PnP) problem aims to estimate the relative pose between a calibrated monocular camera and a known 3D model.
We suggest an algorithm that runs in real time on weak IoT hardware while still providing provable guarantees on both running time and correctness.
Our main motivation was to turn the popular DJI's Tello Drone into an autonomous drone that navigates in an indoor environment with no external human/laptop/sensor.
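At its core, any PnP solver minimizes the reprojection error between known 3D model points and their 2D detections. A minimal numpy sketch of that objective (the Newton-type solver itself, with its runtime guarantees, is the paper's contribution and is not reproduced here):

```python
import numpy as np

def reprojection_error(K, R, t, pts3d, pts2d):
    """Mean pixel error of projecting 3D model points with pose (R, t)."""
    cam = (R @ pts3d.T).T + t            # world -> camera frame
    proj = (K @ cam.T).T                 # apply intrinsics
    proj = proj[:, :2] / proj[:, 2:3]    # perspective divide
    return np.linalg.norm(proj - pts2d, axis=1).mean()

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
pts3d = np.array([[0.0, 0, 4], [1, 0, 4], [0, 1, 4], [1, 1, 5]])
R, t = np.eye(3), np.zeros(3)
pts2d = (K @ pts3d.T).T
pts2d = pts2d[:, :2] / pts2d[:, 2:3]     # noise-free detections
print(reprojection_error(K, R, t, pts3d, pts2d))  # 0.0 at the true pose
```

A PnP solver searches over (R, t) for the minimum of this error; the difficulty the paper addresses is doing that search with guarantees on constrained hardware.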
arXiv Detail & Related papers (2022-03-05T09:00:50Z) - Human POSEitioning System (HPS): 3D Human Pose Estimation and
Self-localization in Large Scenes from Body-Mounted Sensors [71.29186299435423]
We introduce the Human POSEitioning System (HPS), a method to recover the full 3D pose of a human registered with a 3D scan of the surrounding environment.
We show that our optimization-based integration exploits the benefits of the two, resulting in pose accuracy free of drift.
HPS could be used for VR/AR applications where humans interact with the scene without requiring direct line of sight with an external camera.
arXiv Detail & Related papers (2021-03-31T17:58:31Z) - CanonPose: Self-Supervised Monocular 3D Human Pose Estimation in the
Wild [31.334715988245748]
We propose a self-supervised approach that learns a single image 3D pose estimator from unlabeled multi-view data.
In contrast to most existing methods, we do not require calibrated cameras and can therefore learn from moving cameras.
Key to the success are new, unbiased reconstruction objectives that mix information across views and training samples.
arXiv Detail & Related papers (2020-11-30T10:42:27Z) - VoxelPose: Towards Multi-Camera 3D Human Pose Estimation in Wild
Environment [80.77351380961264]
We present an approach to estimate 3D poses of multiple people from multiple camera views.
We present an end-to-end solution that operates in 3D space and therefore avoids making incorrect decisions in 2D space.
We propose Pose Regression Network (PRN) to estimate a detailed 3D pose for each proposal.
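The "operate in 3D space" idea can be sketched as projecting each candidate voxel into every camera view and averaging the 2D joint-heatmap confidence sampled there; peaks in the aggregated volume become 3D joint proposals. A toy single-joint version under that assumption (not the paper's network):

```python
import numpy as np

def aggregate_heatmaps(voxel_centers, cameras, heatmaps):
    """For each 3D voxel, average heatmap confidence across all views.

    cameras:  list of 3x4 projection matrices
    heatmaps: list of HxW confidence maps (one joint, for simplicity)
    """
    scores = np.zeros(len(voxel_centers))
    pts_h = np.hstack([voxel_centers, np.ones((len(voxel_centers), 1))])
    for P, hm in zip(cameras, heatmaps):
        uvw = (P @ pts_h.T).T
        uv = (uvw[:, :2] / uvw[:, 2:3]).round().astype(int)
        h, w = hm.shape
        for i, (u, v) in enumerate(uv):
            if 0 <= u < w and 0 <= v < h:
                scores[i] += hm[v, u]
    return scores / len(cameras)

# Toy example: one joint at (0, 0, 4); the correct voxel should score highest.
K = np.array([[100.0, 0, 32], [0, 100.0, 32], [0, 0, 1]])
P = K @ np.hstack([np.eye(3), np.zeros((3, 1))])  # camera at the origin
hm = np.zeros((64, 64))
hm[32, 32] = 1.0                                  # detection at image centre
voxels = np.array([[0.0, 0, 4], [1.0, 0, 4]])
print(aggregate_heatmaps(voxels, [P], [hm]))      # [1. 0.]
```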
arXiv Detail & Related papers (2020-04-13T23:50:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.