Real-Time Hybrid Mapping of Populated Indoor Scenes using a Low-Cost
Monocular UAV
- URL: http://arxiv.org/abs/2203.02453v1
- Date: Fri, 4 Mar 2022 17:31:26 GMT
- Authors: Stuart Golodetz, Madhu Vankadari, Aluna Everitt, Sangyun Shin, Andrew
Markham and Niki Trigoni
- Abstract summary: We present the first system to perform simultaneous mapping and multi-person 3D human pose estimation from a monocular camera mounted on a single UAV.
In particular, we show how to loosely couple state-of-the-art monocular depth estimation and monocular 3D human pose estimation approaches to reconstruct a hybrid map of a populated indoor scene in real time.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Unmanned aerial vehicles (UAVs) have been used for many applications in
recent years, from urban search and rescue, to agricultural surveying, to
autonomous underground mine exploration. However, deploying UAVs in tight,
indoor spaces, especially close to humans, remains a challenge. One solution,
when limited payload is required, is to use micro-UAVs, which pose less risk to
humans and typically cost less to replace after a crash. However, micro-UAVs
can only carry a limited sensor suite, e.g. a monocular camera instead of a
stereo pair or LiDAR, complicating tasks like dense mapping and markerless
multi-person 3D human pose estimation, which are needed to operate in tight
environments around people. Monocular approaches to such tasks exist, and dense
monocular mapping approaches have been successfully deployed for UAV
applications. However, despite many recent works on both marker-based and
markerless multi-UAV single-person motion capture, markerless single-camera
multi-person 3D human pose estimation remains a much earlier-stage technology,
and we are not aware of existing attempts to deploy it in an aerial context. In
this paper, we present what is thus, to our knowledge, the first system to
perform simultaneous mapping and multi-person 3D human pose estimation from a
monocular camera mounted on a single UAV. In particular, we show how to loosely
couple state-of-the-art monocular depth estimation and monocular 3D human pose
estimation approaches to reconstruct a hybrid map of a populated indoor scene
in real time. We validate our component-level design choices via extensive
experiments on the large-scale ScanNet and GTA-IM datasets. To evaluate our
system-level performance, we also construct a new Oxford Hybrid Mapping dataset
of populated indoor scenes.
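The "loose coupling" the abstract describes can be pictured as a per-frame loop in which two independent monocular models run side by side and their outputs are merged in a common world frame. The sketch below is illustrative only: the function names, stub models, and joint count are placeholders, not the authors' implementation.

```python
import numpy as np

def estimate_depth(frame):
    # Placeholder for a monocular depth network (metres per pixel).
    return np.full(frame.shape[:2], 2.0)

def estimate_poses(frame):
    # Placeholder for a monocular multi-person 3D pose network:
    # returns one (J, 3) skeleton per detected person, in camera coordinates.
    return [np.zeros((17, 3))]

def backproject(depth, K):
    # Lift each pixel's depth to a 3D point in camera coordinates
    # using the pinhole intrinsics K.
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def transform(points, T_wc):
    # Apply a 4x4 camera-to-world transform to Nx3 points.
    homo = np.hstack([points, np.ones((len(points), 1))])
    return (homo @ T_wc.T)[:, :3]

def hybrid_map_step(frame, K, T_wc):
    """One frame of a loosely coupled pipeline: depth and pose models
    run independently; both outputs are registered into the world frame
    to extend a hybrid (scene + people) map."""
    scene_pts = transform(backproject(estimate_depth(frame), K), T_wc)
    skeletons = [transform(j, T_wc) for j in estimate_poses(frame)]
    return scene_pts, skeletons
```

In a real system the scene points would be fused into a dense map (e.g. a TSDF) and the skeletons tracked over time; the point of the loose coupling is that neither model depends on the other's output.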
Related papers
- UAV4D: Dynamic Neural Rendering of Human-Centric UAV Imagery using Gaussian Splatting
We introduce UAV4D, a framework for enabling photorealistic rendering of dynamic real-world scenes captured by UAVs. We use a combination of a 3D foundation model and a human mesh reconstruction model to reconstruct both the scene background and humans. Our results demonstrate the benefits of our approach over existing methods in novel view synthesis, achieving a 1.5 dB PSNR improvement and superior visual sharpness.
arXiv Detail & Related papers (2025-06-05T13:21:09Z)
- Vision-in-the-loop Simulation for Deep Monocular Pose Estimation of UAV in Ocean Environment
This paper proposes a vision-in-the-loop simulation environment for deep monocular pose estimation of a UAV operating in an ocean environment.
We present a photo-realistic 3D virtual environment leveraging recent advancements in Gaussian splatting.
The resulting simulation enables the indoor testing of flight maneuvers while verifying all aspects of flight software, hardware, and the deep monocular pose estimation scheme.
arXiv Detail & Related papers (2025-02-08T02:19:42Z)
- UAV3D: A Large-scale 3D Perception Benchmark for Unmanned Aerial Vehicles
Unmanned Aerial Vehicles (UAVs) are employed in numerous applications, including aerial photography, surveillance, and agriculture.
Existing benchmarks for UAV applications are mainly designed for traditional 2D perception tasks.
UAV3D comprises 1,000 scenes, each of which has 20 frames with fully annotated 3D bounding boxes on vehicles.
arXiv Detail & Related papers (2024-10-14T22:24:11Z)
- Fully Convolutional One-Stage 3D Object Detection on LiDAR Range Images
FCOS-LiDAR is a fully convolutional one-stage 3D object detector for LiDAR point clouds of autonomous driving scenes.
We show that an RV-based 3D detector with standard 2D convolutions alone can achieve comparable performance to state-of-the-art BEV-based detectors.
arXiv Detail & Related papers (2022-05-27T05:42:16Z)
- AirPose: Multi-View Fusion Network for Aerial 3D Human Pose and Shape Estimation
We present a novel markerless 3D human motion capture (MoCap) system for unstructured, outdoor environments.
AirPose estimates human pose and shape using images captured by multiple uncalibrated flying cameras.
AirPose itself calibrates the cameras relative to the person instead of relying on any pre-calibration.
arXiv Detail & Related papers (2022-01-20T09:46:20Z)
- Multi-modal 3D Human Pose Estimation with 2D Weak Supervision in Autonomous Driving
3D human pose estimation (HPE) in autonomous vehicles (AV) differs from other use cases in several respects.
Data collected for other use cases (such as virtual reality, gaming, and animation) may not be usable for AV applications.
We propose one of the first approaches to alleviate this problem in the AV setting.
arXiv Detail & Related papers (2021-12-22T18:57:16Z)
- A Multi-UAV System for Exploration and Target Finding in Cluttered and GPS-Denied Environments
We propose a framework for a team of UAVs to cooperatively explore and find a target in complex GPS-denied environments with obstacles.
The team of UAVs autonomously navigates, explores, detects, and finds the target in a cluttered environment with a known map.
Results indicate that the proposed multi-UAV system improves time cost, the proportion of search area surveyed, and success rates for search and rescue missions.
arXiv Detail & Related papers (2021-07-19T12:54:04Z)
- UAV-ReID: A Benchmark on Unmanned Aerial Vehicle Re-identification
Recent development in deep learning allows vision-based counter-UAV systems to detect and track UAVs with a single camera.
The coverage of a single camera is limited, necessitating multi-camera configurations to match UAVs across cameras.
We propose the first UAV re-identification dataset, UAV-ReID, to facilitate the development of machine learning solutions in this emerging area.
arXiv Detail & Related papers (2021-04-13T14:13:09Z)
- Generalizable Multi-Camera 3D Pedestrian Detection
We present a multi-camera 3D pedestrian detection method that does not need to train using data from the target scene.
We estimate pedestrian location on the ground plane using a novel approach based on human body poses and person bounding boxes from an off-the-shelf monocular detector.
We then project these locations onto the world ground plane and fuse them with a new formulation of a clique cover problem.
arXiv Detail & Related papers (2021-04-12T20:58:25Z)
- Distributed Variable-Baseline Stereo SLAM from two UAVs
In this article, we employ two UAVs equipped with one monocular camera and one IMU each, to exploit their view overlap and relative distance measurements.
To control the UAV agents autonomously, we propose a decentralized collaborative estimation scheme.
We demonstrate the effectiveness of the approach in high-altitude flights of up to 160 m, going significantly beyond the capabilities of state-of-the-art VIO methods.
arXiv Detail & Related papers (2020-09-10T12:16:10Z)
- DronePose: Photorealistic UAV-Assistant Dataset Synthesis for 3D Pose Estimation via a Smooth Silhouette Loss
3D localisation of the UAV assistant is an important task that can facilitate the exchange of spatial information between the user and the UAV.
We design a data synthesis pipeline to create a realistic multimodal dataset that includes both the exocentric user view, and the egocentric UAV view.
We then exploit the joint availability of photorealistic and synthesized inputs to train a single-shot monocular pose estimation model.
arXiv Detail & Related papers (2020-08-20T07:54:56Z)
- Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation
We present a deployment friendly, fast bottom-up framework for multi-person 3D human pose estimation.
We adopt a novel neural representation of multi-person 3D pose which unifies the position of person instances with their corresponding 3D pose representation.
We propose a practical deployment paradigm where paired 2D or 3D pose annotations are unavailable.
arXiv Detail & Related papers (2020-08-04T07:54:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.