FlyPose: Towards Robust Human Pose Estimation From Aerial Views
- URL: http://arxiv.org/abs/2601.05747v1
- Date: Fri, 09 Jan 2026 12:01:36 GMT
- Title: FlyPose: Towards Robust Human Pose Estimation From Aerial Views
- Authors: Hassaan Farooq, Marvin Brenner, Peter Stütz
- Abstract summary: We train and deploy FlyPose, a lightweight top-down human pose estimation pipeline for aerial imagery. We achieve an average improvement of 6.8 mAP in person detection across the test sets of Manipal-UAV, VisDrone, HIT-UAV and our custom dataset. FlyPose runs with an inference latency of 20 milliseconds, including preprocessing, on a Jetson Orin AGX Developer Kit.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly deployed in close proximity to humans for applications such as parcel delivery, traffic monitoring, disaster response and infrastructure inspections. Ensuring safe and reliable operation in these human-populated environments demands accurate perception of human poses and actions from an aerial viewpoint. This perspective challenges existing methods with low resolution, steep viewing angles and (self-)occlusion, especially when the application demands models that are feasible in real time. We train and deploy FlyPose, a lightweight top-down human pose estimation pipeline for aerial imagery. Through multi-dataset training, we achieve an average improvement of 6.8 mAP in person detection across the test sets of Manipal-UAV, VisDrone, HIT-UAV as well as our custom dataset. For 2D human pose estimation we report an improvement of 16.3 mAP on the challenging UAV-Human dataset. FlyPose runs with an inference latency of ~20 milliseconds including preprocessing on a Jetson Orin AGX Developer Kit and is deployed onboard a quadrotor UAV during flight experiments. We also publish FlyPose-104, a small but challenging aerial human pose estimation dataset that includes manual annotations from difficult aerial perspectives: https://github.com/farooqhassaan/FlyPose.
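The top-down pipeline described in the abstract (detect persons first, then estimate keypoints on each cropped detection) can be sketched as follows. This is a minimal illustration of the general top-down approach, not FlyPose's actual code: the detector and pose head here are stub placeholders, and the input size, joint count and function names are assumptions for the sketch.

```python
import numpy as np

def detect_persons(frame):
    # Placeholder person detector: returns [x1, y1, x2, y2, score] boxes.
    # FlyPose would use a lightweight detector trained on multiple
    # aerial datasets (Manipal-UAV, VisDrone, HIT-UAV, ...).
    h, w = frame.shape[:2]
    return np.array([[0.25 * w, 0.25 * h, 0.75 * w, 0.75 * h, 0.9]])

def crop_and_resize(frame, box, size=(192, 256)):
    # Crop the detected person and resize to the pose model's input
    # resolution (width, height). Nearest-neighbour sampling keeps the
    # sketch dependency-free.
    x1, y1, x2, y2 = (int(v) for v in box[:4])
    crop = frame[y1:y2, x1:x2]
    ys = np.linspace(0, crop.shape[0] - 1, size[1]).astype(int)
    xs = np.linspace(0, crop.shape[1] - 1, size[0]).astype(int)
    return crop[np.ix_(ys, xs)]

def estimate_keypoints(person_crop, num_joints=17):
    # Placeholder pose head: returns (num_joints, 3) rows of
    # [x, y, confidence] in crop coordinates. A real model would
    # regress these from learned heatmaps.
    h, w = person_crop.shape[:2]
    rng = np.random.default_rng(0)
    xy = rng.uniform([0, 0], [w, h], size=(num_joints, 2))
    conf = np.full((num_joints, 1), 0.5)
    return np.hstack([xy, conf])

def top_down_pose(frame):
    # Detect -> crop -> per-crop pose estimation -> map back to frame.
    poses = []
    for box in detect_persons(frame):
        crop = crop_and_resize(frame, box)
        kpts = estimate_keypoints(crop)
        sx = (box[2] - box[0]) / crop.shape[1]
        sy = (box[3] - box[1]) / crop.shape[0]
        kpts[:, 0] = kpts[:, 0] * sx + box[0]
        kpts[:, 1] = kpts[:, 1] * sy + box[1]
        poses.append(kpts)
    return poses

frame = np.zeros((720, 1280, 3), dtype=np.uint8)
poses = top_down_pose(frame)
```

On embedded hardware such as the Jetson Orin AGX, the detection and pose stages would typically be batched and accelerated (e.g. via TensorRT) so that the whole loop, including preprocessing, fits within a real-time latency budget.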
Related papers
- How Far are Modern Trackers from UAV-Anti-UAV? A Million-Scale Benchmark and New Baseline [74.4054700050366]
Unmanned Aerial Vehicles (UAVs) offer wide-ranging applications but also pose significant safety and privacy risks. Current Anti-UAV research primarily focuses on RGB, infrared (IR), or RGB-IR videos captured by fixed ground cameras. We propose a new multi-modal visual tracking task termed UAV-Anti-UAV, in which a pursuer UAV tracks a target adversarial UAV in the video stream.
arXiv Detail & Related papers (2025-12-08T10:19:54Z) - AeroLite-MDNet: Lightweight Multi-task Deviation Detection Network for UAV Landing [9.858832286469765]
We propose a deviation warning system for UAV landings powered by a novel vision-based model called AeroLite-MDNet. We introduce a new evaluation metric, Average Warning Delay (AWD), to quantify the system's sensitivity to landing deviations. Experimental results show that our system achieves an AWD of 0.7 seconds with a deviation detection accuracy of 98.6%.
arXiv Detail & Related papers (2025-06-25T13:48:30Z) - Active Human Pose Estimation via an Autonomous UAV Agent [13.188563931419056]
This paper focuses on the task of human pose estimation from videos capturing a person's activity.
When the current viewpoint is ambiguous or occluded, relocating the camera to a new vantage point is necessary to clarify the view.
Our proposed solution comprises three main components: a NeRF-based Drone-View Data Generation Framework, an On-Drone Network for Camera View Error Estimation, and a Combined Planner.
arXiv Detail & Related papers (2024-07-01T21:20:52Z) - Angle Robustness Unmanned Aerial Vehicle Navigation in GNSS-Denied Scenarios [66.05091704671503]
We present a novel angle navigation paradigm to deal with flight deviation in point-to-point navigation tasks.
We also propose a model that includes the Adaptive Feature Enhance Module, Cross-knowledge Attention-guided Module and Robust Task-oriented Head Module.
arXiv Detail & Related papers (2024-02-04T08:41:20Z) - Bottom-Up 2D Pose Estimation via Dual Anatomical Centers for Small-Scale Persons [75.86463396561744]
In multi-person 2D pose estimation, the bottom-up methods simultaneously predict poses for all persons.
Our method achieves a 38.4% improvement in bounding box precision and a 39.1% improvement in bounding box recall over the state of the art (SOTA).
For the human pose AP evaluation, we achieve a new SOTA (71.0 AP) on the COCO test-dev set with the single-scale testing.
arXiv Detail & Related papers (2022-08-25T10:09:10Z) - Real-Time Hybrid Mapping of Populated Indoor Scenes using a Low-Cost Monocular UAV [42.850288938936075]
We present the first system to perform simultaneous mapping and multi-person 3D human pose estimation from a monocular camera mounted on a single UAV.
In particular, we show how to loosely couple state-of-the-art monocular depth estimation and monocular 3D human pose estimation approaches to reconstruct a hybrid map of a populated indoor scene in real time.
arXiv Detail & Related papers (2022-03-04T17:31:26Z) - AirPose: Multi-View Fusion Network for Aerial 3D Human Pose and Shape Estimation [51.17610485589701]
We present a novel markerless 3D human motion capture (MoCap) system for unstructured, outdoor environments.
AirPose estimates human pose and shape using images captured by multiple uncalibrated flying cameras.
AirPose itself calibrates the cameras relative to the person instead of relying on any pre-calibration.
arXiv Detail & Related papers (2022-01-20T09:46:20Z) - Rethinking Drone-Based Search and Rescue with Aerial Person Detection [79.76669658740902]
The visual inspection of aerial drone footage is an integral part of land search and rescue (SAR) operations today.
We propose a novel deep learning algorithm to automate this aerial person detection (APD) task.
We present the novel Aerial Inspection RetinaNet (AIR) algorithm as the combination of these contributions.
arXiv Detail & Related papers (2021-11-17T21:48:31Z) - UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles [12.210724541266183]
We propose a new benchmark - UAV-Human - for human behavior understanding with UAVs.
Our dataset contains 67,428 multi-modal video sequences and 119 subjects for action recognition.
We propose a fisheye-based action recognition method that mitigates the distortions in fisheye videos via learning transformations guided by flat RGB videos.
arXiv Detail & Related papers (2021-04-02T08:54:04Z) - Perceiving Traffic from Aerial Images [86.994032967469]
We propose an object detection method called Butterfly Detector that is tailored to detect objects in aerial images.
We evaluate our Butterfly Detector on two publicly available UAV datasets (UAVDT and VisDrone 2019) and show that it outperforms previous state-of-the-art methods while remaining real-time.
arXiv Detail & Related papers (2020-09-16T11:37:43Z) - Distributed Variable-Baseline Stereo SLAM from two UAVs [17.513645771137178]
In this article, we employ two UAVs equipped with one monocular camera and one IMU each, to exploit their view overlap and relative distance measurements.
In order to control the UAV agents autonomously, we propose a decentralized collaborative estimation scheme.
We demonstrate the effectiveness of the approach at high-altitude flights of up to 160 m, going significantly beyond the capabilities of state-of-the-art VIO methods.
arXiv Detail & Related papers (2020-09-10T12:16:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed and is not responsible for any consequences of its use.