Related papers: EKF-Based Depth Camera and Deep Learning Fusion for UAV-Person Distance Estimation and Following in SAR Operations

EKF-Based Depth Camera and Deep Learning Fusion for UAV-Person Distance Estimation and Following in SAR Operations

URL: http://arxiv.org/abs/2602.20958v1
Date: Tue, 24 Feb 2026 14:37:36 GMT
Title: EKF-Based Depth Camera and Deep Learning Fusion for UAV-Person Distance Estimation and Following in SAR Operations
Authors: Luka Šiktar, Branimir Ćaran, Bojan Šekoranja, Marko Švaco,
Abstract summary: Vision-based UAVs aid human search tasks by detecting and recognizing specific individuals, then tracking and following them while maintaining a safe distance.<n>A key safety requirement for UAV following is the accurate estimation of the distance between camera and target object under real-world conditions.<n>This paper presents the fusion of depth camera measurements and monocular camera-to-body distance estimation for robust tracking and following.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Search and rescue (SAR) operations require rapid responses to save lives or property. Unmanned Aerial Vehicles (UAVs) equipped with vision-based systems support these missions through prior terrain investigation or real-time assistance during the mission itself. Vision-based UAV frameworks aid human search tasks by detecting and recognizing specific individuals, then tracking and following them while maintaining a safe distance. A key safety requirement for UAV following is the accurate estimation of the distance between camera and target object under real-world conditions, achieved by fusing multiple image modalities. UAVs with deep learning-based vision systems offer a new approach to the planning and execution of SAR operations. As part of the system for automatic people detection and face recognition using deep learning, in this paper we present the fusion of depth camera measurements and monocular camera-to-body distance estimation for robust tracking and following. Deep learning-based filtering of depth camera data and estimation of camera-to-body distance from a monocular camera are achieved with YOLO-pose, enabling real-time fusion of depth information using the Extended Kalman Filter (EKF) algorithm. The proposed subsystem, designed for use in drones, estimates and measures the distance between the depth camera and the human body keypoints, to maintain the safe distance between the drone and the human target. Our system provides an accurate estimated distance, which has been validated against motion capture ground truth data. The system has been tested in real time indoors, where it reduces the average errors, root mean square error (RMSE) and standard deviations of distance estimation up to 15,3\% in three tested scenarios.

Related papers

AerialMind: Towards Referring Multi-Object Tracking in UAV Scenarios [64.51320327698231]
We introduce AerialMind, the first large-scale RMOT benchmark in UAV scenarios.<n>We develop an innovative semi-automated collaborative agent-based labeling assistant framework.<n>We also propose HawkEyeTrack, a novel method that collaboratively enhances vision-language representation learning.
arXiv Detail & Related papers (2025-11-26T04:44:27Z)
NOVA: Navigation via Object-Centric Visual Autonomy for High-Speed Target Tracking in Unstructured GPS-Denied Environments [56.35569661650558]
We introduce NOVA, a fully onboard, object-centric framework that enables robust target tracking and collision-aware navigation.<n>Rather than constructing a global map, NOVA formulates perception, estimation, and control entirely in the target's reference frame.<n>We validate NOVA across challenging real-world scenarios, including urban mazes, forest trails, and repeated transitions through buildings with intermittent GPS loss.
arXiv Detail & Related papers (2025-06-23T14:28:30Z)
Approximate Supervised Object Distance Estimation on Unmanned Surface Vehicles [12.59850892985157]
We introduce a novel approach for approximate distance estimation in USVs using supervised object detection.<n>We propose a specialized branch of an object detection model, not only to detect objects but also to predict their distances from the USV.<n>We demonstrate its application in a marine assistance system that alerts operators to nearby objects such as boats, buoys, or other waterborne hazards.
arXiv Detail & Related papers (2025-01-09T20:34:36Z)
Active Human Pose Estimation via an Autonomous UAV Agent [13.188563931419056]
This paper focuses on the task of human pose estimation from videos capturing a person's activity. To address this, relocating the camera to a new vantage point is necessary to clarify the view. Our proposed solution comprises three main components: a NeRF-based Drone-View Data Generation Framework, an On-Drone Network for Camera View Error Estimation, and a Combined Planner.
arXiv Detail & Related papers (2024-07-01T21:20:52Z)
Ensuring UAV Safety: A Vision-only and Real-time Framework for Collision Avoidance Through Object Detection, Tracking, and Distance Estimation [16.671696289301625]
This paper presents a deep-learning framework that utilizes optical sensors for the detection, tracking, and distance estimation of non-cooperative aerial vehicles. In this work, we propose a method for estimating the distance information of a detected aerial object in real time using only the input of a monocular camera.
arXiv Detail & Related papers (2024-05-10T18:06:41Z)
SDGE: Stereo Guided Depth Estimation for 360$^\circ$ Camera Sets [65.64958606221069]
Multi-camera systems are often used in autonomous driving to achieve a 360$circ$ perception. These 360$circ$ camera sets often have limited or low-quality overlap regions, making multi-view stereo methods infeasible for the entire image. We propose the Stereo Guided Depth Estimation (SGDE) method, which enhances depth estimation of the full image by explicitly utilizing multi-view stereo results on the overlap.
arXiv Detail & Related papers (2024-02-19T02:41:37Z)
OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments [77.0399450848749]
We propose an OccNeRF method for training occupancy networks without 3D supervision. We parameterize the reconstructed occupancy fields and reorganize the sampling strategy to align with the cameras' infinite perceptive range. For semantic occupancy prediction, we design several strategies to polish the prompts and filter the outputs of a pretrained open-vocabulary 2D segmentation model.
arXiv Detail & Related papers (2023-12-14T18:58:52Z)
SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras. Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views. In experiments, our method achieves the state-of-the-art performance on the challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
Safe Vessel Navigation Visually Aided by Autonomous Unmanned Aerial Vehicles in Congested Harbors and Waterways [9.270928705464193]
This work is the first attempt to detect and estimate distances to unknown objects from long-range visual data captured with conventional RGB cameras and auxiliary absolute positioning systems (e.g. GPS) The simulation results illustrate the accuracy and efficacy of the proposed method for visually aided navigation of vessels assisted by UAV.
arXiv Detail & Related papers (2021-08-09T08:15:17Z)
A Multi-UAV System for Exploration and Target Finding in Cluttered and GPS-Denied Environments [68.31522961125589]
We propose a framework for a team of UAVs to cooperatively explore and find a target in complex GPS-denied environments with obstacles. The team of UAVs autonomously navigates, explores, detects, and finds the target in a cluttered environment with a known map. Results indicate that the proposed multi-UAV system has improvements in terms of time-cost, the proportion of search area surveyed, as well as successful rates for search and rescue missions.
arXiv Detail & Related papers (2021-07-19T12:54:04Z)
Single View Physical Distance Estimation using Human Pose [18.9877515094788]
We propose a fully automated system that simultaneously estimates the camera intrinsics, the ground plane, and physical distances between people from a single RGB image or video. The proposed approach enables existing camera systems to measure physical distances without needing a dedicated calibration process or range sensors. We contribute to the publicly available MEVA dataset with additional distance annotations, resulting in MEVADA -- the first evaluation benchmark in the world for the pose-based auto-calibration and distance estimation problem.
arXiv Detail & Related papers (2021-06-18T19:50:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.