DroneKey++: A Size Prior-free Method and New Benchmark for Drone 3D Pose Estimation from Sequential Images
- URL: http://arxiv.org/abs/2602.06211v1
- Date: Thu, 05 Feb 2026 21:41:20 GMT
- Title: DroneKey++: A Size Prior-free Method and New Benchmark for Drone 3D Pose Estimation from Sequential Images
- Authors: Seo-Bin Hwang, Yeong-Jun Cho
- Abstract summary: DroneKey++ is a prior-free framework that jointly performs keypoint detection, drone classification, and 3D pose estimation. To address dataset limitations, we construct 6DroneSyn, a large-scale synthetic benchmark with over 50K images covering 7 drone models and 88 outdoor backgrounds. Experiments show that DroneKey++ achieves MAE 17.34 deg and MedAE 17.1 deg for rotation, and MAE 0.135 m and MedAE 0.242 m for translation.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Accurate 3D pose estimation of drones is essential for security and surveillance systems. However, existing methods often rely on prior drone information such as physical sizes or 3D meshes. At the same time, current datasets are small-scale, limited to single models, and collected under constrained environments, which makes reliable validation of generalization difficult. We present DroneKey++, a prior-free framework that jointly performs keypoint detection, drone classification, and 3D pose estimation. The framework employs a keypoint encoder for simultaneous keypoint detection and classification, and a pose decoder that estimates 3D pose using ray-based geometric reasoning and class embeddings. To address dataset limitations, we construct 6DroneSyn, a large-scale synthetic benchmark with over 50K images covering 7 drone models and 88 outdoor backgrounds, generated using 360-degree panoramic synthesis. Experiments show that DroneKey++ achieves MAE 17.34 deg and MedAE 17.1 deg for rotation, MAE 0.135 m and MedAE 0.242 m for translation, with inference speeds of 19.25 FPS (CPU) and 414.07 FPS (GPU), demonstrating both strong generalization across drone models and suitability for real-time applications. The dataset is publicly available.
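The abstract reports MAE and MedAE for rotation (in degrees) and translation (in meters) but does not spell out the error formulas. A common convention, assumed here for illustration, is the geodesic angle between predicted and ground-truth rotation matrices plus mean/median aggregation; the function names below are illustrative, not from the paper:

```python
import numpy as np

def rotation_error_deg(R_pred: np.ndarray, R_gt: np.ndarray) -> float:
    """Geodesic angle (degrees) between two 3x3 rotation matrices."""
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def summarize_errors(errors) -> dict:
    """MAE is the mean absolute error; MedAE is the median absolute error."""
    e = np.abs(np.asarray(errors, dtype=float))
    return {"MAE": float(e.mean()), "MedAE": float(np.median(e))}
```

The same `summarize_errors` aggregation would apply to per-frame translation errors, e.g. `np.linalg.norm(t_pred - t_gt)` in meters.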
Related papers
- Beyond 'Templates': Category-Agnostic Object Pose, Size, and Shape Estimation from a Single View
Estimating an object's 6D pose, size, and shape from visual input is a fundamental problem in computer vision. We propose a unified category-agnostic framework that simultaneously predicts 6D pose, size, and dense shape from a single RGB-D image.
arXiv Detail & Related papers (2025-10-13T17:49:15Z) - DroneKey: Drone 3D Pose Estimation in Image Sequences using Gated Key-representation and Pose-adaptive Learning
DroneKey is a framework that combines a 2D keypoint detector and a 3D pose estimator specifically designed for drones. Experiments show that our method achieves an AP of 99.68% (OKS) in keypoint detection, outperforming existing methods. For 3D pose estimation, our method achieved an MAE-angle of 10.62 deg, an RMSE of 0.221 m, and an MAE-absolute of 0.076 m, demonstrating high accuracy and reliability.
arXiv Detail & Related papers (2025-08-25T07:40:31Z) - E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models
We present the first comprehensive benchmark for 3D geometric foundation models (GFMs). GFMs directly predict dense 3D representations in a single feed-forward pass, eliminating the need for slow or unavailable precomputed camera parameters. We evaluate 16 state-of-the-art GFMs, revealing their strengths and limitations across tasks and domains. All code, evaluation scripts, and processed data will be publicly released to accelerate research in 3D spatial intelligence.
arXiv Detail & Related papers (2025-06-02T17:53:09Z) - VGGT: Visual Geometry Grounded Transformer
VGGT is a feed-forward neural network that directly infers all key 3D attributes of a scene. The network achieves state-of-the-art results in multiple 3D tasks.
arXiv Detail & Related papers (2025-03-14T17:59:47Z) - YOLOMG: Vision-based Drone-to-Drone Detection with Appearance and Pixel-Level Motion Fusion
This paper proposes a novel end-to-end framework that accurately identifies small drones in complex environments. It starts by creating a motion difference map to capture the motion characteristics of tiny drones. Next, this motion difference map is combined with an RGB image using a bimodal fusion module, allowing for adaptive feature learning of the drone.
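The abstract does not define how YOLOMG's motion difference map is built; a minimal stand-in, assumed here purely for illustration, is per-pixel frame differencing on grayscale frames, with the map stacked onto the RGB image as an extra channel in place of the paper's learned bimodal fusion module:

```python
import numpy as np

def motion_difference_map(prev_gray: np.ndarray, curr_gray: np.ndarray) -> np.ndarray:
    """Normalized absolute frame difference; bright pixels mark motion."""
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    return (diff / 255.0).astype(np.float32)

def fuse_with_rgb(rgb: np.ndarray, motion: np.ndarray) -> np.ndarray:
    """Naive fusion sketch: append the motion map as a fourth channel."""
    return np.concatenate([rgb, motion[..., None]], axis=-1)
```

A real pipeline would typically stabilize the frames first, since camera ego-motion also produces large frame differences.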
arXiv Detail & Related papers (2025-03-10T09:44:21Z) - Shelf-Supervised Cross-Modal Pre-Training for 3D Object Detection
State-of-the-art 3D object detectors are often trained on massive labeled datasets.
Recent works demonstrate that self-supervised pre-training with unlabeled data can improve detection accuracy with limited labels.
We propose a shelf-supervised approach for generating zero-shot 3D bounding boxes from paired RGB and LiDAR data.
arXiv Detail & Related papers (2024-06-14T15:21:57Z) - TransVisDrone: Spatio-Temporal Transformer for Vision-based Drone-to-Drone Detection in Aerial Videos
Drone-to-drone detection using visual feed has crucial applications, such as detecting drone collisions, detecting drone attacks, or coordinating flight with other drones.
Existing methods are computationally costly, follow non-end-to-end optimization, and have complex multi-stage pipelines, making them less suitable for real-time deployment on edge devices.
We propose a simple yet effective framework, TransVisDrone, that provides an end-to-end solution with higher computational efficiency.
arXiv Detail & Related papers (2022-10-16T03:05:13Z) - Lightweight Multi-Drone Detection and 3D-Localization via YOLO
We present and evaluate a method to perform real-time multiple drone detection and three-dimensional localization.
We use state-of-the-art tiny-YOLOv4 object detection algorithm and stereo triangulation.
Our computer vision approach eliminates the need for computationally expensive stereo matching algorithms.
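For rectified stereo cameras, triangulating a detected drone from matched left/right image points reduces to the classic depth-from-disparity relation Z = f·B/d, avoiding dense stereo matching entirely. A minimal sketch (all intrinsics and pixel coordinates below are illustrative values, not from the paper):

```python
def triangulate_point(u_l: float, v_l: float, u_r: float,
                      fx: float, fy: float, cx: float, cy: float,
                      baseline: float) -> tuple:
    """Rectified-stereo triangulation of one matched point.

    u_l, v_l: pixel coordinates in the left image
    u_r:      horizontal pixel coordinate in the right image
    fx, fy:   focal lengths in pixels; cx, cy: principal point
    baseline: distance between the two cameras in meters
    """
    d = u_l - u_r                 # disparity in pixels (left minus right)
    Z = fx * baseline / d         # depth in meters
    X = (u_l - cx) * Z / fx       # back-project to camera coordinates
    Y = (v_l - cy) * Z / fy
    return X, Y, Z
```

With fx = 500 px, a 0.1 m baseline, and a 50 px disparity, this places the drone 1 m from the camera; the disparity `d` must be nonzero, so points at infinity (zero disparity) cannot be triangulated.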
arXiv Detail & Related papers (2022-02-18T09:41:23Z) - Monocular Quasi-Dense 3D Object Tracking
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv Detail & Related papers (2021-03-12T15:30:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.