Beyond the Horizon: Decoupling UAVs Multi-View Action Recognition via Partial Order Transfer
- URL: http://arxiv.org/abs/2504.20530v1
- Date: Tue, 29 Apr 2025 08:22:13 GMT
- Title: Beyond the Horizon: Decoupling UAVs Multi-View Action Recognition via Partial Order Transfer
- Authors: Wenxuan Liu, Xian Zhong, Zhuo Zhou, Siyuan Yang, Chia-Wen Lin, Alex Chichung Kot
- Abstract summary: We introduce a multi-view formulation tailored to varying UAV altitudes and empirically observe a partial order among views. This motivates a novel approach that explicitly models the hierarchical structure of UAV views to improve recognition performance across altitudes. We propose the Partial Order Guided Multi-View Network (POG-MVNet), designed to address drastic view variations.
- Score: 38.646757044416866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Action recognition in unmanned aerial vehicles (UAVs) poses unique challenges due to significant view variations along the vertical spatial axis. Unlike traditional ground-based settings, UAVs capture actions from a wide range of altitudes, resulting in considerable appearance discrepancies. We introduce a multi-view formulation tailored to varying UAV altitudes and empirically observe a partial order among views, where recognition accuracy consistently decreases as the altitude increases. This motivates a novel approach that explicitly models the hierarchical structure of UAV views to improve recognition performance across altitudes. To this end, we propose the Partial Order Guided Multi-View Network (POG-MVNet), designed to address drastic view variations by effectively leveraging view-dependent information across different altitude levels. The framework comprises three key components: a View Partition (VP) module, which uses the head-to-body ratio to group views by altitude; an Order-aware Feature Decoupling (OFD) module, which disentangles action-relevant and view-specific features under partial order guidance; and an Action Partial Order Guide (APOG), which leverages the partial order to transfer informative knowledge from easier views to support learning in more challenging ones. We conduct experiments on the Drone-Action, MOD20, and UAV datasets, demonstrating that POG-MVNet significantly outperforms competing methods. For example, POG-MVNet achieves a 4.7% improvement on the Drone-Action dataset and a 3.5% improvement on the UAV dataset compared to state-of-the-art methods ASAT and FAR. The code for POG-MVNet will be made available soon.
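The abstract's View Partition and partial-order transfer ideas can be illustrated with a minimal sketch. The thresholds, band names, and weighting scheme below are hypothetical illustrations, not values or formulas from the paper: frames are binned into altitude bands by the detected person's head-to-body ratio, and easier (lower-altitude) bands receive larger weight when guiding harder ones.

```python
# Sketch of the View Partition (VP) idea: assign a frame to an altitude
# band using the detected person's head-to-body ratio. A steeper (higher)
# viewing angle makes the head appear larger relative to the body.
# The thresholds below are hypothetical, not taken from the paper.

def partition_view(head_height: float, body_height: float) -> int:
    """Return an altitude band index: 0 = low (easiest), 2 = high (hardest)."""
    ratio = head_height / body_height
    if ratio < 0.20:      # near-horizontal view, full body visible
        return 0
    elif ratio < 0.35:    # medium altitude, foreshortened body
        return 1
    else:                 # near-overhead view, mostly head and shoulders
        return 2

def transfer_weights(bands: list[int]) -> list[float]:
    """Partial-order guidance (illustrative): easier bands get larger
    normalized weight when transferring knowledge to harder ones."""
    hardest = max(bands)
    raw = [hardest - b + 1 for b in bands]   # weight decays with difficulty
    total = sum(raw)
    return [r / total for r in raw]
```

For example, `partition_view(0.15, 1.0)` maps a near-horizontal view to band 0, while `transfer_weights([0, 1, 2])` assigns the largest share to the easiest band, reflecting the observed partial order in which low-altitude views are the most reliable teachers.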
Related papers
- Aerial Vision-and-Language Navigation with Grid-based View Selection and Map Construction [102.70482302750897]
Aerial Vision-and-Language Navigation (Aerial VLN) aims to enable an unmanned aerial vehicle agent to navigate aerial 3D environments following human instruction. Previous methods struggle to perform well due to the longer navigation path, more complicated 3D scenes, and the neglect of the interplay between vertical and horizontal actions. We propose a novel grid-based view selection framework that formulates aerial VLN action prediction as a grid-based view selection task.
arXiv Detail & Related papers (2025-03-14T05:20:43Z)
- UAV-DETR: Efficient End-to-End Object Detection for Unmanned Aerial Vehicle Imagery [14.599037804047724]
Unmanned aerial vehicle object detection (UAV-OD) has been widely used in various scenarios. Most existing UAV-OD algorithms rely on manually designed components, which require extensive tuning. This paper proposes an efficient detection transformer (DETR) framework tailored for UAV imagery.
arXiv Detail & Related papers (2025-01-03T15:11:14Z)
- PPTFormer: Pseudo Multi-Perspective Transformer for UAV Segmentation [18.585299793391748]
We introduce PPTFormer, a novel Pseudo Multi-Perspective Transformer network.
Our approach circumvents the need for actual multi-perspective data by creating pseudo perspectives for enhanced multi-perspective learning.
arXiv Detail & Related papers (2024-06-28T03:43:49Z)
- UCDNet: Multi-UAV Collaborative 3D Object Detection Network by Reliable Feature Mapping [14.401624713578737]
Multi-UAV collaborative 3D object detection can perceive and comprehend complex environments.
We propose an unparalleled camera-based multi-UAV collaborative 3D object detection paradigm called UCDNet.
We show our method improves mAP by 4.7% and 10% respectively compared to the baseline.
arXiv Detail & Related papers (2024-06-07T05:27:32Z)
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged as a task for high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by this, we model DIS as a multi-view object perception problem and provide a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z)
- DVANet: Disentangling View and Action Features for Multi-View Action Recognition [56.283944756315066]
We present a novel approach to multi-view action recognition where we guide learned action representations to be separated from view-relevant information in a video.
Our model and training method significantly outperform all other uni-modal models on four multi-view action recognition datasets.
arXiv Detail & Related papers (2023-12-10T01:19:48Z)
- Multiview Aerial Visual Recognition (MAVREC): Can Multi-view Improve Aerial Visual Perception? [57.77643186237265]
We present Multiview Aerial Visual RECognition or MAVREC, a video dataset where we record synchronized scenes from different perspectives.
MAVREC consists of around 2.5 hours of industry-standard 2.7K resolution video sequences, more than 0.5 million frames, and 1.1 million annotated bounding boxes.
This makes MAVREC the largest ground and aerial-view dataset, and the fourth largest among all drone-based datasets.
arXiv Detail & Related papers (2023-12-07T18:59:14Z)
- SGDViT: Saliency-Guided Dynamic Vision Transformer for UAV Tracking [12.447854608181833]
This work presents a novel saliency-guided dynamic vision Transformer (SGDViT) for UAV tracking.
The proposed method designs a new task-specific object saliency mining network to refine the cross-correlation operation.
A lightweight saliency filtering Transformer further refines saliency information and increases the focus on appearance information.
arXiv Detail & Related papers (2023-03-08T05:01:00Z)
- Self-aligned Spatial Feature Extraction Network for UAV Vehicle Re-identification [3.449626476434765]
Vehicles with the same color and type appear extremely similar from the UAV's perspective.
Recent works tend to extract distinguishing information by regional features and component features.
In order to extract efficient fine-grained features and avoid tedious annotating work, this letter develops an unsupervised self-aligned network.
arXiv Detail & Related papers (2022-01-08T14:25:54Z)
- Perceiving Traffic from Aerial Images [86.994032967469]
We propose an object detection method called Butterfly Detector that is tailored to detect objects in aerial images.
We evaluate our Butterfly Detector on two publicly available UAV datasets (UAVDT and VisDrone 2019) and show that it outperforms previous state-of-the-art methods while remaining real-time.
arXiv Detail & Related papers (2020-09-16T11:37:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.