MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results
- URL: http://arxiv.org/abs/2507.12832v1
- Date: Thu, 17 Jul 2025 06:45:47 GMT
- Title: MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results
- Authors: Yuki Kondo, Norimichi Ukita, Riku Kanayama, Yuki Yoshida, Takayuki Yamaguchi, Xiang Yu, Guang Liang, Xinyao Liu, Guan-Zhang Wang, Wei-Ta Chu, Bing-Cheng Chuang, Jia-Hua Lee, Pin-Tseng Kuo, I-Hsuan Chu, Yi-Shein Hsiao, Cheng-Han Wu, Po-Yi Wu, Jui-Chien Tsou, Hsuan-Chi Liu, Chun-Yi Lee, Yuan-Fu Yang, Kosuke Shigematsu, Asuka Shin, Ba Tran
- Abstract summary: This paper introduces the SMOT4SB challenge, which leverages temporal information to address limitations of single-frame detection. Our three main contributions are: (1) the SMOT4SB dataset, consisting of 211 UAV video sequences with 108,192 annotated frames under diverse real-world conditions; (2) SO-HOTA, a novel metric combining Dot Distance with HOTA to mitigate the sensitivity of IoU-based metrics to small displacements; and (3) a competitive MVA2025 challenge with 78 participants and 308 submissions, where the winning method achieved a 5.1x improvement over the baseline.
- Score: 15.90859212645041
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Small Multi-Object Tracking (SMOT) is particularly challenging when targets occupy only a few dozen pixels, rendering detection and appearance-based association unreliable. Building on the success of the MVA2023 SOD4SB challenge, this paper introduces the SMOT4SB challenge, which leverages temporal information to address limitations of single-frame detection. Our three main contributions are: (1) the SMOT4SB dataset, consisting of 211 UAV video sequences with 108,192 annotated frames under diverse real-world conditions, designed to capture motion entanglement where both camera and targets move freely in 3D; (2) SO-HOTA, a novel metric combining Dot Distance with HOTA to mitigate the sensitivity of IoU-based metrics to small displacements; and (3) a competitive MVA2025 challenge with 78 participants and 308 submissions, where the winning method achieved a 5.1x improvement over the baseline. This work lays a foundation for advancing SMOT in UAV scenarios with applications in bird strike avoidance, agriculture, fisheries, and ecological monitoring.
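HOTA normally matches predictions to ground truth with IoU, which is brittle when a bird spans only a few dozen pixels: a displacement of a few pixels can drop the overlap to zero. Dot Distance instead scores how close the predicted and ground-truth centers are. The sketch below shows a Dot-Distance-style similarity that could stand in for IoU in a HOTA-style evaluation; the exponential form and size normalizer S follow the Dot Distance literature, but SO-HOTA's exact definition is given in the paper, and the function name and default S here are illustrative assumptions.

```python
import numpy as np

def dotd_similarity(pred_centers, gt_centers, S=32.0):
    """Dot-Distance-style similarity in [0, 1] between two sets of centers.

    pred_centers: (N, 2) predicted box centers; gt_centers: (M, 2).
    S is a dataset-level size normalizer (placeholder value here);
    SO-HOTA's precise formulation is defined in the challenge paper.
    """
    # Pairwise Euclidean distances between predicted and GT centers.
    d = np.linalg.norm(pred_centers[:, None, :] - gt_centers[None, :, :], axis=2)
    # Similarity is ~1 when centers coincide and decays smoothly with distance,
    # so a small displacement is penalized gently rather than zeroed out.
    return np.exp(-d / S)
```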
Related papers
- Tracking the Unstable: Appearance-Guided Motion Modeling for Robust Multi-Object Tracking in UAV-Captured Videos [58.156141601478794]
Multi-object tracking in UAV-captured videos (UAVT) aims to track multiple objects while maintaining consistent identities across frames of a given video. Existing methods typically model motion and appearance cues separately, overlooking their interplay and resulting in suboptimal tracking performance. We propose AMOT, which exploits appearance and motion cues through two key components: an Appearance-Motion Consistency (AMC) matrix and a Motion-aware Track Continuation (MTC) module.
arXiv Detail & Related papers (2025-08-03T12:06:47Z)
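AMOT's contribution is coupling the two cues rather than scoring them in isolation. The sketch below shows one plausible, simplified way to combine an appearance cost with a motion cost in a single assignment step; it is not the paper's actual AMC matrix or MTC module, and the weights, gate, and function names are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_embs, det_embs, track_boxes, det_boxes,
              w_app=0.5, gate=100.0):
    """Match detections to tracks with a joint appearance/motion cost.

    Hypothetical combination for illustration only; AMOT's actual AMC
    matrix and MTC module are defined in the paper. Boxes are (x, y, w, h).
    """
    # Appearance cost: cosine distance between L2-normalized embeddings.
    t = track_embs / np.linalg.norm(track_embs, axis=1, keepdims=True)
    d = det_embs / np.linalg.norm(det_embs, axis=1, keepdims=True)
    app_cost = 1.0 - t @ d.T                      # (num_tracks, num_dets)

    # Motion cost: distance between box centers, saturated at the gate.
    tc = track_boxes[:, :2] + track_boxes[:, 2:] / 2.0
    dc = det_boxes[:, :2] + det_boxes[:, 2:] / 2.0
    dist = np.linalg.norm(tc[:, None, :] - dc[None, :, :], axis=2)
    mot_cost = np.minimum(dist / gate, 1.0)

    # Couple the cues in one cost matrix and solve the assignment.
    cost = w_app * app_cost + (1.0 - w_app) * mot_cost
    rows, cols = linear_sum_assignment(cost)
    # Drop pairs whose motion residual exceeds the gate.
    return [(r, c) for r, c in zip(rows, cols) if dist[r, c] < gate]
```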
- YOLOv8-SMOT: An Efficient and Robust Framework for Real-Time Small Object Tracking via Slice-Assisted Training and Adaptive Association [5.07987775511372]
This paper details our championship-winning solution in the MVA 2025 "Finding Birds" Small Multi-Object Tracking Challenge (SMOT4SB). It adopts the tracking-by-detection paradigm with targeted innovations at both the detection and association levels. Our method achieves state-of-the-art performance on the SMOT4SB public test set, reaching an SO-HOTA score of 55.205.
arXiv Detail & Related papers (2025-07-16T09:51:19Z)
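Slice-assisted training and inference tile each frame into overlapping crops so that tiny birds occupy a larger fraction of the network's input. A minimal SAHI-style tiling sketch follows; the slice size and overlap are illustrative placeholders, not the authors' settings.

```python
def make_slices(img_w, img_h, slice_size=640, overlap=0.2):
    """Yield (x1, y1, x2, y2) windows covering the frame with overlap.

    Illustrative SAHI-style tiling; the challenge entry's exact slicing
    parameters are described in the YOLOv8-SMOT paper.
    """
    step = int(slice_size * (1 - overlap))
    xs = list(range(0, max(img_w - slice_size, 0) + 1, step))
    ys = list(range(0, max(img_h - slice_size, 0) + 1, step))
    # Make sure the right and bottom borders are covered.
    if xs[-1] + slice_size < img_w:
        xs.append(img_w - slice_size)
    if ys[-1] + slice_size < img_h:
        ys.append(img_h - slice_size)
    for y in ys:
        for x in xs:
            yield (x, y, min(x + slice_size, img_w), min(y + slice_size, img_h))
```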
- AG-VPReID 2025: Aerial-Ground Video-based Person Re-identification Challenge Results [64.38412449125872]
This paper introduces the AG-VPReID 2025 Challenge - the first large-scale video-based competition focused on high-altitude (80-120m) aerial-ground ReID. The challenge was constructed on the new AG-VPReID dataset with 3,027 identities, over 13,500 tracklets, and approximately 3.7 million frames captured from UAVs, CCTV, and wearable cameras. The leading approach, X-TFCLIP from UAM, attained 72.28% Rank-1 accuracy in the aerial-to-ground ReID setting and 70.77% in the ground-to-aerial ReID setting.
arXiv Detail & Related papers (2025-06-28T10:45:30Z)
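Rank-1 accuracy is the standard ReID metric reported here: the fraction of query tracklets whose single nearest gallery entry carries the same identity. A minimal sketch; real benchmarks typically also apply camera-based gallery filtering, which is omitted.

```python
import numpy as np

def rank1_accuracy(dist, query_ids, gallery_ids):
    """Rank-1: fraction of queries whose closest gallery item shares their ID.

    dist: (num_query, num_gallery) distance matrix between embeddings.
    Simplified illustration of the metric, not the challenge's evaluator.
    """
    nearest = np.argmin(dist, axis=1)          # index of closest gallery entry
    return float(np.mean(gallery_ids[nearest] == query_ids))
```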
- NOVA: Navigation via Object-Centric Visual Autonomy for High-Speed Target Tracking in Unstructured GPS-Denied Environments [56.35569661650558]
We introduce NOVA, a fully onboard, object-centric framework that enables robust target tracking and collision-aware navigation. Rather than constructing a global map, NOVA formulates perception, estimation, and control entirely in the target's reference frame. We validate NOVA across challenging real-world scenarios, including urban mazes, forest trails, and repeated transitions through buildings with intermittent GPS loss.
arXiv Detail & Related papers (2025-06-23T14:28:30Z)
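Working in the target's reference frame means every observation is re-expressed relative to the target's pose instead of a world map. Below is a minimal rigid-body change-of-coordinates sketch, assuming the target's rotation R and position t come from some estimator; NOVA's actual state estimation is considerably more involved.

```python
import numpy as np

def to_target_frame(p_world, R_target, t_target):
    """Express a world-frame point in the target's reference frame.

    Illustrative rigid-body transform, not NOVA's estimator:
    p_target = R^T @ (p_world - t).
    """
    return R_target.T @ (p_world - t_target)
```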
- MASF-YOLO: An Improved YOLOv11 Network for Small Object Detection on Drone View [0.0]
We propose a novel object detection network, Multi-scale Context Aggregation and Scale-adaptive Fusion YOLO (MASF-YOLO). To tackle the difficulty of detecting small objects in UAV images, we design a Multi-scale Feature Aggregation Module (MFAM), which significantly improves the detection accuracy of small objects. We also introduce a Dimension-Aware Selective Integration Module (DASI), which further enhances multi-scale feature fusion capabilities.
arXiv Detail & Related papers (2025-04-25T07:43:33Z)
- S3MOT: Monocular 3D Object Tracking with Selective State Space Model [3.5047603107971397]
Multi-object tracking in 3D space is essential for advancing robotics and computer vision applications. It remains a significant challenge in monocular setups due to the difficulty of mining 3D associations from 2D video streams. We present three innovative techniques to enhance the fusion of heterogeneous cues for monocular 3D MOT.
arXiv Detail & Related papers (2025-04-25T04:45:35Z)
- Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark [15.405137983083875]
Aerial-ground cooperation offers a promising solution by integrating UAVs' aerial views with ground vehicles' local observations. This paper presents a comprehensive solution for aerial-ground cooperative 3D perception through three key contributions.
arXiv Detail & Related papers (2025-03-10T07:00:07Z)
- AIM 2024 Sparse Neural Rendering Challenge: Methods and Results [64.19942455360068]
This paper reviews the challenge on Sparse Neural Rendering that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ECCV 2024.
The challenge aims at producing novel camera view synthesis of diverse scenes from sparse image observations.
Participants are asked to optimise objective fidelity to the ground-truth images as measured via the Peak Signal-to-Noise Ratio (PSNR) metric.
arXiv Detail & Related papers (2024-09-23T14:17:40Z)
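PSNR measures fidelity as the log-ratio of the peak signal to the mean squared error between the rendered and ground-truth images; a minimal sketch for 8-bit images:

```python
import numpy as np

def psnr(rendered, reference, max_val=255.0):
    """Peak Signal-to-Noise Ratio in dB between two uint8 images."""
    diff = rendered.astype(np.float64) - reference.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```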
- RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments [62.5830455357187]
We set up an egocentric multi-sensor data collection platform based on three main types of sensors (camera, LiDAR, and fisheye). A large-scale multimodal dataset, named RoboSense, is constructed to facilitate egocentric robot perception.
arXiv Detail & Related papers (2024-08-28T03:17:40Z)
- Towards Efficient 3D Object Detection in Bird's-Eye-View Space for Autonomous Driving: A Convolutional-Only Approach [13.513005108086006]
We propose an efficient BEV-based 3D detection framework called BEVENet.
Our experiments show that BEVENet is 3x faster than contemporary state-of-the-art (SOTA) approaches on the NuScenes challenge.
arXiv Detail & Related papers (2023-12-01T14:52:59Z)
- BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View [56.77287041917277]
3D Single Object Tracking (SOT) is a fundamental task of computer vision, proving essential for applications like autonomous driving.
In this paper, we propose BEVTrack, a simple yet effective baseline method.
By estimating the target motion in Bird's-Eye View (BEV) to perform tracking, BEVTrack demonstrates surprising simplicity from various aspects, i.e., network designs, training objectives, and tracking pipeline, while achieving superior performance.
arXiv Detail & Related papers (2023-09-05T12:42:26Z)