UAVPairs: A Challenging Benchmark for Match Pair Retrieval of Large-scale UAV Images
- URL: http://arxiv.org/abs/2505.22098v1
- Date: Wed, 28 May 2025 08:21:05 GMT
- Authors: Junhuan Liu, San Jiang, Wei Ge, Wei Huang, Bingxuan Guo, Qingquan Li,
- Abstract summary: This paper contributes a benchmark dataset, UAVPairs, and a training pipeline designed for match pair retrieval of large-scale UAV images. The UAVPairs dataset, comprising 21,622 high-resolution images across 30 diverse scenes, is constructed. The effectiveness of the UAVPairs dataset and training pipeline is validated through comprehensive experiments on three distinct large-scale UAV datasets.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The primary contribution of this paper is a challenging benchmark dataset, UAVPairs, and a training pipeline designed for match pair retrieval of large-scale UAV images. First, the UAVPairs dataset, comprising 21,622 high-resolution images across 30 diverse scenes, is constructed; the 3D points and tracks generated by SfM-based 3D reconstruction are employed to define the geometric similarity of image pairs, ensuring that genuinely matchable image pairs are used for training. Second, to address the expensive cost of global hard negative mining, a batched nontrivial sample mining strategy is proposed, leveraging the geometric similarity and multi-scene structure of UAVPairs to generate training samples so as to accelerate training. Third, recognizing the limitation of pair-based losses, a ranked list loss is designed to improve the discrimination of image retrieval models by optimizing the global similarity structure constructed from the positive and negative sets. Finally, the effectiveness of the UAVPairs dataset and training pipeline is validated through comprehensive experiments on three distinct large-scale UAV datasets. The experimental results demonstrate that models trained with the UAVPairs dataset and the ranked list loss achieve significantly improved retrieval accuracy compared to models trained on existing datasets or with conventional losses. Furthermore, these improvements translate to enhanced view graph connectivity and higher-quality reconstructed 3D models. Models trained by the proposed approach perform more robustly than hand-crafted global features, particularly in challenging repetitively textured and weakly textured scenes. For match pair retrieval of large-scale UAV images, the trained image retrieval models offer an effective solution. The dataset will be made publicly available at https://github.com/json87/UAVPairs.
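The ranked list loss described in the abstract optimizes the global similarity structure of the full positive and negative sets rather than individual pairs. A minimal NumPy sketch of this style of objective is given below; it is an illustration, not the authors' implementation, and the margin/boundary values and the simple unweighted hinge terms are assumptions:

```python
import numpy as np

def ranked_list_loss(embeddings, labels, margin=1.2, boundary=0.4):
    """Sketch of a ranked-list-style loss.

    Positives are pulled inside an inner boundary (margin - boundary)
    and negatives are pushed beyond the margin, with hinge violations
    averaged over the whole ranked lists instead of sampled pairs.
    """
    # L2-normalise embeddings, then compute all pairwise distances.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=2)

    same = labels[:, None] == labels[None, :]
    eye = np.eye(len(labels), dtype=bool)
    pos_mask = same & ~eye   # positives: same label, not self
    neg_mask = ~same         # negatives: different label

    # Hinge on positives outside the inner boundary,
    # and on negatives inside the margin.
    pos_loss = np.maximum(dist - (margin - boundary), 0.0) * pos_mask
    neg_loss = np.maximum(margin - dist, 0.0) * neg_mask

    pos_term = pos_loss.sum() / max(pos_mask.sum(), 1)
    neg_term = neg_loss.sum() / max(neg_mask.sum(), 1)
    return pos_term + neg_term
```

When each class collapses to a single point and classes are farther apart than the margin, both hinge terms vanish and the loss is zero, which is the global structure the loss is driving toward.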
Related papers
- One RL to See Them All: Visual Triple Unified Reinforcement Learning [92.90120580989839]
We propose V-Triune, a Visual Triple Unified Reinforcement Learning system that enables visual reasoning and perception tasks within a single training pipeline. V-Triune comprises complementary components, including a Sample-Level Datashelf (to unify diverse task inputs) and a Verifier-Level Reward (to deliver custom rewards via specialized verifiers). We also introduce a novel Dynamic IoU reward, which provides adaptive, progressive, and definite feedback for perception tasks handled by V-Triune.
arXiv Detail & Related papers (2025-05-23T17:41:14Z)
- UAVTwin: Neural Digital Twins for UAVs using Gaussian Splatting [57.63613048492219]
We present UAVTwin, a method for creating digital twins from real-world environments and facilitating data augmentation for training downstream models embedded in unmanned aerial vehicles (UAVs). This is achieved by integrating 3D Gaussian Splatting (3DGS) for reconstructing backgrounds along with controllable synthetic human models that display diverse appearances and actions in multiple poses.
arXiv Detail & Related papers (2025-04-02T22:17:30Z)
- Rethinking Image Super-Resolution from Training Data Perspectives [54.28824316574355]
We investigate the understudied effect of the training data used for image super-resolution (SR).
Based on this, we propose an automated image evaluation pipeline.
We find that datasets with (i) low compression artifacts, (ii) high within-image diversity as judged by the number of different objects, and (iii) a large number of images from ImageNet or PASS all positively affect SR performance.
arXiv Detail & Related papers (2024-09-01T16:25:04Z)
- Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers [48.74331852418905]
Direct image-to-graph transformation is a challenging task that involves solving object detection and relationship prediction in a single model. Due to this task's complexity, large training datasets are rare in many domains, making the training of deep-learning methods challenging. We introduce a set of methods enabling cross-domain and cross-dimension learning for image-to-graph transformers.
arXiv Detail & Related papers (2024-03-11T10:48:56Z)
- An evaluation of Deep Learning based stereo dense matching dataset shift from aerial images and a large scale stereo dataset [2.048226951354646]
We present a method for generating ground-truth disparity maps directly from Light Detection and Ranging (LiDAR) and images.
We evaluate 11 dense matching methods across datasets with diverse scene types, image resolutions, and geometric configurations.
arXiv Detail & Related papers (2024-02-19T20:33:46Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- Efficient Match Pair Retrieval for Large-scale UAV Images via Graph Indexed Global Descriptor [9.402103660431791]
This paper proposes an efficient match pair retrieval method and implements an integrated workflow for parallel SfM reconstruction.
The proposed solution has been verified using three large-scale datasets.
arXiv Detail & Related papers (2023-07-10T12:41:55Z)
- Class Anchor Margin Loss for Content-Based Image Retrieval [97.81742911657497]
We propose a novel repeller-attractor loss that falls within the metric learning paradigm, yet directly optimizes the L2 metric without the need to generate pairs.
We evaluate the proposed objective in the context of few-shot and full-set training on the CBIR task, by using both convolutional and transformer architectures.
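As a rough illustration of the repeller-attractor idea summarized above, the sketch below pulls each embedding toward a learnable per-class anchor and pushes it away from all other anchors, so no pair generation is required. This is a hedged NumPy sketch under my own assumptions (the anchor parameterization, the hinge form, and the margin value are not taken from the paper):

```python
import numpy as np

def class_anchor_margin_loss(embeddings, labels, anchors, margin=1.0):
    """Sketch of a repeller-attractor objective over class anchors.

    Each sample is attracted to its own class anchor and repelled to
    at least `margin` away from every other class anchor, directly on
    L2 distances and without mining sample pairs.
    """
    # Distance from every embedding to every class anchor.
    d = np.linalg.norm(embeddings[:, None, :] - anchors[None, :, :], axis=2)
    n = len(embeddings)

    attract = d[np.arange(n), labels]            # pull toward own anchor
    other = np.ones_like(d, dtype=bool)
    other[np.arange(n), labels] = False
    repel = np.maximum(margin - d, 0.0) * other  # push from other anchors

    return attract.mean() + repel.sum() / max(other.sum(), 1)
```

In training, the anchors would be optimized jointly with the embedding network; once every sample sits on its own anchor and anchors are separated by more than the margin, the loss reaches zero.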
arXiv Detail & Related papers (2023-06-01T12:53:10Z)
- Training on Thin Air: Improve Image Classification with Generated Data [28.96941414724037]
Diffusion Inversion is a simple yet effective method to generate diverse, high-quality training data for image classification.
Our approach captures the original data distribution and ensures data coverage by inverting images to the latent space of Stable Diffusion.
We identify three key components that allow our generated images to successfully supplant the original dataset.
arXiv Detail & Related papers (2023-05-24T16:33:02Z)
- UAVStereo: A Multiple Resolution Dataset for Stereo Matching in UAV Scenarios [0.6524460254566905]
This paper constructs a multi-resolution UAV scenario dataset, called UAVStereo, with over 34k stereo image pairs covering 3 typical scenes.
In this paper, we evaluate traditional and state-of-the-art deep learning methods, highlighting their limitations in addressing challenges in UAV scenarios.
arXiv Detail & Related papers (2023-02-20T16:45:27Z)
- Drone Referring Localization: An Efficient Heterogeneous Spatial Feature Interaction Method For UAV Self-Localization [22.94589565476653]
We propose an efficient heterogeneous spatial feature interaction method, termed Drone Referring Localization (DRL).
Unlike conventional methods that treat different data sources in isolation, DRL facilitates the learnable interaction of heterogeneous features.
Compared to traditional IR methods, DRL achieves superior localization accuracy (MA@20 +9.4%) while significantly reducing computational time (to 1/7) and storage overhead (to 2/3).
arXiv Detail & Related papers (2022-08-13T03:25:50Z)
- Learning Co-segmentation by Segment Swapping for Retrieval and Discovery [67.6609943904996]
The goal of this work is to efficiently identify visually similar patterns from a pair of images.
We generate synthetic training pairs by selecting object segments in an image and copy-pasting them into another image.
We show our approach provides clear improvements for artwork details retrieval on the Brueghel dataset.
arXiv Detail & Related papers (2021-10-29T16:51:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.