Related papers: Zero-Shot Aerial Object Detection with Visual Description Regularization

URL: http://arxiv.org/abs/2402.18233v2
Date: Fri, 1 Mar 2024 10:07:15 GMT
Title: Zero-Shot Aerial Object Detection with Visual Description Regularization
Authors: Zhengqing Zang, Chenyu Lin, Chenwei Tang, Tao Wang, Jiancheng Lv
Abstract summary: We propose a zero-shot method for aerial object detection named visual Description Regularization, or DescReg. We identify the weak semantic-visual correlation of the aerial objects and aim to address the challenge with prior descriptions of their visual appearance. We conduct extensive experiments with three challenging aerial object detection datasets, including DIOR, xView, and DOTA.
Score: 15.14310599469107
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Existing object detection models are mainly trained on large-scale labeled datasets. However, annotating data for novel aerial object classes is expensive since it is time-consuming and may require expert knowledge. Thus, it is desirable to study label-efficient object detection methods on aerial images. In this work, we propose a zero-shot method for aerial object detection named visual Description Regularization, or DescReg. Concretely, we identify the weak semantic-visual correlation of the aerial objects and aim to address the challenge with prior descriptions of their visual appearance. Instead of directly encoding the descriptions into the class embedding space, which suffers from the representation gap problem, we propose to infuse the prior inter-class visual similarity conveyed in the descriptions into the embedding learning. The infusion process is accomplished with a newly designed similarity-aware triplet loss which incorporates structured regularization on the representation space. We conduct extensive experiments with three challenging aerial object detection datasets, including DIOR, xView, and DOTA. The results demonstrate that DescReg significantly outperforms the state-of-the-art zero-shot detection (ZSD) methods with complex projection designs and generative frameworks, e.g., DescReg outperforms the best reported ZSD method on DIOR by 4.5 mAP on unseen classes and 8.1 in harmonic mean (HM). We further show the generalizability of DescReg by integrating it into generative ZSD methods as well as varying the detection architecture.
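
Illustrative sketch: the abstract does not give the form of the similarity-aware triplet loss, so the code below is only one plausible reading, not the authors' formulation. It assumes a hinge triplet loss on cosine distance whose margin shrinks as the prior (description-derived) similarity between the anchor class and the negative class grows. The function names, the `prior_similarity` argument, and the `base_margin` default are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_aware_triplet_loss(anchor, positive, negative,
                                  prior_similarity, base_margin=0.5):
    """Hinge triplet loss with a similarity-dependent margin (assumed form).

    anchor, positive: embeddings that should be close to each other.
    negative: embedding of a different class.
    prior_similarity: value in [0, 1] describing how visually similar the
        anchor class and the negative class are according to prior
        descriptions (hypothetical input, not defined in the abstract).
    """
    # Visually similar classes get a smaller margin, so they are allowed
    # to stay closer together in the embedding space.
    margin = base_margin * (1.0 - prior_similarity)
    d_pos = 1.0 - cosine_similarity(anchor, positive)
    d_neg = 1.0 - cosine_similarity(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Usage: the same violating triplet is penalized less when the prior says
# the anchor and negative classes look alike.
loss_dissimilar = similarity_aware_triplet_loss(
    [1.0, 0.0], [0.0, 1.0], [1.0, 0.0], prior_similarity=0.0)
loss_similar = similarity_aware_triplet_loss(
    [1.0, 0.0], [0.0, 1.0], [1.0, 0.0], prior_similarity=1.0)
```

Under this assumed form, the prior similarity acts as a structured regularizer: pairs of classes that the descriptions mark as look-alikes are pushed apart less aggressively than unrelated pairs.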

Related papers

SAR Object Detection with Self-Supervised Pretraining and Curriculum-Aware Sampling [41.24071764578782]
Object detection in satellite-borne Synthetic Aperture Radar imagery holds immense potential in tasks such as urban monitoring and disaster response. The detection of small objects in satellite-borne SAR images poses a particularly intricate problem, because of the technology's relatively low spatial resolution and inherent noise. In this paper, we introduce TRANSAR, a novel self-supervised end-to-end vision transformer-based SAR object detection model.
arXiv Detail & Related papers (2025-04-17T19:44:05Z)
ZoRI: Towards Discriminative Zero-Shot Remote Sensing Instance Segmentation [23.40908829241552]
We propose a novel task called zero-shot remote sensing instance segmentation, aimed at identifying aerial objects that are absent from training data. We introduce a knowledge-injected adaptation strategy that decouples semantic-related information to preserve the pretrained vision-language alignment. We establish new experimental protocols and benchmarks, and extensive experiments convincingly demonstrate that ZoRI achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-12-17T11:00:56Z)
InfRS: Incremental Few-Shot Object Detection in Remote Sensing Images [11.916941756499435]
In this paper, we explore the intricate task of incremental few-shot object detection in remote sensing images. We introduce a pioneering fine-tuning-based technique, termed InfRS, designed to facilitate the incremental learning of novel classes. We develop a prototypical calibration strategy based on the Wasserstein distance to mitigate the catastrophic forgetting problem.
arXiv Detail & Related papers (2024-05-18T13:39:50Z)
Few-shot Oriented Object Detection with Memorable Contrastive Learning in Remote Sensing Images [11.217630579076237]
Few-shot object detection (FSOD) has garnered significant research attention in the field of remote sensing. We propose a novel FSOD method for remote sensing images called Few-shot Oriented object detection with Memorable Contrastive learning (FOMC). Specifically, we employ oriented bounding boxes instead of traditional horizontal bounding boxes to learn a better feature representation for arbitrary-oriented aerial objects.
arXiv Detail & Related papers (2024-03-20T08:15:18Z)
Exploring Robust Features for Few-Shot Object Detection in Satellite Imagery [17.156864650143678]
We develop a few-shot object detector based on a traditional two-stage architecture. A large-scale pre-trained model is used to build class-reference embeddings or prototypes. We perform evaluations on two remote sensing datasets containing challenging and rare objects.
arXiv Detail & Related papers (2024-03-08T15:20:27Z)
Semi-supervised Open-World Object Detection [74.95267079505145]
We introduce a more realistic formulation, named semi-supervised open-world detection (SS-OWOD). We demonstrate that the performance of the state-of-the-art OWOD detector dramatically deteriorates in the proposed SS-OWOD setting. Our experiments on 4 datasets including MS COCO, PASCAL, Objects365 and DOTA demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-02-25T07:12:51Z)
Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head. The proposed model achieves a mean average precision (mAP) of approximately 45.7%, which is a significant improvement. This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z)
Transformation-Invariant Network for Few-Shot Object Detection in Remote Sensing Images [15.251042369061024]
Object detection in remote sensing images relies on a large amount of labeled data for training. Scale and orientation variations of objects in remote sensing images pose significant challenges to existing few-shot object detection (FSOD) methods. We propose integrating a feature pyramid network and utilizing prototype features to enhance query features.
arXiv Detail & Related papers (2023-03-13T02:21:38Z)
Knowledge Distillation for Oriented Object Detection on Aerial Images [1.827510863075184]
We present a model compression method for rotated object detection on aerial images by knowledge distillation, namely KD-RNet. The experimental results on a large-scale aerial object detection dataset (DOTA) demonstrate that the proposed KD-RNet model achieves improved mean average precision (mAP) with a reduced number of parameters. At the same time, KD-RNet produces higher-quality detections that overlap more closely with ground-truth annotations.
arXiv Detail & Related papers (2022-06-20T14:24:16Z)
Learning to Detect Instance-level Salient Objects Using Complementary Image Labels [55.049347205603304]
We present the first weakly-supervised approach to the salient instance detection problem. We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids.
arXiv Detail & Related papers (2021-11-19T10:15:22Z)
CutPaste: Self-Supervised Learning for Anomaly Detection and Localization [59.719925639875036]
We propose a framework for building anomaly detectors using normal training data only. We first learn self-supervised deep representations and then build a generative one-class classifier on learned representations. Our empirical study on the MVTec anomaly detection dataset demonstrates that the proposed algorithm is general enough to detect various types of real-world defects.
arXiv Detail & Related papers (2021-04-08T19:04:55Z)
Synthesizing the Unseen for Zero-shot Object Detection [72.38031440014463]
We propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain. We use a novel generative model that uses class-semantics to not only generate the features but also to discriminatively separate them.
arXiv Detail & Related papers (2020-10-19T12:36:11Z)
EHSOD: CAM-Guided End-to-end Hybrid-Supervised Object Detection with Cascade Refinement [53.69674636044927]
We present EHSOD, an end-to-end hybrid-supervised object detection system. It can be trained in one shot on both fully and weakly-annotated data. It achieves comparable results on multiple object detection benchmarks with only 30% fully-annotated data.
arXiv Detail & Related papers (2020-02-18T08:04:58Z)

This list is automatically generated from the titles and abstracts of the papers on this site.