Related papers: Fine-Grained Zero-Shot Object Detection

URL: http://arxiv.org/abs/2507.10358v1
Date: Mon, 14 Jul 2025 15:00:00 GMT
Title: Fine-Grained Zero-Shot Object Detection
Authors: Hongxu Ma, Chenbo Zhang, Lu Zhang, Jiaogen Zhou, Jihong Guan, Shuigeng Zhou
Abstract summary: Zero-shot object detection (ZSD) aims to leverage semantic descriptions to localize and recognize objects of both seen and unseen classes. Existing ZSD works are mainly coarse-grained object detection, where the classes are visually quite different. In this paper, we propose and solve a new problem called Fine-Grained Zero-Shot Object Detection (FG-ZSD for short).
Score: 26.23374306473445
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Zero-shot object detection (ZSD) aims to leverage semantic descriptions to localize and recognize objects of both seen and unseen classes. Existing ZSD works are mainly coarse-grained object detection, where the classes are visually quite different, thus are relatively easy to distinguish. However, in real life we often have to face fine-grained object detection scenarios, where the classes are too similar to be easily distinguished. For example, detecting different kinds of birds, fishes, and flowers. In this paper, we propose and solve a new problem called Fine-Grained Zero-Shot Object Detection (FG-ZSD for short), which aims to detect objects of different classes with minute differences in details under the ZSD paradigm. We develop an effective method called MSHC for the FG-ZSD task, which is based on an improved two-stage detector and employs a multi-level semantics-aware embedding alignment loss, ensuring tight coupling between the visual and semantic spaces. Considering that existing ZSD datasets are not suitable for the new FG-ZSD task, we build the first FG-ZSD benchmark dataset FGZSD-Birds, which contains 148,820 images falling into 36 orders, 140 families, 579 genera and 1432 species. Extensive experiments on FGZSD-Birds show that our method outperforms existing ZSD models.

Related papers

ZeroSCD: Zero-Shot Street Scene Change Detection [2.3020018305241337]
Scene Change Detection is a challenging task in computer vision and robotics. Traditional change detection methods rely on training models that take these image pairs as input and estimate the changes. We propose ZeroSCD, a zero-shot scene change detection framework that eliminates the need for training.
arXiv Detail & Related papers (2024-09-23T17:53:44Z)
Zero-Shot Aerial Object Detection with Visual Description Regularization [15.14310599469107]
We propose a zero-shot method for aerial object detection named visual Description Regularization, or DescReg. We identify the weak semantic-visual correlation of the aerial objects and aim to address the challenge with prior descriptions of their visual appearance. We conduct extensive experiments with three challenging aerial object detection datasets, including DIOR, xView, and DOTA.
arXiv Detail & Related papers (2024-02-28T10:58:01Z)
Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector [72.05791402494727]
This paper studies the challenging cross-domain few-shot object detection (CD-FSOD) task. It aims to develop an accurate object detector for novel domains with minimal labeled examples.
arXiv Detail & Related papers (2024-02-05T15:25:32Z)
Meta-ZSDETR: Zero-shot DETR with Meta-learning [29.58827207505671]
We present the first method that combines DETR and meta-learning to perform zero-shot object detection, named Meta-ZSDETR. The model is optimized with meta-contrastive learning, which contains a regression head to generate the coordinates of class-specific boxes. Experimental results show that our method outperforms the existing ZSD methods by a large margin.
arXiv Detail & Related papers (2023-08-18T13:17:07Z)
SOOD: Towards Semi-Supervised Oriented Object Detection [57.05141794402972]
This paper proposes a novel Semi-supervised Oriented Object Detection model, termed SOOD, built upon the mainstream pseudo-labeling framework. Our experiments show that when trained with the two proposed losses, SOOD surpasses the state-of-the-art SSOD methods under various settings on the DOTA-v1.5 benchmark.
arXiv Detail & Related papers (2023-04-10T11:10:42Z)
Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems [61.11799513362704]
We propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes. We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective.
arXiv Detail & Related papers (2023-03-03T02:07:40Z)
SupeRGB-D: Zero-shot Instance Segmentation in Cluttered Indoor Environments [67.34330257205525]
In this work, we explore zero-shot instance segmentation (ZSIS) from RGB-D data to identify unseen objects in a semantic category-agnostic manner. We present a method that uses annotated objects to learn the "objectness" of pixels and generalize to unseen object categories in cluttered indoor environments.
arXiv Detail & Related papers (2022-12-22T17:59:48Z)
Resolving Semantic Confusions for Improved Zero-Shot Detection [6.72910827751713]
We propose a generative model incorporating a triplet loss that acknowledges the degree of dissimilarity between classes. A cyclic-consistency loss is also enforced to ensure that generated visual samples of a class highly correspond to their own semantics.
arXiv Detail & Related papers (2022-12-12T18:11:48Z)
A Survey of Deep Learning for Low-Shot Object Detection [44.20187548691372]
Low-Shot Object Detection (LSOD) is an emerging research topic of detecting objects from a few or even no annotated samples. This survey provides a comprehensive review of LSOD methods.
arXiv Detail & Related papers (2021-12-06T06:56:00Z)
Semantics-Guided Contrastive Network for Zero-Shot Object detection [67.61512036994458]
Zero-shot object detection (ZSD) is a new challenge in computer vision. We develop ContrastZSD, a framework that brings contrastive learning mechanism into the realm of zero-shot detection. Our method outperforms the previous state-of-the-art on both ZSD and generalized ZSD tasks.
arXiv Detail & Related papers (2021-09-04T03:32:15Z)
Synthesizing the Unseen for Zero-shot Object Detection [72.38031440014463]
We propose to synthesize visual features for unseen classes, so that the model learns both seen and unseen objects in the visual domain. We use a novel generative model that uses class-semantics to not only generate the features but also to discriminatively separate them.
arXiv Detail & Related papers (2020-10-19T12:36:11Z)
Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim. We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting. Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)

This list is automatically generated from the titles and abstracts of the papers on this site.