Semantic Enhanced Few-shot Object Detection
- URL: http://arxiv.org/abs/2406.13498v1
- Date: Wed, 19 Jun 2024 12:40:55 GMT
- Title: Semantic Enhanced Few-shot Object Detection
- Authors: Zheng Wang, Yingjie Gao, Qingjie Liu, Yunhong Wang
- Abstract summary: We propose a fine-tuning based FSOD framework that utilizes semantic embeddings for better detection.
Our method allows each novel class to construct a compact feature space without being confused with similar base classes.
- Score: 37.715912401900745
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot object detection (FSOD), which aims to detect novel objects with limited annotated instances, has made significant progress in recent years. However, existing methods still suffer from biased representations, especially for novel classes in extremely low-shot scenarios. During fine-tuning, a novel class may exploit knowledge from similar base classes to construct its own feature distribution, leading to classification confusion and performance degradation. To address these challenges, we propose a fine-tuning based FSOD framework that utilizes semantic embeddings for better detection. In our proposed method, we align the visual features with class name embeddings and replace the linear classifier with our semantic similarity classifier. Our method trains each region proposal to converge to the corresponding class embedding. Furthermore, we introduce a multimodal feature fusion to augment the vision-language communication, enabling a novel class to draw support explicitly from well-trained similar base classes. To prevent class confusion, we propose a semantic-aware max-margin loss, which adaptively applies a margin beyond similar classes. As a result, our method allows each novel class to construct a compact feature space without being confused with similar base classes. Extensive experiments on Pascal VOC and MS COCO demonstrate the superiority of our method.
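The abstract describes a semantic similarity classifier and a semantic-aware max-margin loss without giving formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes region features have already been projected into the same space as the class name embeddings. The function names, the toy three-dimensional embeddings, the `scale` factor, and the specific margin rule (a margin proportional to the similarity between the target class embedding and each other class embedding) are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_similarity_logits(region_feature, class_embeddings, scale=10.0):
    """Score a region by its scaled cosine similarity to each class embedding.

    This stands in for a linear classifier; `scale` is a hypothetical
    temperature, not a value taken from the paper.
    """
    return {
        name: scale * cosine(region_feature, emb)
        for name, emb in class_embeddings.items()
    }


def softmax(logits):
    """Numerically stable softmax over a dict of logits."""
    top = max(logits.values())
    exps = {name: math.exp(value - top) for name, value in logits.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def margin_loss(logits, target, class_embeddings, margin=1.0):
    """Cross-entropy with an assumed similarity-dependent margin.

    Each non-target logit is raised by margin * sim(target, other), so
    classes that are semantically close to the target are penalised more.
    """
    adjusted = {}
    for name, value in logits.items():
        if name == target:
            adjusted[name] = value
        else:
            sim = max(0.0, cosine(class_embeddings[target], class_embeddings[name]))
            adjusted[name] = value + margin * sim
    return -math.log(softmax(adjusted)[target])


# Toy embeddings (made up): "cow" and "horse" are close, "boat" is far away.
class_embeddings = {
    "cow": [1.0, 0.1, 0.0],
    "horse": [0.9, 0.3, 0.0],
    "boat": [0.0, 0.0, 1.0],
}
region_feature = [1.0, 0.15, 0.05]

logits = semantic_similarity_logits(region_feature, class_embeddings)
predicted = max(logits, key=logits.get)
loss_plain = margin_loss(logits, "cow", class_embeddings, margin=0.0)
loss_margin = margin_loss(logits, "cow", class_embeddings, margin=1.0)
```

In this toy setup the margin raises the loss mostly through the similar class ("horse"), which is the kind of pressure that would push a novel class away from its nearest base classes during training.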
Related papers
- Open-Vocabulary Object Detection with Meta Prompt Representation and Instance Contrastive Optimization [63.66349334291372]
We propose a framework with Meta prompt and Instance Contrastive learning (MIC) schemes.
Firstly, we simulate a novel-class-emerging scenario to help the learned class and background prompts generalize to novel classes.
Secondly, we design an instance-level contrastive strategy to promote intra-class compactness and inter-class separation, which benefits generalization of the detector to novel class objects.
arXiv Detail & Related papers (2024-03-14T14:25:10Z) - ProxyDet: Synthesizing Proxy Novel Classes via Classwise Mixup for Open-Vocabulary Object Detection [7.122652901894367]
Open-vocabulary object detection (OVOD) aims to recognize novel objects whose categories are not included in the training set.
We present a novel, yet simple technique that helps generalization on the overall distribution of novel classes.
arXiv Detail & Related papers (2023-12-12T13:45:56Z) - Few-Shot Class-Incremental Learning via Training-Free Prototype Calibration [67.69532794049445]
We find a tendency for existing methods to misclassify the samples of new classes into base classes, which leads to the poor performance of new classes.
We propose a simple yet effective Training-frEE calibratioN (TEEN) strategy to enhance the discriminability of new classes.
arXiv Detail & Related papers (2023-12-08T18:24:08Z) - Harmonizing Base and Novel Classes: A Class-Contrastive Approach for Generalized Few-Shot Segmentation [78.74340676536441]
We propose a class contrastive loss and a class relationship loss to regulate prototype updates and encourage a large distance between prototypes.
Our proposed approach achieves new state-of-the-art performance for the generalized few-shot segmentation task on PASCAL VOC and MS COCO datasets.
arXiv Detail & Related papers (2023-03-24T00:30:25Z) - Few-Shot Object Detection via Variational Feature Aggregation [32.34871873486389]
We propose a meta-learning framework with two novel feature aggregation schemes.
We first present a Class-Agnostic Aggregation (CAA) method, where the query and support features can be aggregated regardless of their categories.
We then propose a Variational Feature Aggregation (VFA) method, which encodes support examples into class-level support features.
arXiv Detail & Related papers (2023-01-31T04:58:21Z) - Incremental Few-Shot Learning via Implanting and Compressing [13.122771115838523]
Incremental Few-Shot Learning requires a model to continually learn novel classes from only a few examples.
We propose a two-step learning strategy referred to as Implanting and Compressing.
Specifically, in the Implanting step, we propose to mimic the data distribution of novel classes with the assistance of the data-abundant base set.
In the Compressing step, we adapt the feature extractor to precisely represent each novel class for enhancing intra-class compactness.
arXiv Detail & Related papers (2022-03-19T11:04:43Z) - Few-Shot Object Detection via Association and DIscrimination [83.8472428718097]
Few-shot object detection via Association and DIscrimination (FADI) builds up a discriminative feature space for each novel class with two integral steps.
Experiments on Pascal VOC and MS-COCO datasets demonstrate FADI achieves new SOTA performance, significantly improving the baseline in any shot/split by +18.7.
arXiv Detail & Related papers (2021-11-23T05:04:06Z) - Anti-aliasing Semantic Reconstruction for Few-Shot Semantic Segmentation [66.85202434812942]
We reformulate few-shot segmentation as a semantic reconstruction problem.
We convert base class features into a series of basis vectors which span a class-level semantic space for novel class reconstruction.
Our proposed approach, referred to as anti-aliasing semantic reconstruction (ASR), provides a systematic yet interpretable solution for few-shot learning problems.
arXiv Detail & Related papers (2021-06-01T02:17:36Z) - Revisiting Deep Local Descriptor for Improved Few-Shot Classification [56.74552164206737]
We show how one can improve the quality of embeddings by leveraging Dense Classification and Attentive Pooling.
We suggest pooling feature maps by applying attentive pooling instead of the widely used global average pooling (GAP) to prepare embeddings for few-shot classification.
arXiv Detail & Related papers (2021-03-30T00:48:28Z)
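The last entry above contrasts attentive pooling with global average pooling (GAP). The summary gives no formula, so the sketch below shows only one common form of the idea, assuming attention weights come from a softmax over dot products between each local feature and a query vector. The `query` vector and both function names are hypothetical and not taken from that paper.

```python
import math


def global_average_pool(features):
    """Average a list of local feature vectors with equal weights."""
    count = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / count for d in range(dim)]


def attentive_pool(features, query):
    """Weight each local feature by softmax(feature . query), then sum.

    Returns the pooled vector and the attention weights.
    """
    scores = [sum(a * b for a, b in zip(f, query)) for f in features]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return pooled, weights


# Three toy local descriptors from one feature map (made-up values).
local_features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

gap = global_average_pool(local_features)
uniform_pooled, uniform_weights = attentive_pool(local_features, [0.0, 0.0])
focused_pooled, focused_weights = attentive_pool(local_features, [2.0, 0.0])
```

With a zero query every location gets the same weight and the result matches GAP; a query aligned with the first descriptor shifts the pooled embedding toward it, which is the behaviour that distinguishes attentive pooling from plain averaging.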
This list is automatically generated from the titles and abstracts of the papers in this site.