Re-Scoring Using Image-Language Similarity for Few-Shot Object Detection
- URL: http://arxiv.org/abs/2311.00278v1
- Date: Wed, 1 Nov 2023 04:04:34 GMT
- Title: Re-Scoring Using Image-Language Similarity for Few-Shot Object Detection
- Authors: Min Jae Jung, Seung Dae Han and Joohee Kim
- Abstract summary: Few-shot object detection, which focuses on detecting novel objects with few labels, is an emerging challenge in the community.
Recent studies show that adapting a pre-trained model or modifying the loss function can improve performance.
We propose Re-scoring using Image-language Similarity for Few-shot object detection (RISF), which extends Faster R-CNN.
- Score: 4.0208298639821525
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Few-shot object detection, which focuses on detecting novel objects with few
labels, is an emerging challenge in the community. Recent studies show that
adapting a pre-trained model or modifying the loss function can improve performance.
In this paper, we explore leveraging the power of Contrastive Language-Image
Pre-training (CLIP) and a hard negative classification loss in low-data settings.
Specifically, we propose Re-scoring using Image-language Similarity for
Few-shot object detection (RISF), which extends Faster R-CNN by introducing a
Calibration Module using CLIP (CM-CLIP) and a Background Negative Re-scale Loss
(BNRL). The former adapts CLIP, which performs zero-shot classification, to
re-score the classification scores of a detector using image-class
similarities; the latter is a modified classification loss that takes into
account the penalty for fake backgrounds as well as confusing categories on a
generalized few-shot object detection dataset. Extensive experiments on MS-COCO
and PASCAL VOC show that the proposed RISF substantially outperforms the
state-of-the-art approaches. The code will be available.
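The abstract describes CM-CLIP only at a high level: a detector's classification scores are re-scored using image-class similarities from CLIP. The following is a minimal sketch of that general idea, not the authors' implementation. The function name `rescore`, the weighted geometric-mean fusion, and the `alpha` and `temperature` parameters are all assumptions made for illustration, and random vectors stand in for the embeddings that CLIP's image and text encoders would produce.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def rescore(det_scores, region_emb, text_emb, alpha=0.5, temperature=0.01):
    """Fuse detector class probabilities with image-class similarities.

    det_scores: (N, C) per-region class probabilities from the detector.
    region_emb: (N, D) embeddings of the cropped regions (placeholder for CLIP).
    text_emb:   (C, D) embeddings of the class prompts (placeholder for CLIP).
    alpha:      hypothetical fusion weight; 0 keeps the detector scores.
    """
    # Cosine similarity between every region and every class prompt.
    region = region_emb / np.linalg.norm(region_emb, axis=1, keepdims=True)
    text = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = region @ text.T  # shape (N, C)

    # Turn similarities into a per-region distribution over classes.
    sim_probs = softmax(sim / temperature, axis=1)

    # Assumed fusion rule: weighted geometric mean, then renormalize.
    fused = det_scores ** (1.0 - alpha) * sim_probs ** alpha
    return fused / fused.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    det = softmax(rng.normal(size=(4, 3)))   # 4 regions, 3 classes
    regions = rng.normal(size=(4, 8))        # stand-in image embeddings
    prompts = rng.normal(size=(3, 8))        # stand-in text embeddings
    print(rescore(det, regions, prompts))
```

With `alpha=0` the detector scores pass through unchanged, and with `alpha=1` the result depends only on the similarities, so the parameter controls how much the detector is trusted relative to the image-language model in this sketch.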
Related papers
- RAFIC: Retrieval-Augmented Few-shot Image Classification [0.0]
Few-shot image classification is the task of classifying unseen images to one of N mutually exclusive classes.
We have developed a method for augmenting the set of K with an additional set of A retrieved images.
We demonstrate that RAFIC markedly improves performance of few-shot image classification across two challenging datasets.
arXiv Detail & Related papers (2023-12-11T22:28:51Z)
- Spuriosity Rankings for Free: A Simple Framework for Last Layer Retraining Based on Object Detection [5.199218657137718]
We propose a novel ranking framework to identify images without spurious cues.
We use the object detector as a measure to score the presence of the target object in the images.
Next, the images are sorted based on this score, and the last-layer of the model is retrained on a subset of the data with the highest scores.
arXiv Detail & Related papers (2023-10-31T18:44:03Z)
- Zero-Shot Visual Classification with Guided Cropping [9.321383320998262]
We propose using an off-the-shelf zero-shot object detection model in a preprocessing step to increase the focus of the zero-shot classifier on the object of interest.
We empirically show that our approach improves zero-shot classification results across architectures and datasets, favorably for small objects.
arXiv Detail & Related papers (2023-09-12T20:09:12Z)
- Image-free Classifier Injection for Zero-Shot Classification [72.66409483088995]
Zero-shot learning models achieve remarkable results on image classification for samples from classes that were not seen during training.
We aim to equip pre-trained models with zero-shot classification capabilities without the use of image data.
We achieve this with our proposed Image-free Classifier Injection with Semantics (ICIS).
arXiv Detail & Related papers (2023-08-21T09:56:48Z)
- Prefix Conditioning Unifies Language and Label Supervision [84.11127588805138]
We show that dataset biases negatively affect pre-training by reducing the generalizability of learned representations.
In experiments, we show that this simple technique improves the performance in zero-shot image recognition accuracy and robustness to the image-level distribution shift.
arXiv Detail & Related papers (2022-06-02T16:12:26Z)
- Incremental-DETR: Incremental Few-Shot Object Detection via Self-Supervised Learning [60.64535309016623]
We propose the Incremental-DETR that does incremental few-shot object detection via fine-tuning and self-supervised learning on the DETR object detector.
To alleviate severe over-fitting with few novel class data, we first fine-tune the class-specific components of DETR with self-supervision.
We further introduce an incremental few-shot fine-tuning strategy with knowledge distillation on the class-specific components of DETR to encourage the network to detect novel classes without catastrophic forgetting.
arXiv Detail & Related papers (2022-05-09T05:08:08Z)
- No Token Left Behind: Explainability-Aided Image Classification and Generation [79.4957965474334]
We present a novel explainability-based approach, which adds a loss term to ensure that CLIP focuses on all relevant semantic parts of the input.
Our method yields an improvement in the recognition rate, without additional training or fine-tuning.
arXiv Detail & Related papers (2022-04-11T07:16:39Z)
- Experience feedback using Representation Learning for Few-Shot Object Detection on Aerial Images [2.8560476609689185]
The performance of our method is assessed on DOTA, a large-scale remote sensing image dataset.
It highlights in particular some intrinsic weaknesses for the few-shot object detection task.
arXiv Detail & Related papers (2021-09-27T13:04:53Z)
- Rectifying the Shortcut Learning of Background: Shared Object Concentration for Few-Shot Image Recognition [101.59989523028264]
Few-Shot image classification aims to utilize pretrained knowledge learned from a large-scale dataset to tackle a series of downstream classification tasks.
We propose COSOC, a novel Few-Shot Learning framework, to automatically identify foreground objects at both the pretraining and evaluation stages.
arXiv Detail & Related papers (2021-07-16T07:46:41Z)
- Meta Faster R-CNN: Towards Accurate Few-Shot Object Detection with Attentive Feature Alignment [33.446875089255876]
Few-shot object detection (FSOD) aims to detect objects using only a few examples.
We propose a meta-learning based few-shot object detection method by transferring meta-knowledge learned from data-abundant base classes to data-scarce novel classes.
arXiv Detail & Related papers (2021-04-15T19:01:27Z)
- One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)