Re-Scoring Using Image-Language Similarity for Few-Shot Object Detection
- URL: http://arxiv.org/abs/2311.00278v1
- Date: Wed, 1 Nov 2023 04:04:34 GMT
- Title: Re-Scoring Using Image-Language Similarity for Few-Shot Object Detection
- Authors: Min Jae Jung, Seung Dae Han and Joohee Kim
- Abstract summary: Few-shot object detection, which focuses on detecting novel objects with few labels, is an emerging challenge in the community.
Recent studies show that adapting a pre-trained model or modified loss function can improve performance.
We propose Re-scoring using Image-language Similarity for Few-shot object detection (RISF) which extends Faster R-CNN.
- Score: 4.0208298639821525
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Few-shot object detection, which focuses on detecting novel objects with few
labels, is an emerging challenge in the community. Recent studies show that
adapting a pre-trained model or modified loss function can improve performance.
In this paper, we explore leveraging the power of Contrastive Language-Image
Pre-training (CLIP) and hard negative classification loss in low data setting.
Specifically, we propose Re-scoring using Image-language Similarity for
Few-shot object detection (RISF) which extends Faster R-CNN by introducing
Calibration Module using CLIP (CM-CLIP) and Background Negative Re-scale Loss
(BNRL). The former adapts CLIP, which performs zero-shot classification, to
re-score the classification scores of a detector using image-class
similarities, the latter is modified classification loss considering the
punishment for fake backgrounds as well as confusing categories on a
generalized few-shot object detection dataset. Extensive experiments on MS-COCO
and PASCAL VOC show that the proposed RISF substantially outperforms the
state-of-the-art approaches. The code will be available.
Related papers
- CSPCL: Category Semantic Prior Contrastive Learning for Deformable DETR-Based Prohibited Item Detectors [8.23801404004195]
Prohibited item detection based on X-ray images is one of the most effective security inspection methods.
foreground-background feature coupling makes general detectors designed for natural images perform poorly.
We propose a Category Semantic Prior Contrastive Learning mechanism to align the class prototypes perceived by the classifier with the content queries.
arXiv Detail & Related papers (2025-01-28T03:04:22Z) - Few-shot Algorithm Assurance [11.924406021826606]
deep learning models are vulnerable to image distortion.
Model Assurance under Image Distortion is a classification task.
We propose a novel Conditional Level Set Estimation algorithm.
arXiv Detail & Related papers (2024-12-28T21:11:55Z) - CLIP-FSAC++: Few-Shot Anomaly Classification with Anomaly Descriptor Based on CLIP [22.850815902535988]
We propose an effective few-shot anomaly classification framework with one-stage training, dubbed CLIP-FSAC++.
In anomaly descriptor, image-to-text cross-attention module is used to obtain image-specific text embeddings.
Comprehensive experiment results are provided for evaluating our method in few-normal shot anomaly classification on VisA and MVTEC-AD for 1, 2, 4 and 8-shot settings.
arXiv Detail & Related papers (2024-12-05T02:44:45Z) - Multi-Level Correlation Network For Few-Shot Image Classification [36.44416763952161]
Few-shot image classification aims to recognize novel classes given few labeled images from base classes.
We propose a multi-level correlation network (MLCN) for FSIC to tackle this problem by effectively capturing local information.
arXiv Detail & Related papers (2024-12-04T09:36:24Z) - RAFIC: Retrieval-Augmented Few-shot Image Classification [0.0]
Few-shot image classification is the task of classifying unseen images to one of N mutually exclusive classes.
We have developed a method for augmenting the set of K with an addition set of A retrieved images.
We demonstrate that RAFIC markedly improves performance of few-shot image classification across two challenging datasets.
arXiv Detail & Related papers (2023-12-11T22:28:51Z) - Image-free Classifier Injection for Zero-Shot Classification [72.66409483088995]
Zero-shot learning models achieve remarkable results on image classification for samples from classes that were not seen during training.
We aim to equip pre-trained models with zero-shot classification capabilities without the use of image data.
We achieve this with our proposed Image-free Injection with Semantics (ICIS)
arXiv Detail & Related papers (2023-08-21T09:56:48Z) - Prefix Conditioning Unifies Language and Label Supervision [84.11127588805138]
We show that dataset biases negatively affect pre-training by reducing the generalizability of learned representations.
In experiments, we show that this simple technique improves the performance in zero-shot image recognition accuracy and robustness to the image-level distribution shift.
arXiv Detail & Related papers (2022-06-02T16:12:26Z) - Incremental-DETR: Incremental Few-Shot Object Detection via
Self-Supervised Learning [60.64535309016623]
We propose the Incremental-DETR that does incremental few-shot object detection via fine-tuning and self-supervised learning on the DETR object detector.
To alleviate severe over-fitting with few novel class data, we first fine-tune the class-specific components of DETR with self-supervision.
We further introduce a incremental few-shot fine-tuning strategy with knowledge distillation on the class-specific components of DETR to encourage the network in detecting novel classes without catastrophic forgetting.
arXiv Detail & Related papers (2022-05-09T05:08:08Z) - No Token Left Behind: Explainability-Aided Image Classification and
Generation [79.4957965474334]
We present a novel explainability-based approach, which adds a loss term to ensure that CLIP focuses on all relevant semantic parts of the input.
Our method yields an improvement in the recognition rate, without additional training or fine-tuning.
arXiv Detail & Related papers (2022-04-11T07:16:39Z) - Rectifying the Shortcut Learning of Background: Shared Object
Concentration for Few-Shot Image Recognition [101.59989523028264]
Few-Shot image classification aims to utilize pretrained knowledge learned from a large-scale dataset to tackle a series of downstream classification tasks.
We propose COSOC, a novel Few-Shot Learning framework, to automatically figure out foreground objects at both pretraining and evaluation stage.
arXiv Detail & Related papers (2021-07-16T07:46:41Z) - One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.