Few-Shot Object Detection by Knowledge Distillation Using
Bag-of-Visual-Words Representations
- URL: http://arxiv.org/abs/2207.12049v1
- Date: Mon, 25 Jul 2022 10:40:40 GMT
- Authors: Wenjie Pei, Shuang Wu, Dianwen Mei, Fanglin Chen, Jiandong Tian,
Guangming Lu
- Abstract summary: We design a novel knowledge distillation framework to guide the learning of the object detector.
We first present a novel Position-Aware Bag-of-Visual-Words model for learning a representative bag of visual words.
We then perform knowledge distillation based on the fact that an image should have consistent BoVW representations in two different feature spaces.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While fine-tuning based methods for few-shot object detection have
achieved remarkable progress, a crucial challenge that has not been addressed
well is the potential class-specific overfitting on base classes and
sample-specific overfitting on novel classes. In this work we design a novel
knowledge distillation framework to guide the learning of the object detector
and thereby restrain overfitting both in the pre-training stage on base classes
and in the fine-tuning stage on novel classes. Specifically, we first present a
novel Position-Aware Bag-of-Visual-Words model for learning a representative
bag of visual words (BoVW) from an image set of limited size, which is then
used to encode general images based on the similarities between the learned
visual words and the image's features. We then perform knowledge distillation
based on the premise that an image should have consistent BoVW representations
in two different feature spaces. To this end, we pre-learn a feature space
independently of the object detection task and encode images using BoVW in this
space. The resulting BoVW representation of an image can be regarded as
distilled knowledge that guides the learning of the object detector: the
features extracted by the object detector for the same image are expected to
yield BoVW representations consistent with the distilled knowledge. Extensive
experiments validate the effectiveness of our method and demonstrate its
superiority over other state-of-the-art methods.
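The abstract's core mechanism, encoding an image as soft assignments over a bag of visual words and penalizing disagreement between the assignments computed in a pre-learned feature space and those computed from the detector's features, can be sketched as follows. This is a minimal NumPy illustration of the general idea, not the paper's implementation: the function names, the cosine-similarity softmax, and the temperature parameter are all assumptions made for the sketch.

```python
import numpy as np

def bovw_encode(features, visual_words, temperature=0.1):
    """Encode local features as soft assignments over a bag of visual words.

    features:     (N, D) array of local features (e.g. flattened H*W positions)
    visual_words: (K, D) array of learned visual-word embeddings
    Returns an (N, K) soft-assignment map. Keeping the per-position axis,
    rather than pooling it away, preserves spatial information in the
    spirit of a position-aware encoding.
    """
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-8)
    w = visual_words / (np.linalg.norm(visual_words, axis=1, keepdims=True) + 1e-8)
    sims = (f @ w.T) / temperature            # sharpened cosine similarities
    sims -= sims.max(axis=1, keepdims=True)   # numerical stability for softmax
    exp = np.exp(sims)
    return exp / exp.sum(axis=1, keepdims=True)

def distillation_loss(assign_teacher, assign_student):
    """Cross-entropy between the BoVW assignments computed in the
    pre-learned (teacher) space and those computed from the detector's
    (student) features, averaged over positions."""
    eps = 1e-8
    return float(-(assign_teacher * np.log(assign_student + eps)).sum(axis=1).mean())
```

In a training loop, `assign_teacher` would come from the frozen pre-learned feature space and `assign_student` from the detector backbone, with the loss gradient flowing only into the student.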
Related papers
- Context-driven Visual Object Recognition based on Knowledge Graphs [0.8701566919381223]
We propose an approach that enhances deep learning methods by using external contextual knowledge encoded in a knowledge graph.
We conduct a series of experiments to investigate the impact of different contextual views on the learned object representations for the same image dataset.
arXiv Detail & Related papers (2022-10-20T13:09:00Z)
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- Rectifying the Shortcut Learning of Background: Shared Object Concentration for Few-Shot Image Recognition [101.59989523028264]
Few-Shot image classification aims to utilize pretrained knowledge learned from a large-scale dataset to tackle a series of downstream classification tasks.
We propose COSOC, a novel Few-Shot Learning framework that automatically extracts foreground objects at both the pretraining and evaluation stages.
arXiv Detail & Related papers (2021-07-16T07:46:41Z)
- Knowledge-Guided Object Discovery with Acquired Deep Impressions [41.07379505694274]
We present a framework called Acquired Deep Impressions (ADI), which continuously learns knowledge of objects as "impressions".
ADI first acquires knowledge from scene images containing a single object in a supervised manner.
It then learns from novel multi-object scene images which may contain objects that have not been seen before.
arXiv Detail & Related papers (2021-03-19T03:17:57Z)
- Online Bag-of-Visual-Words Generation for Unsupervised Representation Learning [59.29452780994169]
We propose a teacher-student scheme to learn representations by training a convnet to reconstruct a bag-of-visual-words (BoW) representation of an image.
Our strategy performs an online training of both the teacher network (whose role is to generate the BoW targets) and the student network (whose role is to learn representations) along with an online update of the visual-words vocabulary.
arXiv Detail & Related papers (2020-12-21T18:31:21Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
- Learning Representations by Predicting Bags of Visual Words [55.332200948110895]
Self-supervised representation learning aims to learn convnet-based image representations from unlabeled data.
Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions.
arXiv Detail & Related papers (2020-02-27T16:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.