Generating Features with Increased Crop-related Diversity for Few-Shot
Object Detection
- URL: http://arxiv.org/abs/2304.05096v1
- Date: Tue, 11 Apr 2023 09:47:21 GMT
- Title: Generating Features with Increased Crop-related Diversity for Few-Shot
Object Detection
- Authors: Jingyi Xu, Hieu Le, Dimitris Samaras
- Abstract summary: Two-stage object detectors generate object proposals and classify them to detect objects in images.
Proposals often do not contain the objects perfectly but overlap with them in many possible ways.
We propose a novel variational autoencoder based data generation model, which is capable of generating data with increased crop-related diversity.
- Score: 35.652092907690694
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Two-stage object detectors generate object proposals and classify them to
detect objects in images. These proposals often do not contain the objects
perfectly but overlap with them in many possible ways, exhibiting great
variability in the difficulty levels of the proposals. Training a robust
classifier against this crop-related variability requires abundant training
data, which is not available in few-shot settings. To mitigate this issue, we
propose a novel variational autoencoder (VAE) based data generation model,
which is capable of generating data with increased crop-related diversity. The
main idea is to transform the latent space such that latent codes with different
norms represent different crop-related variations. This allows us to generate
features with increased crop-related diversity in difficulty levels by simply
varying the latent norm. In particular, each latent code is rescaled such that
its norm linearly correlates with the IoU score of the input crop w.r.t. the
ground-truth box. Here the IoU score is a proxy that represents the difficulty
level of the crop. We train this VAE model on base classes conditioned on the
semantic code of each class and then use the trained model to generate features
for novel classes. In our experiments, our generated features consistently
improve state-of-the-art few-shot object detection methods on the PASCAL VOC
and MS COCO datasets.
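To make the norm-rescaling idea concrete, below is a minimal PyTorch-style sketch, not the authors' implementation: a conditional VAE over proposal (crop) features whose latent code is rescaled so that its norm is a linear function of the crop's IoU with the ground-truth box; varying that norm at generation time then yields features of different difficulty levels. All names (CropVAE, norm_from_iou), dimensions, and the norm range are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def box_iou(boxes_a: torch.Tensor, boxes_b: torch.Tensor) -> torch.Tensor:
    """Element-wise IoU for (x1, y1, x2, y2) boxes of shape (N, 4); the difficulty proxy."""
    x1 = torch.max(boxes_a[:, 0], boxes_b[:, 0])
    y1 = torch.max(boxes_a[:, 1], boxes_b[:, 1])
    x2 = torch.min(boxes_a[:, 2], boxes_b[:, 2])
    y2 = torch.min(boxes_a[:, 3], boxes_b[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    return inter / (area_a + area_b - inter + 1e-6)


class CropVAE(nn.Module):
    """Hypothetical conditional VAE over crop features; the latent norm encodes crop difficulty."""

    def __init__(self, feat_dim=1024, sem_dim=300, latent_dim=128,
                 min_norm=1.0, max_norm=10.0):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim + sem_dim, 512), nn.ReLU())
        self.fc_mu = nn.Linear(512, latent_dim)
        self.fc_logvar = nn.Linear(512, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + sem_dim, 512), nn.ReLU(),
            nn.Linear(512, feat_dim),
        )
        self.min_norm, self.max_norm = min_norm, max_norm  # assumed norm range

    def norm_from_iou(self, iou: torch.Tensor) -> torch.Tensor:
        # Linear map from IoU in [0, 1] to a target latent norm (the difficulty axis).
        return self.min_norm + iou * (self.max_norm - self.min_norm)

    def forward(self, feat, sem, iou):
        h = self.encoder(torch.cat([feat, sem], dim=-1))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        # Rescale the latent code so its norm linearly tracks the crop's IoU.
        z = F.normalize(z, dim=-1) * self.norm_from_iou(iou).unsqueeze(-1)
        recon = self.decoder(torch.cat([z, sem], dim=-1))
        return recon, mu, logvar  # reconstruction + KL terms would form the usual VAE loss

    @torch.no_grad()
    def generate(self, sem, iou):
        """Sample features for a (novel) class at a chosen difficulty by setting the norm."""
        z = torch.randn(sem.shape[0], self.fc_mu.out_features, device=sem.device)
        z = F.normalize(z, dim=-1) * self.norm_from_iou(iou).unsqueeze(-1)
        return self.decoder(torch.cat([z, sem], dim=-1))
```

In this sketch, training on base classes would pair each crop feature with box_iou(proposal, ground_truth) as its difficulty score; for a novel class one would then condition on its semantic code and sweep the iou argument of generate to obtain features spanning easy to hard crops.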
Related papers
- CamDiff: Camouflage Image Augmentation via Diffusion Model [83.35960536063857]
CamDiff is a novel approach that leverages a latent diffusion model to synthesize salient objects in camouflaged scenes.
Our approach enables flexible editing and efficient large-scale dataset generation at a low cost.
arXiv Detail & Related papers (2023-04-11T19:37:47Z)
- MixTeacher: Mining Promising Labels with Mixed Scale Teacher for Semi-Supervised Object Detection [22.047246997864143]
Scale variation across object instances remains a key challenge in object detection.
We propose a novel framework that addresses the scale variation problem by introducing a mixed scale teacher.
Our experiments on MS COCO and PASCAL VOC benchmarks under various semi-supervised settings demonstrate that our method achieves new state-of-the-art performance.
arXiv Detail & Related papers (2023-03-16T03:37:54Z)
- Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning [99.14132861655223]
We propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning.
We estimate intra-class variations for every class and generate adaptive synthetic samples to support hard sample mining.
Our method significantly outperforms state-of-the-art methods, improving retrieval performance by 3%-6%.
arXiv Detail & Related papers (2022-11-29T14:52:38Z)
- Mitigating Generation Shifts for Generalized Zero-Shot Learning [52.98182124310114]
Generalized Zero-Shot Learning (GZSL) is the task of leveraging semantic information (e.g., attributes) to recognize both seen and unseen samples, where unseen classes are not observable during training.
We propose a novel Generation Shifts Mitigating Flow framework for learning unseen data synthesis efficiently and effectively.
Experimental results demonstrate that GSMFlow achieves state-of-the-art recognition performance in both conventional and generalized zero-shot settings.
arXiv Detail & Related papers (2021-07-07T11:43:59Z)
- Balancing Constraints and Submodularity in Data Subset Selection [43.03720397062461]
We show that one can achieve similar accuracy to traditional deep-learning models, while using less training data.
We propose a novel diversity-driven objective function, and balancing constraints on class labels and decision boundaries using matroids.
arXiv Detail & Related papers (2021-04-26T19:22:27Z)
- Exploring Complementary Strengths of Invariant and Equivariant Representations for Few-Shot Learning [96.75889543560497]
In many real-world problems, collecting a large number of labeled samples is infeasible.
Few-shot learning is the dominant approach to address this issue, where the objective is to quickly adapt to novel categories in the presence of a limited number of samples.
We propose a novel training mechanism that simultaneously enforces equivariance and invariance to a general set of geometric transformations.
arXiv Detail & Related papers (2021-03-01T21:14:33Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose aggregate interaction modules to integrate features from adjacent levels.
To obtain more efficient multi-scale features, self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
- UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large-scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision levels.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)
- Variational Mutual Information Maximization Framework for VAE Latent Codes with Continuous and Discrete Priors [5.317548969642376]
Variational Autoencoder (VAE) is a scalable method for learning directed latent variable models of complex data.
We propose a Variational Mutual Information Maximization Framework for VAE to address this issue.
arXiv Detail & Related papers (2020-06-02T09:05:51Z)