Image-Level or Object-Level? A Tale of Two Resampling Strategies for
Long-Tailed Detection
- URL: http://arxiv.org/abs/2104.05702v1
- Date: Mon, 12 Apr 2021 17:58:30 GMT
- Title: Image-Level or Object-Level? A Tale of Two Resampling Strategies for
Long-Tailed Detection
- Authors: Nadine Chang, Zhiding Yu, Yu-Xiong Wang, Anima Anandkumar, Sanja
Fidler, Jose M. Alvarez
- Abstract summary: We show that long-tailed detection differs from classification since multiple classes may be present in one image.
We introduce an object-centric memory replay strategy based on dynamic, episodic memory banks.
Our method outperforms state-of-the-art long-tailed detection and segmentation methods on LVIS v0.5 across various backbones.
- Score: 114.00301664929911
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training on datasets with long-tailed distributions has been challenging for
major recognition tasks such as classification and detection. To deal with this
challenge, image resampling is typically introduced as a simple but effective
approach. However, we observe that long-tailed detection differs from
classification since multiple classes may be present in one image. As a result,
image resampling alone is not enough to yield a sufficiently balanced
distribution at the object level. We address object-level resampling by
introducing an object-centric memory replay strategy based on dynamic, episodic
memory banks. Our proposed strategy has two benefits: 1) convenient
object-level resampling without significant extra computation, and 2) implicit
feature-level augmentation from model updates. We show that image-level and
object-level resamplings are both important, and thus unify them with a joint
resampling strategy (RIO). Our method outperforms state-of-the-art long-tailed
detection and segmentation methods on LVIS v0.5 across various backbones.
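To make the object-level side concrete, below is a minimal sketch of how a dynamic, per-class episodic memory bank with replay could sit behind the box head of a two-stage detector. It is an illustration under assumptions, not the authors' released implementation: the class name ObjectMemoryBank, the FIFO capacity, the uniform per-class sampling rule, and the usage snippet are all hypothetical.

```python
# Illustrative sketch only (assumed PyTorch-style detector); not the RIO code.
import random
from collections import defaultdict, deque

import torch


class ObjectMemoryBank:
    """Per-class storage of RoI features, replayed to rebalance the
    object-level class distribution seen by the box classifier."""

    def __init__(self, capacity_per_class: int = 100):
        # A bounded FIFO per class keeps the bank dynamic: features written by
        # older model states are gradually replaced by fresher ones.
        self.banks = defaultdict(lambda: deque(maxlen=capacity_per_class))

    @torch.no_grad()
    def update(self, roi_feats: torch.Tensor, labels: torch.Tensor) -> None:
        # Detach so replayed features act as extra training samples rather
        # than gradient paths into past computation graphs.
        for feat, label in zip(roi_feats.detach().cpu(), labels.cpu()):
            self.banks[int(label)].append(feat)

    def replay(self, num_per_class: int = 4):
        """Sample stored features roughly uniformly over classes, which is
        what makes the resampling object-level rather than image-level."""
        feats, labels = [], []
        for cls, bank in self.banks.items():
            picks = random.sample(list(bank), min(num_per_class, len(bank)))
            feats.extend(picks)
            labels.extend([cls] * len(picks))
        if not feats:
            return None, None
        return torch.stack(feats), torch.tensor(labels)


# Hypothetical use inside a training step:
# bank.update(roi_feats, gt_labels)            # store this batch's objects
# replay_feats, replay_labels = bank.replay()  # class-balanced replay batch
# if replay_feats is not None:
#     loss = loss + box_cls_loss(box_head(replay_feats), replay_labels)
```

Because stored features were written by earlier versions of the model, replaying them also acts as an implicit feature-level augmentation, the second benefit the abstract mentions. Image-level resampling (on LVIS, commonly repeat-factor sampling) is applied separately; the point of RIO is that the two are complementary and should be combined.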
Related papers
- Learning from Rich Semantics and Coarse Locations for Long-tailed Object
Detection [157.18560601328534]
RichSem is a robust method to learn rich semantics from coarse locations without the need for accurate bounding boxes.
We add a semantic branch to our detector to learn these soft semantics and enhance feature representations for long-tailed object detection.
Our method achieves state-of-the-art performance without requiring complex training and testing procedures.
arXiv Detail & Related papers (2023-10-18T17:59:41Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of the unbalanced distribution of interactions via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- A Contrastive Distillation Approach for Incremental Semantic Segmentation in Aerial Images [15.75291664088815]
A major issue concerning current deep neural architectures is known as catastrophic forgetting.
We propose a contrastive regularization, where any given input is compared with its augmented version (a generic sketch of such a term follows this entry).
We show the effectiveness of our solution on the Potsdam dataset, outperforming the incremental baseline in every test.
arXiv Detail & Related papers (2021-12-07T16:44:45Z)
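As a generic illustration of a regularizer that compares an input with its augmented version, one common form is an InfoNCE-style consistency term. This is not the paper's actual loss; the encoder interface, the augmentation callable, and the temperature below are assumptions.

```python
# Generic sketch of a contrastive consistency term between an input and its
# augmented view; assumed interfaces, not the paper's formulation.
import torch
import torch.nn.functional as F


def augmentation_consistency_loss(encoder, x, augment, temperature=0.1):
    """Pull each sample toward its augmented view and away from the
    other samples in the batch (InfoNCE with in-batch negatives)."""
    z1 = F.normalize(encoder(x), dim=1)           # (B, D) features, original view
    z2 = F.normalize(encoder(augment(x)), dim=1)  # (B, D) features, augmented view
    logits = z1 @ z2.t() / temperature            # (B, B) pairwise similarities
    targets = torch.arange(x.size(0), device=x.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)
```

In the incremental setting described above, a term like this would sit alongside a distillation objective against the previous-step model to counter catastrophic forgetting; that part is omitted from the sketch.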
- Contrastive Object-level Pre-training with Spatial Noise Curriculum Learning [12.697842097171119]
We present a curriculum learning mechanism that adaptively augments the generated regions, which allows the model to consistently acquire a useful learning signal.
Our experiments show that our approach improves on the MoCo v2 baseline by a large margin on multiple object-level tasks when pre-training on multi-object scene image datasets.
arXiv Detail & Related papers (2021-11-26T18:29:57Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- Rectifying the Shortcut Learning of Background: Shared Object Concentration for Few-Shot Image Recognition [101.59989523028264]
Few-Shot image classification aims to utilize pretrained knowledge learned from a large-scale dataset to tackle a series of downstream classification tasks.
We propose COSOC, a novel Few-Shot Learning framework, to automatically figure out foreground objects at both the pretraining and evaluation stages.
arXiv Detail & Related papers (2021-07-16T07:46:41Z)
- Deep Active Learning for Joint Classification & Segmentation with Weak Annotator [22.271760669551817]
CNN visualization and interpretation methods, like class-activation maps (CAMs), are typically used to highlight the image regions linked to class predictions.
We propose an active learning framework, which progressively integrates pixel-level annotations during training.
Our results indicate that, by simply using random sample selection, the proposed approach can significantly outperform state-of-the-art CAM-based and active learning (AL) methods.
arXiv Detail & Related papers (2020-10-10T03:25:54Z)
- One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.