Learning from Rich Semantics and Coarse Locations for Long-tailed Object
Detection
- URL: http://arxiv.org/abs/2310.12152v1
- Date: Wed, 18 Oct 2023 17:59:41 GMT
- Title: Learning from Rich Semantics and Coarse Locations for Long-tailed Object
Detection
- Authors: Lingchen Meng, Xiyang Dai, Jianwei Yang, Dongdong Chen, Yinpeng Chen,
Mengchen Liu, Yi-Ling Chen, Zuxuan Wu, Lu Yuan, Yu-Gang Jiang
- Abstract summary: RichSem is a robust method for learning rich semantics from coarse locations without the need for accurate bounding boxes.
We add a semantic branch to our detector to learn these soft semantics and enhance feature representations for long-tailed object detection.
Our method achieves state-of-the-art performance without requiring complex training and testing procedures.
- Score: 157.18560601328534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Long-tailed object detection (LTOD) aims to handle the extreme data imbalance
in real-world datasets, where many tail classes have scarce instances. One
popular strategy is to explore extra data with image-level labels, yet it
produces limited results due to (1) semantic ambiguity -- an image-level label
only captures a salient part of the image, ignoring the remaining rich
semantics within the image; and (2) location sensitivity -- the label highly
depends on the locations and crops of the original image, which may change
after data transformations like random cropping. To remedy this, we propose
RichSem, a simple but effective method that robustly learns rich semantics
from coarse locations without the need for accurate bounding boxes. RichSem
leverages rich semantics from images, which then serve as additional soft
supervision for training detectors. Specifically, we add a semantic branch to
our detector to learn these soft semantics and enhance feature representations
for long-tailed object detection. The semantic branch is only used for training
and is removed during inference. RichSem achieves consistent improvements on
both the overall and rare categories of LVIS under different backbones and detectors.
Our method achieves state-of-the-art performance without requiring complex
training and testing procedures. Moreover, we show the effectiveness of our
method on other long-tailed datasets with additional experiments. Code is
available at https://github.com/MengLcool/RichSem.
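The abstract describes a semantic branch that receives soft semantic supervision only during training and is removed at inference. As a rough illustration (not the released implementation; see the repository above for the actual code), here is a minimal PyTorch-style sketch in which the module layout, dimensions, and the cosine-style alignment loss are all assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DetectorHeadWithSemanticBranch(nn.Module):
    """Hypothetical sketch: detection heads plus a training-only semantic branch."""

    def __init__(self, feat_dim=256, num_classes=1203, sem_dim=512):
        super().__init__()
        self.cls_head = nn.Linear(feat_dim, num_classes)  # standard classification head
        self.box_head = nn.Linear(feat_dim, 4)            # standard box-regression head
        self.sem_branch = nn.Linear(feat_dim, sem_dim)    # auxiliary semantic branch

    def forward(self, region_feats, soft_sem_targets=None):
        cls_logits = self.cls_head(region_feats)
        boxes = self.box_head(region_feats)

        sem_loss = None
        if self.training and soft_sem_targets is not None:
            # Align region features with soft semantic targets. The abstract
            # does not specify the loss, so a cosine-distance term is assumed.
            sem_pred = F.normalize(self.sem_branch(region_feats), dim=-1)
            sem_tgt = F.normalize(soft_sem_targets, dim=-1)
            sem_loss = (1.0 - (sem_pred * sem_tgt).sum(dim=-1)).mean()

        return cls_logits, boxes, sem_loss
```

Because the branch is only evaluated in training mode and its output is not part of the detection results, dropping it at inference leaves the detector and its runtime cost unchanged.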
Related papers
- Task Specific Pretraining with Noisy Labels for Remote Sensing Image Segmentation [18.598405597933752]
Self-supervision provides remote sensing with a tool to reduce the amount of exact, human-crafted geospatial annotations.
In this work, we propose to exploit noisy semantic segmentation maps for model pretraining.
The results from two datasets indicate the effectiveness of task-specific supervised pretraining with noisy labels.
arXiv Detail & Related papers (2024-02-25T18:01:42Z)
- Learning Semantic Segmentation with Query Points Supervision on Aerial Images [57.09251327650334]
We present a weakly supervised learning algorithm for training semantic segmentation models.
Our proposed approach performs accurate semantic segmentation and improves efficiency by significantly reducing the cost and time required for manual annotation.
arXiv Detail & Related papers (2023-09-11T14:32:04Z)
- A Contrastive Distillation Approach for Incremental Semantic Segmentation in Aerial Images [15.75291664088815]
A major issue concerning current deep neural architectures is known as catastrophic forgetting.
We propose a contrastive regularization, where any given input is compared with its augmented version (a generic sketch of this idea follows after this list).
We show the effectiveness of our solution on the Potsdam dataset, outperforming the incremental baseline in every test.
arXiv Detail & Related papers (2021-12-07T16:44:45Z)
- Region-level Active Learning for Cluttered Scenes [60.93811392293329]
We introduce a new strategy that subsumes previous Image-level and Object-level approaches into a generalized, Region-level approach.
We show that this approach significantly decreases labeling effort and improves rare object search on realistic data with inherent class-imbalance and cluttered scenes.
arXiv Detail & Related papers (2021-08-20T14:02:38Z)
- AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures.
Our experiments demonstrate that the method is able to represent images in a low-dimensional space.
Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
arXiv Detail & Related papers (2021-06-11T09:02:30Z)
- Image-Level or Object-Level? A Tale of Two Resampling Strategies for Long-Tailed Detection [114.00301664929911]
We show that long-tailed detection differs from classification since multiple classes may be present in one image.
We introduce an object-centric memory replay strategy based on dynamic, episodic memory banks.
Our method outperforms state-of-the-art long-tailed detection and segmentation methods on LVIS v0.5 across various backbones.
arXiv Detail & Related papers (2021-04-12T17:58:30Z)
- Grafit: Learning fine-grained image representations with coarse labels [114.17782143848315]
This paper tackles the problem of learning a finer representation than the one provided by training labels.
By jointly leveraging the coarse labels and the underlying fine-grained latent space, it significantly improves the accuracy of category-level retrieval methods.
arXiv Detail & Related papers (2020-11-25T19:06:26Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-art results in all these settings, demonstrating its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
- Continual Local Replacement for Few-shot Learning [13.956960291580938]
The goal of few-shot learning is to learn a model that can recognize novel classes from one or a few training examples.
It is challenging mainly due to two aspects: (1) it lacks good feature representations of novel classes; (2) a few labeled examples cannot accurately represent the true data distribution.
A novel continual local replacement strategy is proposed to address the data deficiency problem.
arXiv Detail & Related papers (2020-01-23T04:26:21Z)
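The contrastive distillation entry above mentions a regularizer that compares an input with its augmented version. The following is a generic, PyTorch-style sketch of such a term (an InfoNCE-style loss with hypothetical `model` and `augment` callables); it only illustrates the general idea and is not that paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def contrastive_regularization(model, images, augment, temperature=0.1):
    """Generic sketch: pull features of each image and its augmented view together."""
    z1 = F.normalize(model(images), dim=-1)           # features of the original batch
    z2 = F.normalize(model(augment(images)), dim=-1)  # features of the augmented batch

    # Similarity of every original view to every augmented view in the batch;
    # the matching (diagonal) pairs are treated as positives.
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(images.size(0), device=images.device)
    return F.cross_entropy(logits, targets)
```

In an incremental setting such a term would typically be added to the main segmentation loss with a small weight; that weighting is likewise an assumption here.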