Towards Better Object Detection in Scale Variation with Adaptive Feature Selection
- URL: http://arxiv.org/abs/2012.03265v2
- Date: Wed, 9 Dec 2020 13:43:09 GMT
- Title: Towards Better Object Detection in Scale Variation with Adaptive Feature Selection
- Authors: Zehui Gong, Dong Li
- Abstract summary: We propose a novel adaptive feature selection module (AFSM) to automatically learn the way to fuse multi-level representations in the channel dimension.
It significantly improves the performance of the detectors that have a feature pyramid structure.
A class-aware sampling mechanism (CASM) is proposed to tackle the class imbalance problem.
- Score: 3.5352273012717044
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is common practice to exploit pyramidal feature representations to tackle
the problem of scale variation in object instances. However, most such methods still
predict objects within a given scale range solely or mainly from a
single-level representation, yielding inferior detection performance. To this
end, we propose a novel adaptive feature selection module (AFSM), to
automatically learn the way to fuse multi-level representations in the channel
dimension, in a data-driven manner. It significantly improves the performance
of detectors that have a feature pyramid structure, while adding almost no
inference overhead. Moreover, a class-aware sampling mechanism
(CASM) is proposed to tackle the class imbalance problem by re-weighting the
sampling ratio of each training image based on the statistical
characteristics of each class. This is crucial for improving the performance of
the minority classes. Experimental results demonstrate the effectiveness of the
proposed method, with 83.04% mAP at 15.96 FPS on the VOC dataset, and 39.48% AP
on the VisDrone-DET validation subset, respectively, outperforming other
state-of-the-art detectors considerably. The code is available at
https://github.com/ZeHuiGong/AFSM.git.
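The abstract describes AFSM and CASM only at a high level, so the following plain-Python sketch illustrates one plausible reading of each; it is not the authors' implementation (the linked repository is the reference for that). For AFSM it assumes an SKNet-style design: pyramid levels already resized to a common resolution, a globally pooled descriptor of their sum, and a softmax over levels computed separately for every channel, with `weights` standing in for learned projection layers. For CASM it assumes a hypothetical rule in which an image's sampling weight is the largest inverse image frequency among the classes it contains. The function names `afsm_fuse` and `casm_sampling_weights` are illustrative only.

```python
import math
import random
from collections import Counter


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def afsm_fuse(levels, weights):
    """Channel-wise adaptive fusion of L pyramid levels (assumed SKNet-style).

    levels:  L feature maps; levels[l][c] is a flat list of spatial values.
             All levels are assumed to be resized to the same resolution.
    weights: L hypothetical C x C projection matrices (learned in a real model).
    Returns the fused C-channel map and the per-level, per-channel weights.
    """
    num_levels, num_channels = len(levels), len(levels[0])
    # Global descriptor: average-pool the element-wise sum of all levels.
    pooled = [sum(sum(level[c]) for level in levels) / len(levels[0][c])
              for c in range(num_channels)]
    # One logit per (level, channel) from a linear map of the descriptor.
    logits = [[sum(w[c][k] * pooled[k] for k in range(num_channels))
               for c in range(num_channels)] for w in weights]
    # Softmax across levels, independently for every channel.
    attn = [[0.0] * num_channels for _ in range(num_levels)]
    for c in range(num_channels):
        probs = softmax([logits[l][c] for l in range(num_levels)])
        for l in range(num_levels):
            attn[l][c] = probs[l]
    # Weighted sum of the levels, channel by channel.
    fused = [[sum(attn[l][c] * levels[l][c][i] for l in range(num_levels))
              for i in range(len(levels[0][c]))] for c in range(num_channels)]
    return fused, attn


def casm_sampling_weights(image_labels):
    """Per-image sampling probabilities that favour images with rare classes.

    image_labels: one list of class names per training image.
    Hypothetical rule (an assumption, not taken from the paper):
    weight = max over the image's classes of 1 / (#images containing that class).
    """
    image_freq = Counter(cls for labels in image_labels for cls in set(labels))
    raw = [max(1.0 / image_freq[cls] for cls in set(labels)) if labels else 0.0
           for labels in image_labels]
    total = sum(raw)
    if total == 0:  # no labelled images: fall back to uniform sampling
        return [1.0 / len(image_labels)] * len(image_labels)
    return [r / total for r in raw]


if __name__ == "__main__":
    labels = [["person"], ["person", "bicycle"], ["person"], ["tricycle"]]
    probs = casm_sampling_weights(labels)
    # Draw one epoch's worth of image indices, with replacement.
    epoch = random.choices(range(len(labels)), weights=probs, k=len(labels))
    print(probs, epoch)
```

Because the softmax runs over levels separately for each channel, every output channel can draw on a different mix of pyramid levels, which is one way to realize fusion "in the channel dimension"; with all-zero projections the module reduces to a plain average of the levels. In the CASM sketch, images containing classes that appear in few training images are drawn more often.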
Related papers
- Rethinking Pre-trained Feature Extractor Selection in Multiple Instance Learning for Whole Slide Image Classification [2.6703221234079946]
Multiple instance learning (MIL) has become a preferred method for gigapixel whole slide image (WSI) classification without requiring patch-level annotations.
This study systematically evaluates MIL feature extractors across three dimensions: pre-training dataset, backbone model, and pre-training method.
Our findings reveal that selecting a robust self-supervised learning (SSL) method has a greater impact on performance than relying solely on an in-domain pre-training dataset.
(2024-08-02T10:34:23Z)
- Scale-Invariant Feature Disentanglement via Adversarial Learning for UAV-based Object Detection [18.11107031800982]
We propose to improve single-stage inference accuracy through learning scale-invariant features.
We apply our approach to three state-of-the-art lightweight detection frameworks on three benchmark datasets.
arXiv Detail & Related papers (2024-05-24T11:40:22Z) - Mean-AP Guided Reinforced Active Learning for Object Detection [31.304039641225504]
This paper introduces Mean-AP Guided Reinforced Active Learning for Object Detection (MGRAL).
MGRAL is a novel approach that leverages the concept of expected model output changes as informativeness for deep detection networks.
Our approach demonstrates strong performance, establishing a new paradigm in reinforcement learning-based active learning for object detection.
(2023-10-12T14:59:22Z)
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
(2021-06-09T12:02:29Z)
- Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models.
Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings.
We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
(2021-03-22T20:36:34Z)
- Meta-learning One-class Classifiers with Eigenvalue Solvers for Supervised Anomaly Detection [55.888835686183995]
We propose a neural network-based meta-learning method for supervised anomaly detection.
We experimentally demonstrate that the proposed method achieves better performance than existing anomaly detection and few-shot learning methods.
(2021-03-01T01:43:04Z)
- Fine-Grained Dynamic Head for Object Detection [68.70628757217939]
We propose a fine-grained dynamic head to conditionally select a pixel-level combination of FPN features from different scales for each instance.
Experiments demonstrate the effectiveness and efficiency of the proposed method on several state-of-the-art detection benchmarks.
(2020-12-07T08:16:32Z)
- Multi-Scale Positive Sample Refinement for Few-Shot Object Detection [61.60255654558682]
Few-shot object detection (FSOD) helps detectors adapt to unseen classes with few training instances.
We propose a Multi-scale Positive Sample Refinement (MPSR) approach to enrich object scales in FSOD.
MPSR generates multi-scale positive samples as object pyramids and refines the prediction at various scales.
(2020-07-18T09:48:29Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
(2020-07-17T15:41:37Z)