Dynamic Scale Training for Object Detection
- URL: http://arxiv.org/abs/2004.12432v2
- Date: Sun, 14 Mar 2021 05:22:59 GMT
- Title: Dynamic Scale Training for Object Detection
- Authors: Yukang Chen, Peizhen Zhang, Zeming Li, Yanwei Li, Xiangyu Zhang, Lu
Qi, Jian Sun, and Jiaya Jia
- Abstract summary: We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate the scale variation challenge in object detection.
Experimental results demonstrate the efficacy of the proposed DST in handling scale variation.
It does not introduce inference overhead and could serve as a free lunch for general detection configurations.
- Score: 111.33112051962514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a Dynamic Scale Training paradigm (abbreviated as DST) to mitigate
the scale variation challenge in object detection. Previous strategies such as
image pyramids, multi-scale training, and their variants aim to prepare
scale-invariant data for model optimization. However, the preparation procedure
is unaware of the subsequent optimization process, which restricts its
capability in handling scale variation. Instead, in our paradigm, we use
feedback information from the optimization process to dynamically guide the
data preparation. The proposed method is surprisingly simple yet obtains
significant gains (2%+ Average Precision on the MS COCO dataset), outperforming
previous methods. Experimental results demonstrate the efficacy of the proposed
DST method in handling scale variation. It also generalizes to various
backbones, benchmarks, and other challenging downstream tasks such as instance
segmentation. It does not introduce inference overhead and can serve as a free
lunch for general detection configurations. In addition, it facilitates
efficient training due to fast convergence. Code and models are available at
github.com/yukang2017/Stitcher.
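The abstract describes the feedback-driven data preparation only at a high level. The sketch below, in PyTorch, illustrates one plausible reading of that loop: monitor how much of the loss comes from small objects, and when that share is low, switch the next batch to a downscaled, stitched layout that is rich in small objects. The loss-dictionary keys (`reg_loss_small`, `reg_loss_total`), the 0.1 threshold, and the 2x2 stitch layout are illustrative assumptions, not settings taken from the paper.

```python
# Minimal sketch of a feedback-driven data-preparation loop (assumed details,
# not the authors' exact implementation).
import torch
import torch.nn.functional as F


def stitch_images(images):
    """Downscale groups of 4 images by 2x and tile them 2x2 into single images
    of the original resolution, creating many more small objects."""
    n, c, h, w = images.shape
    assert n % 4 == 0, "stitching groups images in fours"
    small = F.interpolate(images, scale_factor=0.5, mode="bilinear",
                          align_corners=False)
    small = small.view(n // 4, 4, c, h // 2, w // 2)
    top = torch.cat([small[:, 0], small[:, 1]], dim=-1)      # left | right
    bottom = torch.cat([small[:, 2], small[:, 3]], dim=-1)
    return torch.cat([top, bottom], dim=-2)                   # top over bottom


def wants_stitched_input(loss_dict, threshold=0.1):
    """Feedback rule: if small objects contribute less than `threshold` of the
    box-regression loss, request stitched (small-object-rich) inputs for the
    next iteration."""
    share = loss_dict["reg_loss_small"] / (loss_dict["reg_loss_total"] + 1e-8)
    return float(share) < threshold
```

In a training loop, `wants_stitched_input` would be evaluated after each iteration, and when it returns True the next batch would be passed through `stitch_images` (with ground-truth boxes rescaled and offset to match the 2x2 layout) before being fed to the detector.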
Related papers
- Adaptive Retention & Correction for Continual Learning [114.5656325514408]
A common problem in continual learning is the classification layer's bias towards the most recent task.
We name our approach Adaptive Retention & Correction (ARC).
ARC achieves an average performance increase of 2.7% and 2.6% on the CIFAR-100 and ImageNet-R datasets.
arXiv Detail & Related papers (2024-05-23T08:43:09Z)
- FREE: Faster and Better Data-Free Meta-Learning [77.90126669914324]
Data-Free Meta-Learning (DFML) aims to extract knowledge from a collection of pre-trained models without requiring the original data.
We introduce the Faster and Better Data-Free Meta-Learning framework, which contains: (i) a meta-generator for rapidly recovering training tasks from pre-trained models; and (ii) a meta-learner for generalizing to new unseen tasks.
arXiv Detail & Related papers (2024-05-02T03:43:19Z)
- AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods [17.043034606088234]
We introduce AdAdaGrad's scalar variant AdAdaGradNorm, which increases batch sizes during training.
We also perform image classification experiments, highlighting the merits of our proposed strategies.
arXiv Detail & Related papers (2024-02-17T07:49:50Z)
- Exploring Learning Complexity for Downstream Data Pruning [9.526877053855998]
We propose to treat the learning complexity (LC) as the scoring function for classification and regression tasks.
For the instruction fine-tuning of large language models, our method achieves state-of-the-art performance with stable convergence.
arXiv Detail & Related papers (2024-02-08T02:29:33Z)
- CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z)
- AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights [53.8489656709356]
Normalization techniques are a boon for modern deep learning.
It is often overlooked, however, that the additional introduction of momentum results in a rapid reduction in effective step sizes for scale-invariant weights.
In this paper, we verify that the widely adopted combination of the two ingredients leads to the premature decay of effective step sizes and sub-optimal model performance.
arXiv Detail & Related papers (2020-06-15T08:35:15Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Investigating Transferability in Pretrained Language Models [8.83046338075119]
We consider a simple ablation technique for determining the impact of each pretrained layer on transfer task performance.
This technique reveals that in BERT, layers with high probing performance on downstream GLUE tasks are neither necessary nor sufficient for high accuracy on those tasks.
arXiv Detail & Related papers (2020-04-30T17:23:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.