Revisiting DETR Pre-training for Object Detection
- URL: http://arxiv.org/abs/2308.01300v2
- Date: Fri, 1 Dec 2023 18:25:19 GMT
- Title: Revisiting DETR Pre-training for Object Detection
- Authors: Yan Ma, Weicong Liang, Bohan Chen, Yiduo Hao, Bojian Hou, Xiangyu Yue,
Chao Zhang, Yuhui Yuan
- Abstract summary: We investigate the shortcomings of DETReg in enhancing the performance of robust DETR-based models under full data conditions.
We employ an optimized approach named Simple Self-training which leads to marked enhancements through the combination of an improved box predictor and the Objects365 benchmark.
The culmination of these endeavors results in a remarkable AP score of $59.3\%$ on the COCO val set, outperforming $\mathcal{H}$-Deformable-DETR + Swin-L without pre-training by $1.4\%$.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motivated by the remarkable achievements of DETR-based approaches on COCO
object detection and segmentation benchmarks, recent endeavors have been
directed towards elevating their performance through self-supervised
pre-training of Transformers while preserving a frozen backbone. Noteworthy
advancements in accuracy have been documented in certain studies. Our
investigation delved deeply into a representative approach, DETReg, and its
performance assessment in the context of emerging models like
$\mathcal{H}$-Deformable-DETR. Regrettably, DETReg proves inadequate in
enhancing the performance of robust DETR-based models under full data
conditions. To dissect the underlying causes, we conduct extensive experiments
on COCO and PASCAL VOC, probing elements such as the selection of pre-training
datasets and strategies for pre-training target generation. By contrast, we
employ an optimized approach named Simple Self-training which leads to marked
enhancements through the combination of an improved box predictor and the
Objects$365$ benchmark. The culmination of these endeavors results in a
remarkable AP score of $59.3\%$ on the COCO val set, outperforming
$\mathcal{H}$-Deformable-DETR + Swin-L without pre-training by $1.4\%$.
Moreover, a series of synthetic pre-training datasets, generated by merging
contemporary image-to-text (LLaVA) and text-to-image (SDXL) models,
significantly amplifies object detection capabilities.
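The abstract names the Simple Self-training recipe and the LLaVA/SDXL data pipeline only at a high level. A minimal sketch of what those two steps might look like, assuming hypothetical detector, captioner, and generator interfaces (none of this is the authors' released code):
```python
# All interfaces below (detector.predict, captioner.describe,
# generator.render) are hypothetical placeholders, not released code.

def generate_pseudo_labels(detector, unlabeled_images, score_thresh=0.5):
    """Simple Self-training sketch: run an improved box predictor over a
    large corpus (e.g. Objects365 images) and keep confident detections
    as pre-training targets. The 0.5 threshold is an assumption."""
    pseudo_dataset = []
    for image in unlabeled_images:
        boxes = detector.predict(image)  # list of {"box", "label", "score"}
        kept = [b for b in boxes if b["score"] >= score_thresh]
        if kept:
            pseudo_dataset.append((image, kept))
    return pseudo_dataset

def synthesize_training_image(captioner, generator, seed_image):
    """Synthetic-data sketch: caption a real image with an image-to-text
    model (e.g. LLaVA), then re-render the caption with a text-to-image
    model (e.g. SDXL) to enlarge the pre-training corpus."""
    caption = captioner.describe(seed_image)
    return generator.render(caption)

# Pre-train the DETR-style detector on the pseudo-labeled and synthetic
# images, then fine-tune and evaluate on COCO.
```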
Related papers
- ACTRESS: Active Retraining for Semi-supervised Visual Grounding
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z)
- RDBE: Reasoning Distillation-Based Evaluation Enhances Automatic Essay Scoring
Reasoning Distillation-Based Evaluation (RDBE) integrates interpretability to elucidate the rationale behind model scores.
Our experimental results demonstrate the efficacy of RDBE across all scoring rubrics considered in the dataset.
arXiv Detail & Related papers (2024-07-03T05:49:01Z)
- Take the Bull by the Horns: Hard Sample-Reweighted Continual Training Improves LLM Generalization
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
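The summary names Instance-Reweighted Distributionally Robust Optimization without giving its form. One standard closed form for a KL-regularized instance-level DRO objective weights each sample by its exponentiated loss; a minimal sketch under that assumption:
```python
import torch

def dro_instance_weights(per_sample_loss: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Closed-form solution of max_w <w, loss> - tau * KL(w || uniform)
    over the probability simplex: a softmax over per-sample losses.
    Harder samples get exponentially larger weight; tau sets sharpness."""
    return torch.softmax(per_sample_loss / tau, dim=0)

def reweighted_batch_loss(per_sample_loss: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Continual-training step: reweight the batch toward hard samples.
    Weights are computed on detached losses so gradients flow only
    through the weighted sum."""
    weights = dro_instance_weights(per_sample_loss.detach(), tau)
    return (weights * per_sample_loss).sum()
```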
arXiv Detail & Related papers (2024-02-22T04:10:57Z)
- TRIAGE: Characterizing and auditing training data for improved regression
We introduce TRIAGE, a novel data characterization framework tailored to regression tasks and compatible with a broad class of regressors.
TRIAGE utilizes conformal predictive distributions to provide a model-agnostic scoring method, the TRIAGE score.
We show that TRIAGE's characterization is consistent and highlight its utility for improving performance via data sculpting/filtering in multiple regression settings.
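The summary says TRIAGE scores samples with conformal predictive distributions but does not define the score. As an illustration only, one split-conformal way to place an observed label inside a predictive distribution (the tail-flagging heuristic is an assumption, not the paper's TRIAGE score):
```python
import numpy as np

def conformal_percentile(cal_residuals: np.ndarray, y_pred: float, y_true: float) -> float:
    """Evaluate the conformal predictive CDF at the observed label: the
    fraction of calibration residuals r for which y_pred + r stays below
    y_true. Values near 0 (or 1) suggest the model consistently
    over- (or under-) estimates this example."""
    return float(np.mean(y_pred + cal_residuals <= y_true))

# A TRIAGE-style use would track this percentile over training
# checkpoints and filter or fix examples that stay in the tails.
```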
arXiv Detail & Related papers (2023-10-29T10:31:59Z)
- Align-DETR: Improving DETR with Simple IoU-aware BCE loss
We propose a metric, recall of best-regressed samples, to quantitatively evaluate the misalignment problem.
The proposed loss, IA-BCE, guides the training of DETR to build a strong correlation between classification score and localization precision.
To overcome the dramatic decrease in sample quality induced by the sparsity of queries, we introduce a prime sample weighting mechanism.
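The exact form of IA-BCE is not given in this summary. A common task-alignment formulation couples the classification target to box quality via t = s^alpha * IoU^(1-alpha); a hedged sketch under that assumption:
```python
import torch
import torch.nn.functional as F

def ia_bce_loss(cls_prob: torch.Tensor, iou: torch.Tensor, alpha: float = 0.25) -> torch.Tensor:
    """IoU-aware BCE sketch for queries matched to ground-truth boxes:
    the soft target t = prob**alpha * IoU**(1 - alpha) ties the
    classification score to localization precision. cls_prob must be
    post-sigmoid; the target form and alpha are assumptions drawn from
    common task-alignment losses, not the paper's released code."""
    target = (cls_prob.detach() ** alpha) * (iou.detach() ** (1.0 - alpha))
    return F.binary_cross_entropy(cls_prob, target)
```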
arXiv Detail & Related papers (2023-04-15T10:24:51Z)
- ST-CoNAL: Consistency-Based Acquisition Criterion Using Temporal Self-Ensemble for Active Learning
Active learning (AL) is becoming increasingly important to maximize the efficiency of the training process.
We present an AL algorithm, namely student-teacher consistency-based AL (ST-CoNAL).
Experiments conducted for image classification tasks on CIFAR-10, CIFAR-100, Caltech-256, and Tiny ImageNet datasets demonstrate that the proposed ST-CoNAL achieves significantly better performance than existing acquisition methods.
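As a rough illustration of a consistency-based acquisition criterion, one plausible reading is to rank unlabeled samples by the disagreement between the student and its temporal self-ensemble teacher; KL divergence as the disagreement measure is an assumption here:
```python
import torch
import torch.nn.functional as F

def consistency_score(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """Per-sample disagreement between a student and its temporal
    self-ensemble teacher (e.g. an average of earlier student checkpoints).
    Higher scores mark more informative samples to query for labels."""
    log_p_student = F.log_softmax(student_logits, dim=-1)
    p_teacher = F.softmax(teacher_logits, dim=-1)
    # elementwise KL(teacher || student), summed over classes
    return F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=-1)

# Acquisition step: score the unlabeled pool and query the top-k
# most inconsistent samples for annotation.
```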
arXiv Detail & Related papers (2022-07-05T17:25:59Z)
- Recurrent Glimpse-based Decoder for Detection with Transformer
We introduce a novel REcurrent Glimpse-based decOder (REGO) in this paper.
In particular, the REGO employs a multi-stage recurrent processing structure to help the attention of DETR gradually focus on foreground objects.
REGO consistently boosts the performance of different DETR detectors by up to 7% relative gain at the same setting of 50 training epochs.
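The summary describes REGO's multi-stage recurrent processing only in words. A minimal PyTorch sketch of one glimpse stage, where detection queries attend to RoI features pooled around the previous stage's boxes (dimensions and the use of roi_align are illustrative assumptions, not the paper's exact architecture):
```python
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class GlimpseRefineStage(nn.Module):
    """One recurrent glimpse stage: pool RoI features around the previous
    stage's predicted boxes and let each detection embedding attend to
    its own glimpse."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat_map: torch.Tensor, boxes: torch.Tensor,
                queries: torch.Tensor) -> torch.Tensor:
        # feat_map: (1, C, H, W); boxes: (N, 4) xyxy in feature-map coords;
        # queries: (N, 1, C) detection embeddings from the previous stage.
        glimpses = roi_align(feat_map, [boxes], output_size=(7, 7))  # (N, C, 7, 7)
        glimpses = glimpses.flatten(2).permute(0, 2, 1)              # (N, 49, C)
        refined, _ = self.attn(queries, glimpses, glimpses)          # (N, 1, C)
        return refined
```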
arXiv Detail & Related papers (2021-12-09T00:29:19Z)
- Enhancing the Generalization for Intent Classification and Out-of-Domain Detection in SLU
Intent classification is a major task in spoken language understanding (SLU).
Recent works have shown that using extra data and labels can improve the OOD detection performance.
This paper proposes to train a model with only IND data while supporting both IND intent classification and OOD detection.
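The summary does not state the OOD criterion used with IND-only training. A common baseline, given here purely as an assumed illustration, thresholds the maximum softmax probability:
```python
import torch

def msp_ood_score(logits: torch.Tensor) -> torch.Tensor:
    """Maximum softmax probability per utterance: low confidence on a
    model trained only on in-domain (IND) intents is taken as evidence
    the input is out-of-domain (OOD)."""
    return torch.softmax(logits, dim=-1).max(dim=-1).values

def classify_or_reject(logits: torch.Tensor, threshold: float = 0.7) -> torch.Tensor:
    """Return the predicted intent id, or -1 for inputs flagged as OOD.
    The 0.7 threshold is an illustrative assumption."""
    scores = msp_ood_score(logits)
    intents = logits.argmax(dim=-1)
    return torch.where(scores >= threshold, intents, torch.full_like(intents, -1))
```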
arXiv Detail & Related papers (2021-06-28T08:27:38Z)
- Cheaper Pre-training Lunch: An Efficient Paradigm for Object Detection
We propose a general and efficient pre-training paradigm, Montage pre-training, for object detection.
Montage pre-training needs only the target detection dataset while consuming only 1/4 of the computational resources of the widely adopted ImageNet pre-training.
The efficiency and effectiveness of Montage pre-training are validated by extensive experiments on the MS-COCO dataset.
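The mechanics of Montage pre-training are not spelled out in this summary; the sketch below illustrates the general montage idea, stitching several crops from the target detection dataset into one training image (grid layout and cell size are assumptions):
```python
import numpy as np
from PIL import Image

def assemble_montage(crops, grid=(2, 2), cell=112):
    """Stitch up to grid[0]*grid[1] object crops (uint8 HxWx3 arrays)
    from the target detection dataset into one montage training image,
    so pre-training never needs an external dataset like ImageNet."""
    rows, cols = grid
    canvas = np.zeros((rows * cell, cols * cell, 3), dtype=np.uint8)
    for idx, crop in enumerate(crops[: rows * cols]):
        r, c = divmod(idx, cols)
        patch = np.array(Image.fromarray(crop).resize((cell, cell)))
        canvas[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell] = patch
    return canvas
```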
arXiv Detail & Related papers (2020-04-25T16:09:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.