DEYO: DETR with YOLO for End-to-End Object Detection
- URL: http://arxiv.org/abs/2402.16370v1
- Date: Mon, 26 Feb 2024 07:48:19 GMT
- Title: DEYO: DETR with YOLO for End-to-End Object Detection
- Authors: Haodong Ouyang
- Abstract summary: We introduce DETR with YOLO (DEYO), the first real-time end-to-end object detection model that utilizes an encoder with a purely convolutional structure.
In the first stage of training, we employ a classic detector, pre-trained with a one-to-many matching strategy, to initialize the backbone and neck of the end-to-end detector.
In the second stage of training, we freeze the backbone and neck of the end-to-end detector and train the decoder from scratch.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The training paradigm of DETRs is heavily contingent upon pre-training their
backbone on the ImageNet dataset. However, the limited supervisory signals
provided by the image classification task and one-to-one matching strategy
result in an inadequately pre-trained neck for DETRs. Additionally, the
instability of matching in the early stages of training engenders
inconsistencies in the optimization objectives of DETRs. To address these
issues, we have devised an innovative training methodology termed step-by-step
training. Specifically, in the first stage of training, we employ a classic
detector, pre-trained with a one-to-many matching strategy, to initialize the
backbone and neck of the end-to-end detector. In the second stage of training,
we freeze the backbone and neck of the end-to-end detector and train the
decoder from scratch. Through the application of step-by-step
training, we have introduced the first real-time end-to-end object detection
model that utilizes an encoder with a purely convolutional structure, DETR with YOLO
(DEYO). Without reliance on any supplementary training data, DEYO surpasses all
existing real-time object detectors in both speed and accuracy. Moreover, the
comprehensive DEYO series can complete its second-phase training on the COCO
dataset using a single 8GB RTX 4060 GPU, significantly reducing the training
expenditure. Source code and pre-trained models are available at
https://github.com/ouyanghaodong/DEYO.
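
The two-stage procedure described above can be sketched in plain Python. This is a minimal illustration of the control flow only, not DEYO's actual implementation; all class, function, and attribute names here are hypothetical stand-ins for the real framework code.

```python
# Sketch of DEYO's "step-by-step training": stage 1 initializes the
# backbone and neck from a one-to-many pre-trained classic detector
# (e.g. YOLO); stage 2 freezes them and trains only the decoder from
# scratch. Names below are illustrative, not from the DEYO codebase.

class Module:
    """Toy stand-in for a network component with trainable parameters."""
    def __init__(self, name):
        self.name = name
        self.trainable = True          # analogous to requires_grad
        self.weights = "random-init"

def stage_one_init(backbone, neck):
    # Stage 1: a classic detector pre-trained with one-to-many matching
    # supplies richer supervisory signals, so its backbone and neck
    # weights initialize the end-to-end detector.
    backbone.weights = "yolo-pretrained"
    neck.weights = "yolo-pretrained"

def stage_two_freeze(backbone, neck, decoder):
    # Stage 2: freeze backbone and neck; the decoder alone is trained
    # from scratch under the one-to-one matching objective.
    backbone.trainable = False
    neck.trainable = False
    decoder.trainable = True
    decoder.weights = "random-init"
    # Only trainable modules would be handed to the optimizer.
    return [m for m in (backbone, neck, decoder) if m.trainable]

backbone, neck, decoder = Module("backbone"), Module("neck"), Module("decoder")
stage_one_init(backbone, neck)
trainable = stage_two_freeze(backbone, neck, decoder)
print([m.name for m in trainable])  # → ['decoder']
```

Because the frozen backbone and neck produce no gradients, only the decoder's parameters occupy optimizer state, which is consistent with the paper's claim that second-phase training fits on a single 8GB GPU.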
Related papers
- Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along PF ODE trajectories to their corresponding endpoints.
We empirically find that this training paradigm limits the one-step generation performance of consistency models.
We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
arXiv Detail & Related papers (2024-10-18T22:38:08Z)
- DEYOv3: DETR with YOLO for Real-time Object Detection [0.0]
We propose a new training method called step-by-step training.
In the first stage, the one-to-many pre-trained YOLO detector is used to initialize the end-to-end detector.
In the second stage, the backbone and encoder are consistent with the DETR-like model, but only the detector needs to be trained from scratch.
arXiv Detail & Related papers (2023-09-21T07:49:07Z)
- AlignDet: Aligning Pre-training and Fine-tuning in Object Detection [38.256555424079664]
AlignDet is a unified pre-training framework that can be adapted to various existing detectors to alleviate the discrepancies.
It can achieve significant improvements across diverse protocols, such as detection algorithm, model backbone, data setting, and training schedule.
arXiv Detail & Related papers (2023-07-20T17:55:14Z)
- Focusing on what to decode and what to train: Efficient Training with HOI Split Decoders and Specific Target Guided DeNoising [17.268302302974607]
Recent one-stage transformer-based methods achieve notable gains in the Human-Object Interaction (HOI) detection task by leveraging the detection capabilities of DETR.
We propose a novel one-stage framework (SOV) which consists of a subject decoder, an object decoder, and a verb decoder.
We propose a novel Specific Target Guided (STG) DeNoising training strategy, which leverages learnable object and verb label embeddings to guide the training and accelerate the training convergence.
arXiv Detail & Related papers (2023-07-05T13:42:31Z)
- Architecture, Dataset and Model-Scale Agnostic Data-free Meta-Learning [119.70303730341938]
We propose ePisode cUrriculum inveRsion (ECI) during data-free meta training and invErsion calibRation following inner loop (ICFIL) during meta testing.
ECI adaptively increases the difficulty level of pseudo episodes according to the real-time feedback of the meta model.
We formulate the optimization process of meta training with ECI as an adversarial form in an end-to-end manner.
arXiv Detail & Related papers (2023-03-20T15:10:41Z)
- Intersection of Parallels as an Early Stopping Criterion [64.8387564654474]
We propose a method to spot an early stopping point in the training iterations without the need for a validation set.
For a wide range of learning rates, our method, called Cosine-Distance Criterion (CDC), leads to better generalization on average than all the methods that we compare against.
arXiv Detail & Related papers (2022-08-19T19:42:41Z)
- Effective and Efficient Training for Sequential Recommendation using Recency Sampling [91.02268704681124]
We propose a novel Recency-based Sampling of Sequences training objective.
We show that models enhanced with our method achieve performance exceeding or very close to state-of-the-art BERT4Rec.
arXiv Detail & Related papers (2022-07-06T13:06:31Z)
- Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection [54.92703325989853]
We propose a two-stage Activation-to-Saliency (A2S) framework that effectively generates high-quality saliency cues.
No human annotations are involved in our framework during the whole training process.
Our framework achieves significant performance improvements over existing USOD methods.
arXiv Detail & Related papers (2021-12-07T11:54:06Z)
- DETReg: Unsupervised Pretraining with Region Priors for Object Detection [103.93533951746612]
DETReg is a new self-supervised method that pretrains the entire object detection network.
During pretraining, DETReg predicts object localizations to match the localizations from an unsupervised region proposal generator.
It simultaneously aligns the corresponding feature embeddings with embeddings from a self-supervised image encoder.
arXiv Detail & Related papers (2021-06-08T17:39:14Z)
- UP-DETR: Unsupervised Pre-training for Object Detection with Transformers [11.251593386108189]
We propose a novel pretext task named random query patch detection in Unsupervised Pre-training DETR (UP-DETR).
Specifically, we randomly crop patches from the given image and then feed them as queries to the decoder.
UP-DETR significantly boosts the performance of DETR with faster convergence and higher average precision on object detection, one-shot detection and panoptic segmentation.
arXiv Detail & Related papers (2020-11-18T05:16:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.