An Effective Two-stage Training Paradigm Detector for Small Dataset
- URL: http://arxiv.org/abs/2309.05652v1
- Date: Mon, 11 Sep 2023 17:43:11 GMT
- Title: An Effective Two-stage Training Paradigm Detector for Small Dataset
- Authors: Zheng Wang, Dong Xie, Hanzhi Wang, Jiang Tian
- Abstract summary: The backbone of YOLOv8 is pre-trained as the encoder using the masked image modeling technique.
During the test stage, test-time augmentation (TTA) is used to enhance each model, and weighted box fusion (WBF) is implemented to further boost the performance.
With the well-designed structure, our approach has achieved 30.4% average precision from 0.50 to 0.95 on the DelftBikes test set, ranking 4th on the leaderboard.
- Score: 13.227589864946477
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning from a limited amount of labeled data to pre-train a model
has always been viewed as a challenging task. In this report, an effective and
robust solution, the two-stage training paradigm YOLOv8 detector (TP-YOLOv8),
is designed for the object detection track in VIPriors Challenge 2023. First,
the backbone of YOLOv8 is pre-trained as the encoder using the masked image
modeling technique. Then the detector is fine-tuned with elaborate
augmentations. During the test stage, test-time augmentation (TTA) is used to
enhance each model, and weighted box fusion (WBF) is implemented to further
boost the performance. With the well-designed structure, our approach has
achieved 30.4% average precision from 0.50 to 0.95 on the DelftBikes test set,
ranking 4th on the leaderboard.
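The test-time recipe above fuses boxes from several augmented passes (e.g., a horizontal-flip TTA) with weighted box fusion. As a rough illustration only — the paper does not publish this code, and production systems typically use a full WBF implementation such as the `ensemble-boxes` package — a simplified score-weighted fusion of overlapping boxes can be sketched as:

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def weighted_box_fusion(boxes, scores, iou_thr=0.55):
    """Simplified WBF: cluster overlapping boxes, then replace each
    cluster with a confidence-weighted average box.  Real WBF also
    rescales scores by the number of contributing models."""
    order = np.argsort(scores)[::-1]  # visit high-confidence boxes first
    clusters = []
    for i in order:
        for c in clusters:
            if iou(boxes[i], c["fused"]) >= iou_thr:
                c["boxes"].append(boxes[i])
                c["scores"].append(scores[i])
                w = np.asarray(c["scores"])[:, None]
                # score-weighted average of all member boxes
                c["fused"] = (np.asarray(c["boxes"]) * w).sum(0) / w.sum()
                break
        else:
            clusters.append({"boxes": [boxes[i]], "scores": [scores[i]],
                             "fused": np.asarray(boxes[i], dtype=float)})
    fused_boxes = np.stack([c["fused"] for c in clusters])
    fused_scores = np.array([np.mean(c["scores"]) for c in clusters])
    return fused_boxes, fused_scores
```

In a TTA setting, `boxes`/`scores` would simply concatenate the (un-flipped) predictions from each augmented forward pass before fusion.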
Related papers
- DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning [61.10299147201369]
This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device control agents.
We build a scalable and parallelizable Android learning environment equipped with a VLM-based evaluator.
We demonstrate the effectiveness of DigiRL using the Android-in-the-Wild dataset, where our 1.3B VLM trained with RL achieves a 49.5% absolute improvement.
arXiv Detail & Related papers (2024-06-14T17:49:55Z) - An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training [51.622652121580394]
Masked image modeling (MIM) pre-training for large-scale vision transformers (ViTs) has enabled promising downstream performance on top of the learned self-supervised ViT features.
In this paper, we question if the extremely simple lightweight ViTs' fine-tuning performance can also benefit from this pre-training paradigm.
Our pre-training with distillation on pure lightweight ViTs with vanilla/hierarchical designs (5.7M/6.5M parameters) can achieve 79.4%/78.9% top-1 accuracy on ImageNet-1
arXiv Detail & Related papers (2024-04-18T14:14:44Z) - Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning [50.809769498312434]
We propose a novel dataset pruning method termed Temporal Dual-Depth Scoring (TDDS).
Our method achieves 54.51% accuracy with only 10% training data, surpassing random selection by 7.83% and other comparison methods by at least 12.69%.
arXiv Detail & Related papers (2023-11-22T03:45:30Z) - DEYOv3: DETR with YOLO for Real-time Object Detection [0.0]
We propose a new training method called step-by-step training.
In the first stage, the one-to-many pre-trained YOLO detector is used to initialize the end-to-end detector.
In the second stage, the backbone and encoder are consistent with the DETR-like model, but only the detector needs to be trained from scratch.
arXiv Detail & Related papers (2023-09-21T07:49:07Z) - Q-YOLOP: Quantization-aware You Only Look Once for Panoptic Driving
Perception [6.3709120604927945]
We present an efficient and quantization-aware panoptic driving perception model (Q-YOLOP) for object detection, drivable area segmentation, and lane line segmentation.
The proposed model achieves state-of-the-art performance with an mAP@0.5 of 0.622 for object detection and an mIoU of 0.612 for segmentation.
arXiv Detail & Related papers (2023-07-10T13:02:46Z) - Identification of Binary Neutron Star Mergers in Gravitational-Wave Data
Using YOLO One-Shot Object Detection [0.0]
We demonstrate the application of the YOLOv5 model, a general-purpose convolution-based single-shot object detection model, to the task of detecting binary neutron star (BNS) coalescence events in gravitational-wave data from current-generation interferometer detectors.
We achieve mean average precision (mAP[0.50]) values of 0.945 for a single-class validation dataset and as high as 0.978 for test datasets.
arXiv Detail & Related papers (2022-07-01T10:11:44Z) - To be Critical: Self-Calibrated Weakly Supervised Learning for Salient
Object Detection [95.21700830273221]
Weakly-supervised salient object detection (WSOD) aims to develop saliency models using image-level annotations.
We propose a self-calibrated training strategy by explicitly establishing a mutual calibration loop between pseudo labels and network predictions.
We prove that even a much smaller dataset with well-matched annotations can facilitate models to achieve better performance as well as generalizability.
arXiv Detail & Related papers (2021-09-04T02:45:22Z) - Workshop on Autonomous Driving at CVPR 2021: Technical Report for
Streaming Perception Challenge [57.647371468876116]
We introduce our real-time 2D object detection system for the realistic autonomous driving scenario.
Our detector is built on a newly designed YOLO model, called YOLOX.
On the Argoverse-HD dataset, our system achieves 41.0 streaming AP, surpassing second place by 7.8/6.1 on the detection-only/full-stack tracks, respectively.
arXiv Detail & Related papers (2021-07-27T06:36:06Z) - End-to-End Semi-Supervised Object Detection with Soft Teacher [63.26266730447914]
This paper presents an end-to-end semi-supervised object detection approach, in contrast to previous more complex multi-stage methods.
The proposed approach outperforms previous methods by a large margin under various labeling ratios.
On the state-of-the-art Swin Transformer-based object detector, it can still significantly improve the detection accuracy by +1.5 mAP.
arXiv Detail & Related papers (2021-06-16T17:59:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.