An Effective Two-stage Training Paradigm Detector for Small Dataset
- URL: http://arxiv.org/abs/2309.05652v1
- Date: Mon, 11 Sep 2023 17:43:11 GMT
- Title: An Effective Two-stage Training Paradigm Detector for Small Dataset
- Authors: Zheng Wang, Dong Xie, Hanzhi Wang, Jiang Tian
- Abstract summary: The backbone of YOLOv8 is pre-trained as the encoder using the masked image modeling technique.
During the test stage, test-time augmentation (TTA) is used to enhance each model, and weighted box fusion (WBF) is implemented to further boost the performance.
With this well-designed structure, our approach achieves 30.4% average precision (AP@[0.50:0.95]) on the DelftBikes test set, ranking 4th on the leaderboard.
- Score: 13.227589864946477
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-training a model from only a limited amount of labeled data has
always been viewed as a challenging task. In this report, an effective and
robust solution, the two-stage training paradigm YOLOv8 detector (TP-YOLOv8),
is designed for the object detection track in VIPriors Challenge 2023. First,
the backbone of YOLOv8 is pre-trained as the encoder using the masked image
modeling technique. Then the detector is fine-tuned with elaborate
augmentations. During the test stage, test-time augmentation (TTA) is used to
enhance each model, and weighted box fusion (WBF) is implemented to further
boost the performance. With this well-designed structure, our approach achieves
30.4% average precision over IoU thresholds from 0.50 to 0.95 (AP@[0.50:0.95])
on the DelftBikes test set, ranking 4th on the leaderboard.
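The report above describes the pipeline only at a high level, so the two sketches below are illustrative reconstructions rather than the authors' code. The first is a minimal SimMIM-style masked image modeling (MIM) pre-training step for a CNN backbone standing in for the YOLOv8 encoder; the patch size, mask ratio, pixel decoder, and the assumption that the backbone outputs one feature per masked patch are all hypothetical choices.

```python
# Hypothetical stage-1 sketch: masked image modeling (MIM) pre-training of a
# CNN backbone. Not the authors' implementation; hyperparameters are assumed.
import torch
import torch.nn as nn

class MIMPretrainer(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int,
                 patch: int = 32, mask_ratio: float = 0.6):
        super().__init__()
        self.backbone = backbone    # encoder to be reused by the detector later
        self.patch = patch          # assumed equal to the backbone's output stride
        self.mask_ratio = mask_ratio
        # Lightweight pixel decoder: predicts patch*patch*3 values per feature cell.
        self.decoder = nn.Conv2d(feat_dim, patch * patch * 3, kernel_size=1)

    def forward(self, imgs: torch.Tensor) -> torch.Tensor:
        b, _, h, w = imgs.shape
        ph, pw = h // self.patch, w // self.patch
        # Random patch-level mask: 1 = masked, 0 = visible.
        mask = (torch.rand(b, 1, ph, pw, device=imgs.device) < self.mask_ratio).float()
        pixel_mask = mask.repeat_interleave(self.patch, 2).repeat_interleave(self.patch, 3)
        feats = self.backbone(imgs * (1.0 - pixel_mask))  # (b, feat_dim, ph, pw) assumed
        pred = self.decoder(feats)                        # (b, patch*patch*3, ph, pw)
        pred = pred.reshape(b, 3, self.patch, self.patch, ph, pw)
        pred = pred.permute(0, 1, 4, 2, 5, 3).reshape(b, 3, h, w)
        # L1 reconstruction loss on the masked pixels only.
        return ((pred - imgs).abs() * pixel_mask).sum() / (3.0 * pixel_mask.sum().clamp(min=1.0))
```

After pre-training, the backbone weights would be transferred into the detector and the whole model fine-tuned with the augmentations mentioned in the abstract. The second sketch illustrates the test-stage weighted box fusion (WBF) on one class's predictions gathered from all TTA views and models: overlapping boxes are clustered by IoU and fused by confidence-weighted coordinate averaging. The function name and IoU threshold are assumptions; the published WBF additionally rescales fused confidences by the number of contributing models, and an off-the-shelf implementation is available in the ensemble-boxes package.

```python
# Simplified single-class weighted box fusion (WBF) sketch; not the authors' code.
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def weighted_box_fusion(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.55):
    """Fuse overlapping boxes by confidence-weighted averaging of coordinates."""
    clusters, cluster_scores = [], []              # member boxes / scores per cluster
    for idx in np.argsort(-scores):                # visit boxes by descending confidence
        box, score = boxes[idx], scores[idx]
        fused = [np.average(c, axis=0, weights=s)  # current fused box of each cluster
                 for c, s in zip(clusters, cluster_scores)]
        match = next((k for k, f in enumerate(fused) if iou(f, box) > iou_thr), None)
        if match is None:
            clusters.append([box]); cluster_scores.append([score])
        else:
            clusters[match].append(box); cluster_scores[match].append(score)
    fused_boxes = np.array([np.average(c, axis=0, weights=s)
                            for c, s in zip(clusters, cluster_scores)])
    fused_scores = np.array([float(np.mean(s)) for s in cluster_scores])
    return fused_boxes, fused_scores
```

In the described pipeline, each TTA view (for example, a horizontally flipped or rescaled input) would first be mapped back to the original image coordinates before its boxes are passed to such a fusion step.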
Related papers
- Deep Learning Models for UAV-Assisted Bridge Inspection: A YOLO Benchmark Analysis [0.41942958779358674]
We benchmark 23 models belonging to the four newest YOLO variants (YOLOv5, YOLOv6, YOLOv7, YOLOv8).
We identify YOLOv8n, YOLOv7tiny, YOLOv6m, and YOLOv6m6 as the models offering an optimal balance between accuracy and processing speed.
Our findings accelerate the model selection process for UAVs, enabling more efficient and reliable bridge inspections.
arXiv Detail & Related papers (2024-11-07T07:03:40Z) - Optimizing YOLO Architectures for Optimal Road Damage Detection and Classification: A Comparative Study from YOLOv7 to YOLOv10 [0.0]
This paper presents a comprehensive workflow for road damage detection using deep learning models.
To accommodate hardware limitations, large images are cropped, and lightweight models are utilized.
The proposed approach employs multiple model architectures, including a custom YOLOv7 model with Coordinate Attention layers and a Tiny YOLOv7 model.
arXiv Detail & Related papers (2024-10-10T22:55:12Z) - Self-Updating Vehicle Monitoring Framework Employing Distributed Acoustic Sensing towards Real-World Settings [5.306938463648908]
We introduce a real-time semi-supervised vehicle monitoring framework tailored to urban settings.
It requires only a small fraction of manual labels for initial training and exploits unlabeled data for model improvement.
We propose a novel prior loss that incorporates the shapes of vehicular traces to track a single vehicle with varying speeds.
arXiv Detail & Related papers (2024-09-16T13:10:58Z) - DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning [61.10299147201369]
This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device control agents.
We build a scalable and parallelizable Android learning environment equipped with a VLM-based evaluator.
We demonstrate the effectiveness of DigiRL using the Android-in-the-Wild dataset, where our 1.3B VLM trained with RL achieves a 49.5% absolute improvement.
arXiv Detail & Related papers (2024-06-14T17:49:55Z) - An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training [51.622652121580394]
Masked image modeling (MIM) pre-training for large-scale vision transformers (ViTs) has enabled promising downstream performance on top of the learned self-supervised ViT features.
In this paper, we question whether the fine-tuning performance of extremely simple lightweight ViTs can also benefit from this pre-training paradigm.
Our pre-training with distillation on pure lightweight ViTs with vanilla/hierarchical design (5.7M/6.5M parameters) achieves 79.4%/78.9% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2024-04-18T14:14:44Z) - Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning [50.809769498312434]
We propose a novel dataset pruning method termed as Temporal Dual-Depth Scoring (TDDS)
Our method achieves 54.51% accuracy with only 10% training data, surpassing random selection by 7.83% and other comparison methods by at least 12.69%.
arXiv Detail & Related papers (2023-11-22T03:45:30Z) - DEYOv3: DETR with YOLO for Real-time Object Detection [0.0]
We propose a new training method called step-by-step training.
In the first stage, the one-to-many pre-trained YOLO detector is used to initialize the end-to-end detector.
In the second stage, the backbone and encoder are consistent with the DETR-like model, but only the detector needs to be trained from scratch.
arXiv Detail & Related papers (2023-09-21T07:49:07Z) - Q-YOLOP: Quantization-aware You Only Look Once for Panoptic Driving
Perception [6.3709120604927945]
We present an efficient and quantization-aware panoptic driving perception model (Q-YOLOP) for object detection, drivable area segmentation, and lane line segmentation.
The proposed model achieves state-of-the-art performance with an mAP@0.5 of 0.622 for object detection and an mIoU of 0.612 for segmentation.
arXiv Detail & Related papers (2023-07-10T13:02:46Z) - To be Critical: Self-Calibrated Weakly Supervised Learning for Salient
Object Detection [95.21700830273221]
Weakly-supervised salient object detection (WSOD) aims to develop saliency models using image-level annotations.
We propose a self-calibrated training strategy by explicitly establishing a mutual calibration loop between pseudo labels and network predictions.
We prove that even a much smaller dataset with well-matched annotations can facilitate models to achieve better performance as well as generalizability.
arXiv Detail & Related papers (2021-09-04T02:45:22Z) - Workshop on Autonomous Driving at CVPR 2021: Technical Report for
Streaming Perception Challenge [57.647371468876116]
We introduce our real-time 2D object detection system for the realistic autonomous driving scenario.
Our detector is built on a newly designed YOLO model, called YOLOX.
On the Argoverse-HD dataset, our system achieves 41.0 streaming AP, surpassing the second place by 7.8/6.1 on the detection-only and full-stack tracks, respectively.
arXiv Detail & Related papers (2021-07-27T06:36:06Z) - End-to-End Semi-Supervised Object Detection with Soft Teacher [63.26266730447914]
This paper presents an end-to-end semi-supervised object detection approach, in contrast to previous more complex multi-stage methods.
The proposed approach outperforms previous methods by a large margin under various labeling ratios.
On the state-of-the-art Swin Transformer-based object detector, it can still significantly improve the detection accuracy by +1.5 mAP.
arXiv Detail & Related papers (2021-06-16T17:59:30Z)