An Effective Two-stage Training Paradigm Detector for Small Dataset
- URL: http://arxiv.org/abs/2309.05652v1
- Date: Mon, 11 Sep 2023 17:43:11 GMT
- Title: An Effective Two-stage Training Paradigm Detector for Small Dataset
- Authors: Zheng Wang, Dong Xie, Hanzhi Wang, Jiang Tian
- Abstract summary: The backbone of YOLOv8 is pre-trained as the encoder using the masked image modeling technique.
During the test stage, test-time augmentation (TTA) is used to enhance each model, and weighted box fusion (WBF) is implemented to further boost the performance.
With this well-designed structure, our approach achieves 30.4% average precision (AP@[0.50:0.95]) on the DelftBikes test set, ranking 4th on the leaderboard.
- Score: 13.227589864946477
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-training a model from only a limited amount of labeled data has
always been viewed as a challenging task. In this report, an effective and
robust solution, the two-stage training paradigm YOLOv8 detector (TP-YOLOv8),
is designed for the object detection track in VIPriors Challenge 2023. First,
the backbone of YOLOv8 is pre-trained as the encoder using the masked image
modeling technique. Then the detector is fine-tuned with elaborate
augmentations. During the test stage, test-time augmentation (TTA) is used to
enhance each model, and weighted box fusion (WBF) is implemented to further
boost the performance. With this well-designed structure, our approach achieves
30.4% average precision over IoU thresholds from 0.50 to 0.95 (AP@[0.50:0.95])
on the DelftBikes test set, ranking 4th on the leaderboard.
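The report above describes the pipeline only at a high level, so the two sketches below are illustrative reconstructions rather than the authors' code. The first is a minimal SimMIM-style masked image modeling (MIM) pre-training step for a CNN backbone standing in for the YOLOv8 encoder; the patch size, mask ratio, pixel decoder, and the assumption that the backbone outputs one feature per masked patch are all hypothetical choices.

```python
# Hypothetical stage-1 sketch: masked image modeling (MIM) pre-training of a
# CNN backbone. Not the authors' implementation; hyperparameters are assumed.
import torch
import torch.nn as nn

class MIMPretrainer(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int,
                 patch: int = 32, mask_ratio: float = 0.6):
        super().__init__()
        self.backbone = backbone    # encoder to be reused by the detector later
        self.patch = patch          # assumed equal to the backbone's output stride
        self.mask_ratio = mask_ratio
        # Lightweight pixel decoder: predicts patch*patch*3 values per feature cell.
        self.decoder = nn.Conv2d(feat_dim, patch * patch * 3, kernel_size=1)

    def forward(self, imgs: torch.Tensor) -> torch.Tensor:
        b, _, h, w = imgs.shape
        ph, pw = h // self.patch, w // self.patch
        # Random patch-level mask: 1 = masked, 0 = visible.
        mask = (torch.rand(b, 1, ph, pw, device=imgs.device) < self.mask_ratio).float()
        pixel_mask = mask.repeat_interleave(self.patch, 2).repeat_interleave(self.patch, 3)
        feats = self.backbone(imgs * (1.0 - pixel_mask))  # (b, feat_dim, ph, pw) assumed
        pred = self.decoder(feats)                        # (b, patch*patch*3, ph, pw)
        pred = pred.reshape(b, 3, self.patch, self.patch, ph, pw)
        pred = pred.permute(0, 1, 4, 2, 5, 3).reshape(b, 3, h, w)
        # L1 reconstruction loss on the masked pixels only.
        return ((pred - imgs).abs() * pixel_mask).sum() / (3.0 * pixel_mask.sum().clamp(min=1.0))
```

After pre-training, the backbone weights would be transferred into the detector and the whole model fine-tuned with the augmentations mentioned in the abstract. The second sketch illustrates the test-stage weighted box fusion (WBF) on one class's predictions gathered from all TTA views and models: overlapping boxes are clustered by IoU and fused by confidence-weighted coordinate averaging. The function name and IoU threshold are assumptions; the published WBF additionally rescales fused confidences by the number of contributing models, and an off-the-shelf implementation is available in the ensemble-boxes package.

```python
# Simplified single-class weighted box fusion (WBF) sketch; not the authors' code.
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / (union + 1e-9)

def weighted_box_fusion(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.55):
    """Fuse overlapping boxes by confidence-weighted averaging of coordinates."""
    clusters, cluster_scores = [], []              # member boxes / scores per cluster
    for idx in np.argsort(-scores):                # visit boxes by descending confidence
        box, score = boxes[idx], scores[idx]
        fused = [np.average(c, axis=0, weights=s)  # current fused box of each cluster
                 for c, s in zip(clusters, cluster_scores)]
        match = next((k for k, f in enumerate(fused) if iou(f, box) > iou_thr), None)
        if match is None:
            clusters.append([box]); cluster_scores.append([score])
        else:
            clusters[match].append(box); cluster_scores[match].append(score)
    fused_boxes = np.array([np.average(c, axis=0, weights=s)
                            for c, s in zip(clusters, cluster_scores)])
    fused_scores = np.array([float(np.mean(s)) for s in cluster_scores])
    return fused_boxes, fused_scores
```

In the described pipeline, each TTA view (for example, a horizontally flipped or rescaled input) would first be mapped back to the original image coordinates before its boxes are passed to such a fusion step.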
Related papers
- Deep Learning Models for UAV-Assisted Bridge Inspection: A YOLO Benchmark Analysis [0.41942958779358674]
We benchmark 23 models belonging to the four newest YOLO variants (YOLOv5, YOLOv6, YOLOv7, YOLOv8).
We identify YOLOv8n, YOLOv7tiny, YOLOv6m, and YOLOv6m6 as the models offering an optimal balance between accuracy and processing speed.
Our findings accelerate the model selection process for UAVs, enabling more efficient and reliable bridge inspections.
arXiv Detail & Related papers (2024-11-07T07:03:40Z) - Optimizing YOLO Architectures for Optimal Road Damage Detection and Classification: A Comparative Study from YOLOv7 to YOLOv10 [0.0]
This paper presents a comprehensive workflow for road damage detection using deep learning models.
To accommodate hardware limitations, large images are cropped, and lightweight models are utilized.
The proposed approach employs multiple model architectures, including a custom YOLOv7 model with Coordinate Attention layers and a Tiny YOLOv7 model.
arXiv Detail & Related papers (2024-10-10T22:55:12Z) - Self-Updating Vehicle Monitoring Framework Employing Distributed Acoustic Sensing towards Real-World Settings [5.306938463648908]
We introduce a real-time semi-supervised vehicle monitoring framework tailored to urban settings.
It requires only a small fraction of manual labels for initial training and exploits unlabeled data for model improvement.
We propose a novel prior loss that incorporates the shapes of vehicular traces to track a single vehicle with varying speeds.
arXiv Detail & Related papers (2024-09-16T13:10:58Z) - DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning [61.10299147201369]
This paper introduces a novel autonomous RL approach, called DigiRL, for training in-the-wild device control agents.
We build a scalable and parallelizable Android learning environment equipped with a VLM-based evaluator.
We demonstrate the effectiveness of DigiRL using the Android-in-the-Wild dataset, where our 1.3B VLM trained with RL achieves a 49.5% absolute improvement.
arXiv Detail & Related papers (2024-06-14T17:49:55Z) - An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training [51.622652121580394]
Masked image modeling (MIM) pre-training for large-scale vision transformers (ViTs) has enabled promising downstream performance on top of the learned self-supervised ViT features.
In this paper, we question whether the fine-tuning performance of extremely simple lightweight ViTs can also benefit from this pre-training paradigm.
Our pre-training with distillation on pure lightweight ViTs with vanilla/hierarchical design (5.7M/6.5M parameters) achieves 79.4%/78.9% top-1 accuracy on ImageNet-1K.
arXiv Detail & Related papers (2024-04-18T14:14:44Z) - Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning [50.809769498312434]
We propose a novel dataset pruning method termed as Temporal Dual-Depth Scoring (TDDS)
Our method achieves 54.51% accuracy with only 10% training data, surpassing random selection by 7.83% and other comparison methods by at least 12.69%.
arXiv Detail & Related papers (2023-11-22T03:45:30Z) - DEYOv3: DETR with YOLO for Real-time Object Detection [0.0]
We propose a new training method called step-by-step training.
In the first stage, the one-to-many pre-trained YOLO detector is used to initialize the end-to-end detector.
In the second stage, the backbone and encoder are consistent with the DETR-like model, but only the detector needs to be trained from scratch.
arXiv Detail & Related papers (2023-09-21T07:49:07Z) - Q-YOLOP: Quantization-aware You Only Look Once for Panoptic Driving
Perception [6.3709120604927945]
We present an efficient and quantization-aware panoptic driving perception model (Q-YOLOP) for object detection, drivable area segmentation, and lane line segmentation.
The proposed model achieves state-of-the-art performance with an mAP@0.5 of 0.622 for object detection and an mIoU of 0.612 for segmentation.
arXiv Detail & Related papers (2023-07-10T13:02:46Z) - To be Critical: Self-Calibrated Weakly Supervised Learning for Salient
Object Detection [95.21700830273221]
Weakly-supervised salient object detection (WSOD) aims to develop saliency models using image-level annotations.
We propose a self-calibrated training strategy by explicitly establishing a mutual calibration loop between pseudo labels and network predictions.
We prove that even a much smaller dataset with well-matched annotations can facilitate models to achieve better performance as well as generalizability.
arXiv Detail & Related papers (2021-09-04T02:45:22Z) - Workshop on Autonomous Driving at CVPR 2021: Technical Report for
Streaming Perception Challenge [57.647371468876116]
We introduce our real-time 2D object detection system for the realistic autonomous driving scenario.
Our detector is built on a newly designed YOLO model, called YOLOX.
On the Argoverse-HD dataset, our system achieves 41.0 streaming AP, surpassing the second place by 7.8/6.1 on the detection-only and full-stack tracks, respectively.
arXiv Detail & Related papers (2021-07-27T06:36:06Z) - End-to-End Semi-Supervised Object Detection with Soft Teacher [63.26266730447914]
This paper presents an end-to-end semi-supervised object detection approach, in contrast to previous more complex multi-stage methods.
The proposed approach outperforms previous methods by a large margin under various labeling ratios.
On the state-of-the-art Swin Transformer-based object detector, it can still significantly improve the detection accuracy by +1.5 mAP.
arXiv Detail & Related papers (2021-06-16T17:59:30Z)