Related papers: Optimizing YOLO Architectures for Optimal Road Damage Detection and Classification: A Comparative Study from YOLOv7 to YOLOv10

URL: http://arxiv.org/abs/2410.08409v1
Date: Thu, 10 Oct 2024 22:55:12 GMT
Title: Optimizing YOLO Architectures for Optimal Road Damage Detection and Classification: A Comparative Study from YOLOv7 to YOLOv10
Authors: Vung Pham, Lan Dong Thi Ngoc, Duy-Linh Bui,
Abstract summary: This paper presents a comprehensive workflow for road damage detection using deep learning models. To accommodate hardware limitations, large images are cropped, and lightweight models are utilized. The proposed approach employs multiple model architectures, including a custom YOLOv7 model with Coordinate Attention layers and a Tiny YOLOv7 model.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Maintaining roadway infrastructure is essential for ensuring a safe, efficient, and sustainable transportation system. However, manual data collection for detecting road damage is time-consuming, labor-intensive, and poses safety risks. Recent advancements in artificial intelligence, particularly deep learning, offer a promising solution for automating this process using road images. This paper presents a comprehensive workflow for road damage detection using deep learning models, focusing on optimizations for inference speed while preserving detection accuracy. Specifically, to accommodate hardware limitations, large images are cropped, and lightweight models are utilized. Additionally, an external pothole dataset is incorporated to enhance the detection of this underrepresented damage class. The proposed approach employs multiple model architectures, including a custom YOLOv7 model with Coordinate Attention layers and a Tiny YOLOv7 model, which are trained and combined to maximize detection performance. The models are further reparameterized to optimize inference efficiency. Experimental results demonstrate that the ensemble of the custom YOLOv7 model with three Coordinate Attention layers and the default Tiny YOLOv7 model achieves an F1 score of 0.7027 with an inference speed of 0.0547 seconds per image. The complete pipeline, including data preprocessing, model training, and inference scripts, is publicly available on the project's GitHub repository, enabling reproducibility and facilitating further research.
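The abstract reports two headline numbers: an F1 score and a per-image inference time. As a minimal sketch of how such figures are conventionally defined (not the authors' evaluation code), the snippet below computes F1 as the harmonic mean of precision and recall and converts seconds per image into images per second. The function names and the example detection counts are hypothetical; only the value 0.0547 is taken from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return f1_score(precision, recall)


def images_per_second(seconds_per_image: float) -> float:
    """Convert a per-image latency into throughput."""
    if seconds_per_image <= 0:
        raise ValueError("seconds_per_image must be positive")
    return 1.0 / seconds_per_image


# Latency reported in the abstract for the ensemble model.
REPORTED_SECONDS_PER_IMAGE = 0.0547

if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not from the paper).
    print(round(f1_from_counts(tp=7, fp=3, fn=3), 4))
    # Throughput implied by the reported latency.
    print(round(images_per_second(REPORTED_SECONDS_PER_IMAGE), 2))  # 18.28
```

Under this conversion, 0.0547 seconds per image corresponds to roughly 18 images per second on the hardware the authors measured.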

Related papers

YOLO for Knowledge Extraction from Vehicle Images: A Baseline Study [0.20482269513546458]
This study evaluates the effectiveness of three state-of-the-art deep learning approaches: YOLO-v11, YOLO-World, and YOLO-Classification. This dataset was collected under challenging and unconstrained conditions by NSW Police Highway Patrol Vehicles. It was concluded that there is a need to use MVI to get usable models within such complex real-world datasets.
arXiv Detail & Related papers (2025-07-25T05:31:21Z)
BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models [48.81848689570674]
BridgeVLA is a novel 3D VLA model that projects 3D inputs to multiple 2D images, ensuring input alignment with the VLM backbone. It utilizes 2D heatmaps for action prediction, unifying the input and output spaces within a consistent 2D image space. It is able to achieve a success rate of 96.8% on 10+ tasks with only 3 trajectories per task, highlighting its extraordinary sample efficiency.
arXiv Detail & Related papers (2025-06-09T17:36:34Z)
YOLO-ELA: Efficient Local Attention Modeling for High-Performance Real-Time Insulator Defect Detection [0.0]
Existing detection methods for insulator defect identification from unmanned aerial vehicles struggle with complex background scenes and small objects. This paper proposes a new attention-based foundation architecture, YOLO-ELA, to address this issue. Experimental results on high-resolution UAV images show that our method achieved a state-of-the-art performance of 96.9% mAP0.5 and a real-time detection speed of 74.63 frames per second.
arXiv Detail & Related papers (2024-10-15T16:00:01Z)
YOLO9tr: A Lightweight Model for Pavement Damage Detection Utilizing a Generalized Efficient Layer Aggregation Network and Attention Mechanism [0.0]
This paper proposes YOLO9tr, a novel lightweight object detection model for pavement damage detection. YOLO9tr is based on the YOLOv9 architecture, incorporating a partial attention block that enhances feature extraction and attention mechanisms. The model achieves a high frame rate of up to 136 FPS, making it suitable for real-time applications such as video surveillance and automated inspection systems.
arXiv Detail & Related papers (2024-06-17T06:31:43Z)
Real-Time Object Detection in Occluded Environment with Background Cluttering Effects Using Deep Learning [0.8192907805418583]
We concentrate on deep learning models for real-time detection of cars and tanks in an occluded environment with a cluttered background. The developed method builds a custom dataset and employs a preprocessing technique to clean the noisy data. The accuracy and frames per second of the SSD-Mobilenet v2 model are higher than those of YOLO V3 and YOLO V4.
arXiv Detail & Related papers (2024-01-02T01:30:03Z)
Investigating YOLO Models Towards Outdoor Obstacle Detection For Visually Impaired People [3.4628430044380973]
Seven different YOLO object detection models were implemented. YOLOv8 was found to be the best model, which reached a precision of 80% and a recall of 68.2% on a well-known Obstacle dataset. YOLO-NAS was found to be suboptimal for the obstacle detection task.
arXiv Detail & Related papers (2023-12-10T13:16:22Z)
Exploring the Effectiveness of Dataset Synthesis: An application of Apple Detection in Orchards [68.95806641664713]
We explore the usability of Stable Diffusion 2.1-base for generating synthetic datasets of apple trees for object detection. We train a YOLOv5m object detection model to predict apples in a real-world apple detection dataset. Results demonstrate that the model trained on generated data slightly underperforms a baseline model trained on real-world images.
arXiv Detail & Related papers (2023-06-20T09:46:01Z)
Performance Analysis of YOLO-based Architectures for Vehicle Detection from Traffic Images in Bangladesh [0.0]
We find the best-suited YOLO architecture for fast and accurate vehicle detection from traffic images in Bangladesh. Models were trained on a dataset containing 7390 images belonging to 21 types of vehicles. We found the YOLOv5x variant to be the best-suited model, outperforming the YOLOv3 and YOLOv5s models by 7 and 4 percent in mAP and by 12 and 8.5 percent in accuracy, respectively.
arXiv Detail & Related papers (2022-12-18T18:53:35Z)
CARLA-GeAR: a Dataset Generator for a Systematic Evaluation of Adversarial Robustness of Vision Models [61.68061613161187]
This paper presents CARLA-GeAR, a tool for the automatic generation of synthetic datasets for evaluating the robustness of neural models against physical adversarial patches. The tool is built on the CARLA simulator, using its Python API, and allows the generation of datasets for several vision tasks in the context of autonomous driving. The paper presents an experimental study to evaluate the performance of some defense methods against such attacks, showing how the datasets generated with CARLA-GeAR might be used in future work as a benchmark for adversarial defense in the real world.
arXiv Detail & Related papers (2022-06-09T09:17:38Z)
A lightweight and accurate YOLO-like network for small target detection in Aerial Imagery [94.78943497436492]
We present YOLO-S, a simple, fast and efficient network for small target detection. YOLO-S exploits a small feature extractor based on Darknet20, as well as skip connections, via both bypass and concatenation. YOLO-S has 87% fewer parameters and almost half the FLOPs of YOLOv3, making deployment practical for low-power industrial applications.
arXiv Detail & Related papers (2022-04-05T16:29:49Z)
To be Critical: Self-Calibrated Weakly Supervised Learning for Salient Object Detection [95.21700830273221]
Weakly-supervised salient object detection (WSOD) aims to develop saliency models using image-level annotations. We propose a self-calibrated training strategy by explicitly establishing a mutual calibration loop between pseudo labels and network predictions. We prove that even a much smaller dataset with well-matched annotations can facilitate models to achieve better performance as well as generalizability.
arXiv Detail & Related papers (2021-09-04T02:45:22Z)
Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models. Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified and learning based approach to the 3D MOT problem. We employ a Neural Message Passing network for data association that is fully trainable. We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.