Related papers: Evaluating YOLO Architectures: Implications for Real-Time Vehicle Detection in Urban Environments of Bangladesh

Evaluating YOLO Architectures: Implications for Real-Time Vehicle Detection in Urban Environments of Bangladesh

URL: http://arxiv.org/abs/2509.05652v2
Date: Sun, 12 Oct 2025 06:57:28 GMT
Title: Evaluating YOLO Architectures: Implications for Real-Time Vehicle Detection in Urban Environments of Bangladesh
Authors: Ha Meem Hossain, Pritam Nath, Mahitun Nesa Mahi, Imtiaz Uddin, Ishrat Jahan Eiste, Syed Nasibur Rahman Ratul, Md Naim Uddin Mozumdar, Asif Mohammed Saad, MD Tamim Hossain,
Abstract summary: Vehicle detection systems trained on Non-Bangladeshi datasets struggle to accurately identify local vehicle types in Bangladesh's unique road environments.<n>This study evaluates six YOLO model variants on a custom dataset featuring 29 distinct vehicle classes.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Vehicle detection systems trained on Non-Bangladeshi datasets struggle to accurately identify local vehicle types in Bangladesh's unique road environments, creating critical gaps in autonomous driving technology for developing regions. This study evaluates six YOLO model variants on a custom dataset featuring 29 distinct vehicle classes, including region-specific vehicles such as ``Desi Nosimon'', ``Leguna'', ``Battery Rickshaw'', and ``CNG''. The dataset comprises high-resolution images (1920x1080) captured across various Bangladeshi roads using mobile phone cameras and manually annotated using LabelImg with YOLO format bounding boxes. Performance evaluation revealed YOLOv11x as the top performer, achieving 63.7\% mAP@0.5, 43.8\% mAP@0.5:0.95, 61.4\% recall, and 61.6\% F1-score, though requiring 45.8 milliseconds per image for inference. Medium variants (YOLOv8m, YOLOv11m) struck an optimal balance, delivering robust detection performance with mAP@0.5 values of 62.5\% and 61.8\% respectively, while maintaining moderate inference times around 14-15 milliseconds. The study identified significant detection challenges for rare vehicle classes, with Construction Vehicles and Desi Nosimons showing near-zero accuracy due to dataset imbalances and insufficient training samples. Confusion matrices revealed frequent misclassifications between visually similar vehicles, particularly Mini Trucks versus Mini Covered Vans. This research provides a foundation for developing robust object detection systems specifically adapted to Bangladesh traffic conditions, addressing critical needs in autonomous vehicle technology advancement for developing regions where conventional generic-trained models fail to perform adequately.

Related papers

HetroD: A High-Fidelity Drone Dataset and Benchmark for Autonomous Driving in Heterogeneous Traffic [49.31491001465465]
HetroD is a dataset and benchmark for developing autonomous driving systems in heterogeneous environments.<n>HetroD targets the critical challenge of navi- gating real-world heterogeneous traffic dominated by vulner- able road users (VRUs)
arXiv Detail & Related papers (2026-02-03T12:12:47Z)
Finetuning YOLOv9 for Vehicle Detection: Deep Learning for Intelligent Transportation Systems in Dhaka, Bangladesh [0.0]
The government of Bangladesh recognizes the integration of ITS to ensure smart mobility as a vital step towards the development plan "Smart Bangladesh Vision 2041" This paper proposes a fine-tuned object detector, the YOLOv9 model to detect native vehicles trained on a Bangladesh-based dataset. Results show that the model achieved a mean Average Precision (mAP) of 0.934 at the Intersection over Union (IoU) threshold of 0.5.
arXiv Detail & Related papers (2024-09-29T02:33:34Z)
RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments [62.5830455357187]
We setup an egocentric multi-sensor data collection platform based on 3 main types of sensors (Camera, LiDAR and Fisheye)<n>A large-scale multimodal dataset is constructed, named RoboSense, to facilitate egocentric robot perception.
arXiv Detail & Related papers (2024-08-28T03:17:40Z)
Bangladeshi Native Vehicle Detection in Wild [1.444899524297657]
This paper proposes a native vehicle detection dataset for the most commonly appeared vehicle classes in Bangladesh. 17 distinct vehicle classes have been taken into account, with fully annotated 81542 instances of 17326 images. The experiments show that the BNVD dataset serves as a reliable representation of vehicle distribution.
arXiv Detail & Related papers (2024-05-20T16:23:40Z)
BadODD: Bangladeshi Autonomous Driving Object Detection Dataset [0.0]
We propose a comprehensive dataset for object detection in diverse driving environments across 9 districts in Bangladesh. The dataset, collected exclusively from smartphone cameras, provided a realistic representation of real-world scenarios.
arXiv Detail & Related papers (2024-01-19T12:26:51Z)
Performance Analysis of YOLO-based Architectures for Vehicle Detection from Traffic Images in Bangladesh [0.0]
We find the best-suited YOLO architecture for fast and accurate vehicle detection from traffic images in Bangladesh. Models were trained on a dataset containing 7390 images belonging to 21 types of vehicles. We found the YOLOV5x variant to be the best-suited model, performing better than YOLOv3 and YOLOv5s models respectively by 7 & 4 percent in mAP, and 12 & 8.5 percent in terms of Accuracy.
arXiv Detail & Related papers (2022-12-18T18:53:35Z)
CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving [117.87070488537334]
We introduce a challenging dataset named CODA that exposes this critical problem of vision-based detectors. The performance of standard object detectors trained on large-scale autonomous driving datasets significantly drops to no more than 12.8% in mAR. We experiment with the state-of-the-art open-world object detector and find that it also fails to reliably identify the novel objects in CODA.
arXiv Detail & Related papers (2022-03-15T08:32:56Z)
Fine-Grained Vehicle Classification in Urban Traffic Scenes using Deep Learning [0.0]
Fine-grained vehicle classification appears to be a challenging task as compared to vehicle coarse classification. Existing Vehicle Make and Model Recognition (VMMR) systems have been developed on synchronized and controlled traffic conditions. Need for robust VMMR in complex, urban, heterogeneous, and unsynchronized traffic conditions still remain an open research area.
arXiv Detail & Related papers (2021-11-17T21:19:03Z)
SODA10M: Towards Large-Scale Object Detection Benchmark for Autonomous Driving [94.11868795445798]
We release a Large-Scale Object Detection benchmark for Autonomous driving, named as SODA10M, containing 10 million unlabeled images and 20K images labeled with 6 representative object categories. To improve diversity, the images are collected every ten seconds per frame within 32 different cities under different weather conditions, periods and location scenes. We provide extensive experiments and deep analyses of existing supervised state-of-the-art detection models, popular self-supervised and semi-supervised approaches, and some insights about how to develop future models.
arXiv Detail & Related papers (2021-06-21T13:55:57Z)
One Million Scenes for Autonomous Driving: ONCE Dataset [91.94189514073354]
We introduce the ONCE dataset for 3D object detection in the autonomous driving scenario. The data is selected from 144 driving hours, which is 20x longer than the largest 3D autonomous driving dataset available. We reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
arXiv Detail & Related papers (2021-06-21T12:28:08Z)
Detecting 32 Pedestrian Attributes for Autonomous Vehicles [103.87351701138554]
In this paper, we address the problem of jointly detecting pedestrians and recognizing 32 pedestrian attributes. We introduce a Multi-Task Learning (MTL) model relying on a composite field framework, which achieves both goals in an efficient way. We show competitive detection and attribute recognition results, as well as a more stable MTL training.
arXiv Detail & Related papers (2020-12-04T15:10:12Z)
VehicleNet: Learning Robust Visual Representation for Vehicle Re-identification [116.1587709521173]
We propose to build a large-scale vehicle dataset (called VehicleNet) by harnessing four public vehicle datasets. We design a simple yet effective two-stage progressive approach to learning more robust visual representation from VehicleNet. We achieve the state-of-art accuracy of 86.07% mAP on the private test set of AICity Challenge.
arXiv Detail & Related papers (2020-04-14T05:06:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.