Related papers: The Urban Vision Hackathon Dataset and Models: Towards Image Annotations and Accurate Vision Models for Indian Traffic

URL: http://arxiv.org/abs/2511.02563v1
Date: Tue, 04 Nov 2025 13:36:03 GMT
Title: The Urban Vision Hackathon Dataset and Models: Towards Image Annotations and Accurate Vision Models for Indian Traffic
Authors: Akash Sharma, Chinmay Mhatre, Sankalp Gawali, Ruthvik Bokkasam, Brij Kishore, Vishwajeet Pattanaik, Tarun Rambha, Abdul R. Pinjari, Vijay Kovvali, Anirban Chakraborty, Punit Rathore, Raghu Krishnapuram, Yogesh Simmhan
Abstract summary: UVH-26 is the first public release by AIM@IISc of a large-scale dataset of annotated traffic-camera images from India. The dataset comprises 26,646 high-resolution (1080p) images sampled from 2,800 Safe-City CCTV cameras in Bengaluru over a 4-week period. In total, 1.8 million bounding boxes were labeled across 14 vehicle classes specific to India.
Score: 6.346576275272361
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This report describes the UVH-26 dataset, the first public release by AIM@IISc of a large-scale dataset of annotated traffic-camera images from India. The dataset comprises 26,646 high-resolution (1080p) images sampled from 2,800 Safe-City CCTV cameras in Bengaluru over a 4-week period, and subsequently annotated through a crowdsourced hackathon involving 565 college students from across India. In total, 1.8 million bounding boxes were labeled across 14 vehicle classes specific to India: Cycle, 2-Wheeler (Motorcycle), 3-Wheeler (Auto-rickshaw), LCV (Light Commercial Vehicles), Van, Tempo-traveller, Hatchback, Sedan, SUV, MUV, Mini-bus, Bus, Truck and Other. Of these, 283k-316k consensus ground truth bounding boxes and labels were derived for distinct objects in the 26k images using Majority Voting and STAPLE algorithms. Further, we train multiple contemporary detectors, including YOLO11-S/X, RT-DETR-S/X, and DAMO-YOLO-T/L using these datasets, and report accuracy based on mAP50, mAP75 and mAP50:95. Models trained on UVH-26 achieve 8.4-31.5% improvements in mAP50:95 over equivalent baseline models trained on the COCO dataset, with RT-DETR-X showing the best performance at 0.67 (mAP50:95) as compared to 0.40 for COCO-trained weights for common classes (Car, Bus, and Truck). This demonstrates the benefits of domain-specific training data for Indian traffic scenarios. The release package provides the 26k images with consensus annotations based on Majority Voting (UVH-26-MV) and STAPLE (UVH-26-ST) and the 6 fine-tuned YOLO and DETR models on each of these datasets. By capturing the heterogeneity of Indian urban mobility directly from operational traffic-camera streams, UVH-26 addresses a critical gap in existing global benchmarks, and offers a foundation for advancing detection, classification, and deployment of intelligent transportation systems in emerging nations with complex traffic conditions.
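
The abstract names two ideas that a small example may make more concrete: box overlap (mAP50 and mAP75 count a detection as correct when its intersection-over-union with a ground-truth box is at least 0.5 or 0.75) and majority voting over annotator labels. The sketch below is illustrative only and is not the UVH-26 pipeline. The function names `iou` and `majority_label`, the `(x1, y1, x2, y2)` box layout, and the strict-majority rule are all assumptions made for this example.

```python
from collections import Counter


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2).

    The box layout is an assumption for this sketch.
    """
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def majority_label(labels):
    """Return the label picked by a strict majority of annotators, else None.

    The strict-majority rule is a hypothetical choice, not taken from the paper.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None
```

For instance, `majority_label(["Sedan", "Sedan", "SUV"])` yields `"Sedan"`, while a two-way tie yields `None`. How the actual release matches boxes across annotators and resolves ties is not described on this page.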

Related papers

PAVE: An End-to-End Dataset for Production Autonomous Vehicle Evaluation [11.024538259188347]
This dataset contains over 100 hours of naturalistic data from production autonomous-driving vehicle models in the market. For each key frame, 20 Hz vehicle trajectories spanning the past 6 s and future 5 s are provided, along with detailed 2D annotations of surrounding vehicles, pedestrians, traffic lights, and traffic signs. To evaluate the safety of AVs, we employ an end-to-end motion planning model that predicts vehicle trajectories with an Average Displacement Error (ADE) of 1.4 m on autonomous-driving frames.
arXiv Detail & Related papers (2025-11-18T06:41:34Z)
Evaluating YOLO Architectures: Implications for Real-Time Vehicle Detection in Urban Environments of Bangladesh [0.0]
Vehicle detection systems trained on non-Bangladeshi datasets struggle to accurately identify local vehicle types in Bangladesh's unique road environments. This study evaluates six YOLO model variants on a custom dataset featuring 29 distinct vehicle classes.
arXiv Detail & Related papers (2025-09-06T09:11:44Z)
DriveIndia: An Object Detection Dataset for Diverse Indian Traffic Scenes [0.3186130813218338]
DriveIndia is a large-scale object detection dataset purpose-built to capture the complexity and unpredictability of Indian traffic environments. The dataset contains 66,986 high-resolution images annotated in YOLO format across 24 traffic-relevant object categories.
arXiv Detail & Related papers (2025-07-26T10:52:03Z)
AGC-Drive: A Large-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios [68.84774511206797]
We present AGC-Drive, the first large-scale real-world dataset for Aerial-Ground Cooperative 3D perception. AGC-Drive contains 350 scenes, each with approximately 100 frames and fully annotated 3D bounding boxes covering 13 object categories. We provide benchmarks for two 3D perception tasks: vehicle-to-vehicle collaborative perception and vehicle-to-Ground collaborative perception.
arXiv Detail & Related papers (2025-06-19T14:48:43Z)
myEye2Wheeler: A Two-Wheeler Indian Driver Real-World Eye-Tracking Dataset [0.0]
This paper presents the myEye2Wheeler dataset, a unique resource of real-world gaze behaviour of two-wheeler drivers. Our dataset offers a critical lens into the unique visual attention patterns and insights into the decision-making of Indian two-wheeler drivers.
arXiv Detail & Related papers (2025-02-18T10:39:00Z)
V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception [61.58737390490639]
We present V2X-Radar, the first large-scale, real-world multi-modal dataset featuring 4D Radar. The dataset consists of 20K LiDAR frames, 40K camera images, and 20K 4D Radar data, including 350K annotated boxes. To support various research domains, we have established V2X-Radar-C for cooperative perception, V2X-Radar-I for roadside perception, and V2X-Radar-V for single-vehicle perception.
arXiv Detail & Related papers (2024-11-17T04:59:00Z)
Bangladeshi Native Vehicle Detection in Wild [1.444899524297657]
This paper proposes a native vehicle detection dataset for the most commonly seen vehicle classes in Bangladesh. 17 distinct vehicle classes have been taken into account, with 81,542 fully annotated instances across 17,326 images. The experiments show that the BNVD dataset serves as a reliable representation of vehicle distribution.
arXiv Detail & Related papers (2024-05-20T16:23:40Z)
DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving [76.29141888408265]
We propose a large-scale dataset containing diverse accident scenarios that frequently occur in real-world driving. The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset.
arXiv Detail & Related papers (2023-04-03T17:37:00Z)
Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years. Data-driven simulation for autonomous driving has been a focal point of recent research. We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z)
One Million Scenes for Autonomous Driving: ONCE Dataset [91.94189514073354]
We introduce the ONCE dataset for 3D object detection in the autonomous driving scenario. The data is selected from 144 driving hours, which is 20x longer than the largest 3D autonomous driving dataset available. We reproduce and evaluate a variety of self-supervised and semi-supervised methods on the ONCE dataset.
arXiv Detail & Related papers (2021-06-21T12:28:08Z)
VehicleNet: Learning Robust Visual Representation for Vehicle Re-identification [116.1587709521173]
We propose to build a large-scale vehicle dataset (called VehicleNet) by harnessing four public vehicle datasets. We design a simple yet effective two-stage progressive approach to learning more robust visual representation from VehicleNet. We achieve the state-of-the-art accuracy of 86.07% mAP on the private test set of AICity Challenge.
arXiv Detail & Related papers (2020-04-14T05:06:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.
