Related papers: RoadscapesQA: A Multitask, Multimodal Dataset for Visual Question Answering on Indian Roads

URL: http://arxiv.org/abs/2602.12877v1
Date: Fri, 13 Feb 2026 12:27:31 GMT
Title: RoadscapesQA: A Multitask, Multimodal Dataset for Visual Question Answering on Indian Roads
Authors: Vijayasri Iyer, Maahin Rathinagiriswaran, Jyothikamalesh S,
Abstract summary: Roadscapes is a multitask dataset consisting of up to 9,000 images captured in diverse Indian driving environments. To facilitate scalable scene understanding, we employ rule-based heuristics to infer various scene attributes. Roadscapes has been curated to advance research on visual scene understanding in unstructured environments.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding road scenes is essential for autonomous driving, as it enables systems to interpret visual surroundings to aid in effective decision-making. We present Roadscapes, a multitask multimodal dataset consisting of up to 9,000 images captured in diverse Indian driving environments, accompanied by manually verified bounding boxes. To facilitate scalable scene understanding, we employ rule-based heuristics to infer various scene attributes, which are subsequently used to generate question-answer (QA) pairs for tasks such as object grounding, reasoning, and scene understanding. The dataset includes a variety of scenes from urban and rural India, encompassing highways, service roads, village paths, and congested city streets, captured in both daytime and nighttime settings. Roadscapes has been curated to advance research on visual scene understanding in unstructured environments. In this paper, we describe the data collection and annotation process, present key dataset statistics, and provide initial baselines for image QA tasks using vision-language models.

Related papers

AVOID: The Adverse Visual Conditions Dataset with Obstacles for Driving Scene Understanding [48.97660297411286]
We introduce AVOID, a new dataset for real-time obstacle detection in a simulated environment. AVOID consists of a large set of unexpected road obstacles located along each path captured under various weather and time conditions. Each image is coupled with the corresponding semantic and depth maps, raw and semantic LiDAR data, and waypoints.
arXiv Detail & Related papers (2025-12-29T05:34:26Z)
RoadSceneVQA: Benchmarking Visual Question Answering in Roadside Perception Systems for Intelligent Transportation System [15.222742182076459]
RoadSceneVQA is a large-scale visual question answering dataset specifically tailored for roadside scenarios. The dataset comprises 34,736 diverse QA pairs collected under varying weather, illumination, and traffic conditions. RoadSceneVQA challenges models to perform both explicit recognition and implicit commonsense reasoning.
arXiv Detail & Related papers (2025-11-23T04:40:50Z)
ChatBEV: A Visual Language Model that Understands BEV Maps [58.3005092762598]
We introduce ChatBEV-QA, a novel BEV VQA benchmark containing over 137k questions. This benchmark is constructed using a novel data collection pipeline that generates scalable and informative VQA data for BEV maps. We propose a language-driven traffic scene generation pipeline, where ChatBEV facilitates map understanding and text-aligned navigation guidance.
arXiv Detail & Related papers (2025-03-18T06:12:38Z)
RSUD20K: A Dataset for Road Scene Understanding In Autonomous Driving [6.372000468173298]
RSUD20K is a new dataset for road scene understanding, comprised of over 20K high-resolution images from the driving perspective on Bangladesh roads. Our work significantly improves upon previous efforts, providing detailed annotations and increased object complexity.
arXiv Detail & Related papers (2024-01-14T16:10:42Z)
RSRD: A Road Surface Reconstruction Dataset and Benchmark for Safe and Comfortable Autonomous Driving [67.09546127265034]
Road surface reconstruction helps to enhance the analysis and prediction of vehicle responses for motion planning and control systems. We introduce the Road Surface Reconstruction dataset, a real-world, high-resolution, and high-precision dataset collected with a specialized platform in diverse driving conditions. It covers common road types containing approximately 16,000 pairs of stereo images, original point clouds, and ground-truth depth/disparity maps.
arXiv Detail & Related papers (2023-10-03T17:59:32Z)
Traffic Scene Parsing through the TSP6K Dataset [109.69836680564616]
We introduce a specialized traffic monitoring dataset, termed TSP6K, with high-quality pixel-level and instance-level annotations. The dataset captures more crowded traffic scenes with several times more traffic participants than the existing driving scenes. We propose a detail refining decoder for scene parsing, which recovers the details of different semantic regions in traffic scenes.
arXiv Detail & Related papers (2023-03-06T02:05:14Z)
Street-View Image Generation from a Bird's-Eye View Layout [95.36869800896335]
Bird's-Eye View (BEV) Perception has received increasing attention in recent years. Data-driven simulation for autonomous driving has been a focal point of recent research. We propose BEVGen, a conditional generative model that synthesizes realistic and spatially consistent surrounding images.
arXiv Detail & Related papers (2023-01-11T18:39:34Z)
Structured Bird's-Eye-View Traffic Scene Understanding from Onboard Images [128.881857704338]
We study the problem of extracting a directed graph representing the local road network in BEV coordinates, from a single onboard camera image. We show that the method can be extended to detect dynamic objects on the BEV plane. We validate our approach against powerful baselines and show that our network achieves superior performance.
arXiv Detail & Related papers (2021-10-05T12:40:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.