RefAV: Towards Planning-Centric Scenario Mining
- URL: http://arxiv.org/abs/2505.20981v2
- Date: Wed, 18 Jun 2025 20:32:38 GMT
- Title: RefAV: Towards Planning-Centric Scenario Mining
- Authors: Cainan Davidson, Deva Ramanan, Neehar Peri
- Abstract summary: Traditional scenario mining techniques are error-prone and prohibitively time-consuming. We introduce RefAV, a large-scale dataset of 10,000 diverse natural language queries. We find that naively repurposing off-the-shelf VLMs yields poor performance.
- Score: 45.37155349405482
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Autonomous Vehicles (AVs) collect and pseudo-label terabytes of multi-modal data localized to HD maps during normal fleet testing. However, identifying interesting and safety-critical scenarios from uncurated driving logs remains a significant challenge. Traditional scenario mining techniques are error-prone and prohibitively time-consuming, often relying on hand-crafted structured queries. In this work, we revisit spatio-temporal scenario mining through the lens of recent vision-language models (VLMs) to detect whether a described scenario occurs in a driving log and, if so, precisely localize it in both time and space. To address this problem, we introduce RefAV, a large-scale dataset of 10,000 diverse natural language queries that describe complex multi-agent interactions relevant to motion planning derived from 1000 driving logs in the Argoverse 2 Sensor dataset. We evaluate several referential multi-object trackers and present an empirical analysis of our baselines. Notably, we find that naively repurposing off-the-shelf VLMs yields poor performance, suggesting that scenario mining presents unique challenges. Lastly, we discuss our recent CVPR 2025 competition and share insights from the community. Our code and dataset are available at https://github.com/CainanD/RefAV/ and https://argoverse.github.io/user-guide/tasks/scenario_mining.html
Related papers
- Why Braking? Scenario Extraction and Reasoning Utilizing LLM [13.88343221678386]
We propose a novel framework that leverages a Large Language Model (LLM) for scenario understanding and reasoning. Our method bridges the gap between low-level numerical signals and natural language descriptions, enabling the LLM to interpret and classify driving scenarios.
arXiv Detail & Related papers (2025-07-17T08:33:56Z)
- Spotting the Unexpected (STU): A 3D LiDAR Dataset for Anomaly Segmentation in Autonomous Driving [14.403130104985557]
This paper presents a novel dataset for anomaly segmentation in driving scenarios. It is the first publicly available dataset focused on road anomaly segmentation with dense 3D semantic labeling. Our dataset and evaluation code will be openly available, facilitating the testing and performance comparison of different approaches.
arXiv Detail & Related papers (2025-05-04T15:15:35Z)
- Querying Labeled Time Series Data with Scenario Programs [0.0]
We propose a formal definition of what constitutes a match between a real-world labeled time series data item and a simulated scenario.
We present a definition and a matching algorithm that scale beyond the autonomous vehicles domain.
arXiv Detail & Related papers (2024-06-25T15:15:27Z)
- NAVSIM: Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking [65.24988062003096]
We present NAVSIM, a framework for benchmarking vision-based driving policies.
Our simulation is non-reactive, i.e., the evaluated policy and environment do not influence each other.
NAVSIM enabled a new competition held at CVPR 2024, where 143 teams submitted 463 entries, resulting in several new insights.
arXiv Detail & Related papers (2024-06-21T17:59:02Z)
- DeTra: A Unified Model for Object Detection and Trajectory Forecasting [68.85128937305697]
Our approach formulates the union of the two tasks as a trajectory refinement problem.
To tackle this unified task, we design a refinement transformer that infers the presence, pose, and multi-modal future behaviors of objects.
In our experiments, we observe that our model outperforms the state-of-the-art on the Argoverse 2 Sensor and Waymo Open datasets.
arXiv Detail & Related papers (2024-06-06T18:12:04Z)
- Graph Convolutional Networks for Complex Traffic Scenario Classification [0.7919810878571297]
A scenario-based testing approach can reduce the time required to obtain statistically significant evidence of the safety of Automated Driving Systems.
Most methods on scenario classification do not work for complex scenarios with diverse environments.
We propose a method for complex traffic scenario classification that is able to model the interaction of a vehicle with the environment.
arXiv Detail & Related papers (2023-10-26T20:51:24Z)
- DeepAccident: A Motion and Accident Prediction Benchmark for V2X Autonomous Driving [76.29141888408265]
We propose a large-scale dataset containing diverse accident scenarios that frequently occur in real-world driving.
The proposed DeepAccident dataset includes 57K annotated frames and 285K annotated samples, approximately 7 times more than the large-scale nuScenes dataset.
arXiv Detail & Related papers (2023-04-03T17:37:00Z)
- CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving [117.87070488537334]
We introduce a challenging dataset named CODA that exposes this critical problem of vision-based detectors.
The performance of standard object detectors trained on large-scale autonomous driving datasets significantly drops to no more than 12.8% in mAR.
We experiment with the state-of-the-art open-world object detector and find that it also fails to reliably identify the novel objects in CODA.
arXiv Detail & Related papers (2022-03-15T08:32:56Z)
- Viewpoint-aware Progressive Clustering for Unsupervised Vehicle Re-identification [36.60241974421236]
We propose a novel viewpoint-aware clustering algorithm for unsupervised vehicle Re-ID.
In particular, we first divide the entire feature space into different subspaces according to the predicted viewpoints and then perform a progressive clustering to mine the accurate relationship among samples.
arXiv Detail & Related papers (2020-11-18T05:40:14Z)
- SoDA: Multi-Object Tracking with Soft Data Association [75.39833486073597]
Multi-object tracking (MOT) is a prerequisite for a safe deployment of self-driving cars.
We propose a novel approach to MOT that uses attention to compute track embeddings that encode dependencies between observed objects.
arXiv Detail & Related papers (2020-08-18T03:40:25Z)
- When, Where, and What? A New Dataset for Anomaly Detection in Driving Videos [9.638503179434581]
This paper proposes traffic anomaly detection with a when-where-what pipeline to detect, localize, and recognize anomalous events from egocentric videos.
We introduce a new dataset called Detection of Traffic Anomaly (DoTA) containing 4,677 videos with temporal, spatial, and categorical annotations.
arXiv Detail & Related papers (2020-04-06T23:58:59Z)