Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and
Reasoning
- URL: http://arxiv.org/abs/2309.06597v2
- Date: Wed, 8 Nov 2023 09:12:01 GMT
- Title: Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and
Reasoning
- Authors: Enna Sachdeva, Nakul Agarwal, Suhas Chundi, Sean Roelofs, Jiachen Li,
Mykel Kochenderfer, Chiho Choi, Behzad Dariush
- Abstract summary: This paper introduces Rank2Tell, a novel multi-modal ego-centric dataset for Ranking the importance level and Telling the reason for the importance.
Using closed and open-ended visual question answering, the dataset provides dense annotations of semantic, spatial, temporal, and relational attributes of important objects in complex traffic scenarios.
- Score: 19.43430577960824
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The widespread adoption of commercial autonomous vehicles (AVs) and advanced
driver assistance systems (ADAS) may largely depend on their acceptance by
society, for which their perceived trustworthiness and interpretability to
riders are crucial. In general, this task is challenging because modern
autonomous systems software relies heavily on black-box artificial intelligence
models. Towards this goal, this paper introduces Rank2Tell, a novel
multi-modal ego-centric dataset for Ranking the importance level and Telling
the reason for the importance. Using closed and open-ended visual question
answering, the dataset provides dense annotations of semantic, spatial,
temporal, and relational attributes of important objects in
complex traffic scenarios. The dense annotations and unique attributes of the
dataset make it a valuable resource for researchers working on visual scene
understanding and related fields. Furthermore, we introduce a joint model for
importance level ranking and natural language caption generation to benchmark
our dataset, and we demonstrate its performance with quantitative evaluations.
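To make the benchmarking setup concrete, below is a minimal sketch of a joint importance-ranking and captioning model in the spirit of the one the abstract describes. This is not the authors' implementation: the architecture, feature dimensions, vocabulary size, and the three-level importance scale are illustrative assumptions.

```python
# A minimal sketch (not the paper's actual model) of joint importance
# ranking and caption generation over per-object visual features.
# All dimensions and the 3-level importance scale are assumptions.
import torch
import torch.nn as nn

class JointRankAndTell(nn.Module):
    def __init__(self, feat_dim=512, vocab_size=10000, num_levels=3):
        super().__init__()
        # Shared encoder over per-object features (assumed precomputed).
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8,
                                       batch_first=True),
            num_layers=2,
        )
        # Head 1: classify each object's importance level.
        self.rank_head = nn.Linear(feat_dim, num_levels)
        # Head 2: decode a natural-language explanation token by token.
        self.embed = nn.Embedding(vocab_size, feat_dim)
        self.decoder = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.word_head = nn.Linear(feat_dim, vocab_size)

    def forward(self, obj_feats, caption_tokens):
        # obj_feats: (batch, num_objects, feat_dim)
        # caption_tokens: (batch, seq_len), teacher-forced ground truth
        ctx = self.encoder(obj_feats)
        importance_logits = self.rank_head(ctx)  # (B, N, num_levels)
        # Condition the decoder on the pooled scene context.
        h0 = ctx.mean(dim=1, keepdim=True).transpose(0, 1).contiguous()
        dec_out, _ = self.decoder(self.embed(caption_tokens), h0)
        caption_logits = self.word_head(dec_out)  # (B, T, vocab_size)
        return importance_logits, caption_logits

# Usage: 2 scenes, 5 objects each, 12-token explanation captions.
model = JointRankAndTell()
feats = torch.randn(2, 5, 512)
tokens = torch.randint(0, 10000, (2, 12))
rank_logits, cap_logits = model(feats, tokens)
```

In a training loop, one would sum a cross-entropy loss over the importance logits (per object) and the caption logits (per token), with the dataset's closed- and open-ended VQA annotations supplying the labels.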
Related papers
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z) - MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
This dataset includes figures such as schematic diagrams, simulated images, macroscopic/microscopic photos, and experimental visualizations.
We developed benchmarks for scientific figure captioning and multiple-choice questions, evaluating six proprietary and over ten open-source models.
The dataset and benchmarks will be released to support further research.
arXiv Detail & Related papers (2024-07-06T00:40:53Z) - A Survey on Autonomous Driving Datasets: Statistics, Annotation Quality, and a Future Outlook [24.691922611156937]
We present an exhaustive study of 265 autonomous driving datasets from multiple perspectives.
We introduce a novel metric to evaluate the impact of datasets, which can also be a guide for creating new datasets.
We discuss the current challenges and the development trend of the future autonomous driving datasets.
arXiv Detail & Related papers (2024-01-02T22:35:33Z) - Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future [130.87142103774752]
This review systematically assesses over seventy open-source autonomous driving datasets.
It offers insights into various aspects, such as the principles underlying the creation of high-quality datasets.
It also delves into the scientific and technical challenges that warrant resolution.
arXiv Detail & Related papers (2023-12-06T10:46:53Z) - Modeling Entities as Semantic Points for Visual Information Extraction
in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z) - TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual
Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z) - Important Object Identification with Semi-Supervised Learning for
Autonomous Driving [37.654878298744855]
We propose a novel approach for important object identification in egocentric driving scenarios.
We present a semi-supervised learning pipeline to enable the model to learn from unlimited unlabeled data.
Our approach also outperforms rule-based baselines by a large margin.
arXiv Detail & Related papers (2022-03-05T01:23:13Z) - Diverse Complexity Measures for Dataset Curation in Self-driving [80.55417232642124]
We propose a new data selection method that exploits a diverse set of criteria that quantify the interestingness of traffic scenes.
Our experiments show that the proposed curation pipeline is able to select datasets that lead to better generalization and higher performance.
arXiv Detail & Related papers (2021-01-16T23:45:02Z) - The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset:
Collection, Insights and Improvements [14.707930573950787]
We present MuSe-CaR, a first-of-its-kind multimodal dataset.
The data is publicly available, as it recently served as the test bed for the 1st Multimodal Sentiment Analysis Challenge.
arXiv Detail & Related papers (2021-01-15T10:40:37Z)