NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous
Driving Datasets using Markup Annotations
- URL: http://arxiv.org/abs/2312.06352v1
- Date: Mon, 11 Dec 2023 12:58:54 GMT
- Title: NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous
Driving Datasets using Markup Annotations
- Authors: Yuichi Inoue, Yuki Yada, Kotaro Tanahashi, Yu Yamaguchi
- Abstract summary: Visual Question Answering (VQA) is one of the most important tasks in autonomous driving.
We introduce a novel dataset annotation technique in which QAs are enclosed within markups.
This dataset empowers the development of vision language models, especially for autonomous driving tasks.
- Score: 0.6827423171182154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Question Answering (VQA) is one of the most important tasks in
autonomous driving, which requires accurate recognition and complex situation
evaluations. However, datasets annotated in a QA format, which guarantees
precise language generation and scene recognition from driving scenes, have not
been established yet. In this work, we introduce Markup-QA, a novel dataset
annotation technique in which QAs are enclosed within markups. This approach
facilitates the simultaneous evaluation of a model's capabilities in sentence
generation and VQA. Moreover, using this annotation methodology, we designed
the NuScenes-MQA dataset. This dataset empowers the development of vision
language models, especially for autonomous driving tasks, by focusing on both
descriptive capabilities and precise QA. The dataset is available at
https://github.com/turingmotors/NuScenes-MQA.
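As a rough illustration of the Markup-QA idea, the sketch below wraps answer spans in markup tags inside a generated sentence and extracts them for QA scoring, so the same output can be judged both as free-form text and as an answer. The tag name and the exact-match metric are assumptions for illustration, not the dataset's actual tag vocabulary or evaluation protocol.

```python
import re

# Hypothetical markup: answer spans are wrapped in <ans>...</ans> tags so that
# one generated sentence can be scored both as text and as a QA answer.
ANS_PATTERN = re.compile(r"<ans>(.*?)</ans>")

def extract_answers(text):
    """Pull every markup-enclosed answer span out of a generated sentence."""
    return [span.strip() for span in ANS_PATTERN.findall(text)]

def qa_exact_match(prediction, reference):
    """Fraction of reference answer spans reproduced exactly by the model."""
    pred, ref = extract_answers(prediction), extract_answers(reference)
    if not ref:
        return 0.0
    return sum(p == r for p, r in zip(pred, ref)) / len(ref)

reference = "There are <ans>two</ans> pedestrians in front of the ego vehicle."
prediction = "I can see <ans>two</ans> pedestrians ahead of the ego vehicle."
print(extract_answers(prediction))            # ['two']
print(qa_exact_match(prediction, reference))  # 1.0
```

Scoring the tagged spans separately from the surrounding sentence is what allows sentence generation and VQA accuracy to be evaluated simultaneously, as the abstract describes.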
Related papers
- Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment [69.07445098168344]
We introduce a new image quality assessment (IQA) task paradigm, grounding-IQA.
Grounding-IQA comprises two subtasks: grounding-IQA-description (GIQA-DES) and visual question answering (GIQA-VQA).
To realize grounding-IQA, we construct a corresponding dataset, GIQA-160K, through our proposed automated annotation pipeline.
Experiments demonstrate that our proposed task paradigm, dataset, and benchmark facilitate more fine-grained IQA applications.
arXiv Detail & Related papers (2024-11-26T09:03:16Z)
- Suvach -- Generated Hindi QA benchmark [0.0]
This paper proposes a new benchmark specifically designed for evaluating Hindi EQA models.
This method leverages large language models (LLMs) to generate a high-quality dataset in an extractive setting.
arXiv Detail & Related papers (2024-04-30T04:19:17Z)
- AQUALLM: Audio Question Answering Data Generation Using Large Language Models [2.2232550112727267]
We introduce a scalable audio question answering (AQA) data generation pipeline, which relies on Large Language Models (LLMs).
We present three extensive and high-quality benchmark datasets for AQA.
Models trained on our datasets demonstrate enhanced generalizability when compared to models trained using human-annotated AQA data.
arXiv Detail & Related papers (2023-12-28T20:01:27Z)
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario [77.14723238359318]
NuScenes-QA is the first benchmark for VQA in the autonomous driving scenario, encompassing 34K visual scenes and 460K question-answer pairs.
We leverage existing 3D detection annotations to generate scene graphs and design question templates manually.
We develop a series of baselines that employ advanced 3D detection and VQA techniques.
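The templated generation summarized above can be illustrated with a minimal sketch: object attributes derived from 3D detection annotations are filled into manually written question templates. The object schema, attribute names, and templates here are hypothetical stand-ins, not the actual NuScenes-QA templates.

```python
# Minimal sketch of template-based QA generation from detection-derived scene
# information; the object schema and templates below are illustrative only.
scene_objects = [
    {"category": "car", "status": "moving"},
    {"category": "car", "status": "parked"},
    {"category": "pedestrian", "status": "moving"},
]

def count_question(objects, category):
    """Fill a manually written counting template and compute its answer."""
    count = sum(obj["category"] == category for obj in objects)
    return f"How many {category}s are there in the scene?", str(count)

def exist_question(objects, category, status):
    """Fill a manually written existence template and compute its answer."""
    exists = any(obj["category"] == category and obj["status"] == status
                 for obj in objects)
    return f"Is there a {status} {category} in the scene?", "yes" if exists else "no"

for question, answer in (count_question(scene_objects, "car"),
                         exist_question(scene_objects, "pedestrian", "moving")):
    print(question, "->", answer)
# How many cars are there in the scene? -> 2
# Is there a moving pedestrian in the scene? -> yes
```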
arXiv Detail & Related papers (2023-05-24T07:40:50Z)
- PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale [53.92008514395125]
PAXQA (Projecting annotations for cross-lingual (x) QA) decomposes cross-lingual QA into two stages.
We propose a novel use of lexically-constrained machine translation, in which constrained entities are extracted from the parallel bitexts.
We show that models fine-tuned on these datasets outperform prior synthetic data generation models over several extractive QA datasets.
arXiv Detail & Related papers (2023-04-24T15:46:26Z)
- Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z)
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z)
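As a loose illustration of the template idea in the last entry, the snippet below retrieves a related sentence that mentions the answer and applies a simple cloze-style template to it; the retrieval heuristic and template wording are simplified assumptions rather than the paper's actual pipeline.

```python
# Illustrative sketch only: build a pseudo QA pair by applying a simple
# template to a retrieved sentence instead of the original context sentence.
def retrieve_related(answer, corpus):
    """Toy retrieval: return the first corpus sentence that mentions the answer."""
    for sentence in corpus:
        if answer in sentence:
            return sentence
    return None

def templated_question(sentence, answer):
    """Toy template: turn the retrieved sentence into a cloze-style question."""
    return sentence.replace(answer, "what").rstrip(".") + "?"

corpus = [
    "The nuScenes dataset was released by Motional.",
    "The KITTI benchmark focuses on stereo and optical flow.",
]
answer = "Motional"
retrieved = retrieve_related(answer, corpus)
if retrieved is not None:
    print(templated_question(retrieved, answer), "->", answer)
# The nuScenes dataset was released by what? -> Motional
```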