Towards a performance analysis on pre-trained Visual Question Answering
models for autonomous driving
- URL: http://arxiv.org/abs/2307.09329v2
- Date: Fri, 28 Jul 2023 09:50:23 GMT
- Authors: Kaavya Rekanar, Ciarán Eising, Ganesh Sistu, Martin Hayes
- Abstract summary: This paper presents a preliminary analysis of three popular Visual Question Answering (VQA) models, namely ViLBERT, ViLT, and LXMERT.
The performance of these models is evaluated by comparing the similarity of responses to reference answers provided by computer vision experts.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This short paper presents a preliminary analysis of three popular Visual
Question Answering (VQA) models, namely ViLBERT, ViLT, and LXMERT, in the
context of answering questions relating to driving scenarios. The performance
of these models is evaluated by comparing the similarity of responses to
reference answers provided by computer vision experts. Model selection is
predicated on the analysis of transformer utilization in multimodal
architectures. The results indicate that models incorporating cross-modal
attention and late fusion techniques exhibit promising potential for generating
improved answers within a driving perspective. This initial analysis serves as
a launchpad for a forthcoming comprehensive comparative study involving nine
VQA models and sets the scene for further investigations into the effectiveness
of VQA model queries in self-driving scenarios. Supplementary material is
available at
https://github.com/KaavyaRekanar/Towards-a-performance-analysis-on-pre-trained-VQA-models-for-autonomous-driving.
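The abstract describes scoring models by the similarity of their responses to expert-written reference answers, but this listing does not specify the similarity metric used. A minimal sketch of such an evaluation, assuming a simple character-level similarity from the Python standard library (the model names are from the paper; the QA pair and metric are illustrative assumptions):

```python
# Hypothetical sketch: rank VQA model answers by similarity to an
# expert reference answer. The actual metric used in the paper is
# not specified here; difflib's ratio() stands in as a placeholder.
from difflib import SequenceMatcher


def answer_similarity(response: str, reference: str) -> float:
    """Return a similarity score in [0, 1] between two answer strings."""
    return SequenceMatcher(None, response.lower(), reference.lower()).ratio()


# Illustrative driving-scene question with made-up model outputs.
reference = "the traffic light is red"
responses = {
    "ViLBERT": "the light is red",
    "ViLT": "a red traffic signal",
    "LXMERT": "red",
}

scores = {name: answer_similarity(ans, reference) for name, ans in responses.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

An identical response scores 1.0; disjoint strings approach 0.0, so the ranking orders models by closeness to the expert reference.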