Visual Question Generation in Bengali
- URL: http://arxiv.org/abs/2310.08187v1
- Date: Thu, 12 Oct 2023 10:26:26 GMT
- Title: Visual Question Generation in Bengali
- Authors: Mahmud Hasan, Labiba Islam, Jannatul Ferdous Ruma, Tasmiah Tahsin
Mayeesha, Rashedur M. Rahman
- Abstract summary: We develop a novel transformer-based encoder-decoder architecture that generates questions in Bengali when given an image.
We establish the first state-of-the-art models for the Visual Question Generation task in Bengali.
Our results show that our image-cat model achieves a BLEU-1 score of 33.12 and a BLEU-3 score of 7.56.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of Visual Question Generation (VQG) is to generate human-like
questions relevant to the given image. As VQG is an emerging research field,
existing works tend to focus only on resource-rich languages such as English due
to the availability of datasets. In this paper, we propose the first Bengali
Visual Question Generation task and develop a novel transformer-based
encoder-decoder architecture that generates questions in Bengali when given an
image. We propose three model variants: (i) image-only, a baseline that generates questions from the image without additional information, and (ii) image-category and (iii) image-answer-category, guided VQG variants in which the model is conditioned on the category of the expected question, and on both the answer and that category, respectively. These models are trained and evaluated on the translated VQAv2.0
dataset. Our quantitative and qualitative results establish the first state-of-the-art models for the VQG task in Bengali and demonstrate that our models are
capable of generating grammatically correct and relevant questions. Our
quantitative results show that our image-cat model achieves a BLEU-1 score of 33.12 and a BLEU-3 score of 7.56, the highest among the three variants.
We also perform a human evaluation to assess the quality of the generation
tasks. Human evaluation suggests that the image-cat model is capable of generating goal-driven and attribute-specific questions that stay relevant to the corresponding image.
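
The abstract describes the architecture only at a high level (a transformer-based encoder-decoder over the image, optionally conditioned on the answer category). As a rough, hypothetical sketch, the guided image-cat variant could be wired up along the following lines in PyTorch; all names and dimensions (GuidedVQG, img_feat_dim, n_categories, and so on) are illustrative assumptions rather than details taken from the paper.

```python
# Hypothetical sketch of a category-conditioned encoder-decoder for Bengali VQG.
# Assumes precomputed regional image features and an integer question-category id;
# none of these names or sizes come from the paper itself.
import torch
import torch.nn as nn

class GuidedVQG(nn.Module):
    def __init__(self, vocab_size, d_model=512, n_heads=8, n_layers=4,
                 img_feat_dim=2048, n_categories=16, max_len=64):
        super().__init__()
        self.img_proj = nn.Linear(img_feat_dim, d_model)    # project image features
        self.cat_emb = nn.Embedding(n_categories, d_model)  # expected-question category
        self.tok_emb = nn.Embedding(vocab_size, d_model)    # Bengali subword embeddings
        self.pos_emb = nn.Embedding(max_len, d_model)       # learned decoder positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=n_heads,
            num_encoder_layers=n_layers, num_decoder_layers=n_layers,
            batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, img_feats, category_ids, question_ids):
        # img_feats: (B, R, img_feat_dim) regional image features
        # category_ids: (B,) category of the expected question ("image-cat" guidance)
        # question_ids: (B, T) teacher-forced Bengali question tokens
        src = torch.cat([self.img_proj(img_feats),
                         self.cat_emb(category_ids).unsqueeze(1)], dim=1)
        pos = torch.arange(question_ids.size(1), device=question_ids.device)
        tgt = self.tok_emb(question_ids) + self.pos_emb(pos)
        causal = self.transformer.generate_square_subsequent_mask(
            question_ids.size(1)).to(question_ids.device)
        out = self.transformer(src, tgt, tgt_mask=causal)
        return self.lm_head(out)  # logits over the Bengali vocabulary
```

The BLEU-1 and BLEU-3 figures quoted above correspond to BLEU with unigram-only and up-to-trigram weights, e.g. nltk.translate.bleu_score.sentence_bleu([ref_tokens], hyp_tokens, weights=(1, 0, 0, 0)) and weights=(1/3, 1/3, 1/3, 0) over tokenised Bengali questions.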
Related papers
- Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference [107.53380946417003]
We propose a novel learning paradigm to generate visual questions with answer-awareness and region-reference.
We develop a simple methodology to self-learn the visual hints without introducing any additional human annotations.
arXiv Detail & Related papers (2024-07-06T15:07:32Z)
- Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model [4.41132900194195]
We propose a new method called chain of QA for human-written questions (CoQAH).
CoQAH utilizes a sequence of QA interactions between a large language model and a VQA model trained on synthetic data to reason and derive logical answers for human-written questions.
We tested the effectiveness of CoQAH on two types of human-written VQA datasets for 3D-rendered and chest X-ray images.
arXiv Detail & Related papers (2024-01-12T06:49:49Z)
- Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation [64.64849950642619]
We develop an evaluation framework inspired by formal semantics for evaluating text-to-image models.
We show that Davidsonian Scene Graph (DSG) produces atomic and unique questions organized in dependency graphs.
We also present DSG-1k, an open-sourced evaluation benchmark that includes 1,060 prompts.
arXiv Detail & Related papers (2023-10-27T16:20:10Z)
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- Localized Questions in Medical Visual Question Answering [2.005299372367689]
Visual Question Answering (VQA) models aim to answer natural language questions about given images.
Existing medical VQA models typically focus on answering questions that refer to an entire image.
This paper proposes a novel approach for medical VQA that addresses this limitation by developing a model that can answer questions about image regions.
arXiv Detail & Related papers (2023-07-03T14:47:18Z)
- Generative Language Models for Paragraph-Level Question Generation [79.31199020420827]
Powerful generative models have led to recent progress in question generation (QG).
It is difficult to measure advances in QG research since there are no standardized resources that allow a uniform comparison among approaches.
We introduce QG-Bench, a benchmark for QG that unifies existing question answering datasets by converting them to a standard QG setting.
arXiv Detail & Related papers (2022-10-08T10:24:39Z)
- K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition [64.55573343404572]
We present a novel knowledge-aware VQG dataset called K-VQG.
This is the first large, human-annotated dataset in which questions regarding images are tied to structured knowledge.
We also develop a new VQG model that can encode and use knowledge as the target for a question.
arXiv Detail & Related papers (2022-03-15T13:38:10Z)
- Guiding Visual Question Generation [40.56637275354495]
In traditional Visual Question Generation (VQG), most images have multiple concepts for which a question could be generated.
We present Guiding Visual Question Generation - a variant of VQG which conditions the question generator on categorical information.
arXiv Detail & Related papers (2021-10-15T17:38:08Z)
- C3VQG: Category Consistent Cyclic Visual Question Generation [51.339348810676896]
Visual Question Generation (VQG) is the task of generating natural questions based on an image.
In this paper, we try to exploit the different visual cues and concepts in an image to generate questions using a variational autoencoder (VAE) without ground-truth answers.
Our approach addresses two major shortcomings of existing VQG systems: (i) it minimizes the level of supervision and (ii) it replaces generic questions with category-relevant generations.
arXiv Detail & Related papers (2020-05-15T20:25:03Z)
- Simplifying Paragraph-level Question Generation via Transformer Language Models [0.0]
Question generation (QG) is a natural language generation task where a model is trained to ask questions corresponding to some input text.
A single Transformer-based unidirectional language model leveraging transfer learning can be used to produce high-quality questions.
Our QG model, finetuned from GPT-2 Small, outperforms several paragraph-level QG baselines on the SQuAD dataset by 0.95 METEOR points.
arXiv Detail & Related papers (2020-05-03T14:57:24Z)
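
As a rough illustration of the last entry above (question generation with a single fine-tuned unidirectional language model), a generation step with a GPT-2 checkpoint via Hugging Face transformers might look like the sketch below. The prompt format ("context: ... answer: ... question:") and the plain gpt2 checkpoint are assumptions for illustration, not the paper's exact setup.

```python
# Hypothetical sketch of paragraph-level QG with a (fine-tuned) GPT-2 model.
# The prompt format and checkpoint name are illustrative, not the paper's own.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")  # ideally a QG-fine-tuned checkpoint
model = GPT2LMHeadModel.from_pretrained("gpt2")

context = "The Eiffel Tower was completed in 1889 for the World's Fair in Paris."
answer = "1889"
prompt = f"context: {context} answer: {answer} question:"

inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=32,                   # generated questions are short
    do_sample=False,                     # greedy decoding keeps the sketch deterministic
    pad_token_id=tokenizer.eos_token_id)
# Decode only the newly generated tokens, i.e. the question.
question = tokenizer.decode(output_ids[0, inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
print(question)
```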
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.