Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for
Unbiased Question-Answering
- URL: http://arxiv.org/abs/2310.06238v1
- Date: Tue, 10 Oct 2023 01:22:41 GMT
- Title: Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for
Unbiased Question-Answering
- Authors: Xiulong Liu and Zhikang Dong and Peng Zhang
- Abstract summary: We review each question type from the original dataset, selecting those with pronounced answer biases.
We gather complementary videos and questions, ensuring that no answer has a markedly skewed distribution.
We present a novel baseline model that delves deeper into the audio-visual-text interrelation.
- Score: 10.00386025149584
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, there has been a growing emphasis on the intersection of
audio, vision, and text modalities, driving forward the advancements in
multimodal research. However, a strong bias in any single modality can lead
the model to neglect the others. Consequently, the model's ability to
reason effectively across these diverse modalities is compromised, impeding
further advancement. In this paper, we meticulously review each question type
from the original dataset, selecting those with pronounced answer biases. To
counter these biases, we gather complementary videos and questions, ensuring
that no answer has a markedly skewed distribution. In particular, for binary
questions, we strive to ensure that both answers are almost uniformly spread
within each question category. As a result, we construct a new dataset, named
MUSIC-AVQA v2.0, which is more challenging and, we believe, can better foster
progress on the AVQA task. Furthermore, we present a novel baseline model that
delves deeper into the audio-visual-text interrelation. On MUSIC-AVQA v2.0,
this model surpasses all existing benchmarks, improving accuracy by 2% and
setting a new state-of-the-art performance.
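The balancing procedure described above amounts to measuring, for each question category, how concentrated the answer distribution is, and then collecting complementary videos and questions until no single answer dominates. Below is a minimal sketch of such a skew check; the data layout, threshold, and function name are illustrative assumptions, not the authors' released tooling.

```python
from collections import Counter, defaultdict

def find_skewed_categories(qa_pairs, max_top_answer_share=0.6):
    """Flag question categories whose answer distribution is skewed.

    qa_pairs: iterable of (question_category, answer) tuples.
    The 0.6 tolerance is an arbitrary illustrative choice.
    """
    answers_by_category = defaultdict(list)
    for category, answer in qa_pairs:
        answers_by_category[category].append(answer)

    skewed = {}
    for category, answers in answers_by_category.items():
        counts = Counter(answers)
        top_answer, top_count = counts.most_common(1)[0]
        share = top_count / len(answers)
        # A category counts as biased when a single answer dominates it.
        if share > max_top_answer_share:
            skewed[category] = (top_answer, share)
    return skewed

# Toy example: a binary category dominated by "yes" would be flagged for
# rebalancing with complementary videos and questions.
toy_pairs = [("existential", "yes")] * 8 + [("existential", "no")] * 2
print(find_skewed_categories(toy_pairs))  # {'existential': ('yes', 0.8)}
```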
Related papers
- Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering [25.577314828249897]
We propose a novel dataset, MUSIC-AVQA-R, crafted in two steps: rephrasing questions within the test split of a public dataset (MUSIC-AVQA) and introducing distribution shifts to split questions.
Experimental results show that this architecture achieves state-of-the-art performance on MUSIC-AVQA-R, notably obtaining a significant improvement of 9.32%.
arXiv Detail & Related papers (2024-04-18T09:16:02Z) - UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z) - Learning to Answer Questions in Dynamic Audio-Visual Scenarios [81.19017026999218]
We focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos.
Our dataset contains more than 45K question-answer pairs spanning over different modalities and question types.
Our results demonstrate that AVQA benefits from multisensory perception and our model outperforms recent A-, V-, and AVQA approaches.
arXiv Detail & Related papers (2022-03-26T13:03:42Z) - Challenges in Procedural Multimodal Machine Comprehension: A Novel Way To
Benchmark [14.50261153230204]
We focus on Multimodal Machine Comprehension (M3C), where a model is expected to answer questions based on a given passage (or context).
We identify three critical biases stemming from the question-answer generation process and memorization capabilities of large deep models.
We propose a systematic framework to address these biases through three Control-Knobs.
arXiv Detail & Related papers (2021-10-22T16:33:57Z) - Learning with Instance Bundles for Reading Comprehension [61.823444215188296]
We introduce new supervision techniques that compare question-answer scores across multiple related instances.
Specifically, we normalize these scores across various neighborhoods of closely contrasting questions and/or answers.
We empirically demonstrate the effectiveness of training with instance bundles on two datasets.
arXiv Detail & Related papers (2021-04-18T06:17:54Z) - Contrast and Classify: Training Robust VQA Models [60.80627814762071]
We propose a novel training paradigm (ConClaT) that optimizes both cross-entropy and contrastive losses.
We find that optimizing both losses -- either alternately or jointly -- is key to effective training (an illustrative sketch of one such joint objective appears after this list).
arXiv Detail & Related papers (2020-10-13T00:23:59Z) - What Gives the Answer Away? Question Answering Bias Analysis on Video QA
Datasets [40.64071905569975]
Question answering biases in video QA datasets can mislead multimodal models into overfitting to QA artifacts.
Our study shows biases can come from annotators and type of questions.
We also show empirically that using annotator-non-overlapping train-test splits can reduce QA biases for video QA datasets.
arXiv Detail & Related papers (2020-07-07T17:00:11Z) - Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines data over RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z) - ManyModalQA: Modality Disambiguation and QA over Diverse Inputs [73.93607719921945]
We present a new multimodal question answering challenge, ManyModalQA, in which an agent must answer a question by considering three distinct modalities.
We collect our data by scraping Wikipedia and then utilize crowdsourcing to collect question-answer pairs.
arXiv Detail & Related papers (2020-01-22T14:39:28Z)
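As referenced in the ConClaT entry above, one common way to combine a cross-entropy answer-classification loss with a contrastive loss is a weighted sum of the two terms. The sketch below shows such a joint objective with an in-batch, InfoNCE-style contrastive term; the tensor shapes, weighting, and negative-sampling scheme are illustrative assumptions, not that paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def joint_vqa_loss(logits, labels, emb_anchor, emb_positive,
                   contrastive_weight=0.5, temperature=0.1):
    """Weighted sum of answer cross-entropy and an InfoNCE-style contrastive
    term (illustrative sketch, not ConClaT's released implementation).

    logits:       (batch, num_answers) answer scores
    labels:       (batch,) ground-truth answer indices
    emb_anchor:   (batch, dim) embeddings of the original samples
    emb_positive: (batch, dim) embeddings of augmented/contrasting views
    """
    # Standard answer-classification term.
    ce_loss = F.cross_entropy(logits, labels)

    # In-batch contrastive term: each anchor should match its own positive,
    # with the rest of the batch acting as negatives.
    anchor = F.normalize(emb_anchor, dim=-1)
    positive = F.normalize(emb_positive, dim=-1)
    sim = anchor @ positive.t() / temperature          # (batch, batch)
    targets = torch.arange(sim.size(0), device=sim.device)
    contrastive_loss = F.cross_entropy(sim, targets)

    return ce_loss + contrastive_weight * contrastive_loss
```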