NEWSKVQA: Knowledge-Aware News Video Question Answering
- URL: http://arxiv.org/abs/2202.04015v1
- Date: Tue, 8 Feb 2022 17:31:31 GMT
- Title: NEWSKVQA: Knowledge-Aware News Video Question Answering
- Authors: Pranay Gupta and Manish Gupta
- Abstract summary: We explore a new frontier in video question answering: answering knowledge-based questions in the context of news videos.
We curate a new dataset of 12K news videos spanning 156 hours, with 1M multiple-choice question-answer pairs covering 8263 unique entities.
We propose a novel approach, NEWSKVQA, which performs multi-modal inference over textual multiple-choice questions, videos, their transcripts, and a knowledge base.
- Score: 5.720640816755851
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Answering questions in the context of videos can be helpful in video
indexing, video retrieval systems, video summarization, learning management
systems and surveillance video analysis. Although there exists a large body of
work on visual question answering, work on video question answering (1) is
limited to domains like movies, TV shows, gameplay, or human activity, and (2)
is mostly based on common sense reasoning. In this paper, we explore a new
frontier in video question answering: answering knowledge-based questions in
the context of news videos. To this end, we curate a new dataset of 12K news
videos spanning 156 hours with 1M multiple-choice question-answer pairs
covering 8263 unique entities. We make the dataset publicly available. Using
this dataset, we propose a novel approach, NEWSKVQA (Knowledge-Aware News Video
Question Answering), which performs multi-modal inference over textual
multiple-choice questions, videos, their transcripts, and a knowledge base, and
presents a strong baseline.
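To make the setup concrete, below is a minimal sketch of late-fusion scoring for multiple-choice answers over the four inputs the abstract names (question, video, transcript, knowledge base). Everything here (module names, feature dimensions, the dot-product scoring) is an illustrative assumption, not the authors' architecture.

```python
# Minimal late-fusion sketch for knowledge-aware multiple-choice VideoQA.
# All dimensions and layer choices are assumptions for illustration only.
import torch
import torch.nn as nn

class MultiModalMCQScorer(nn.Module):
    def __init__(self, text_dim=768, video_dim=512, kb_dim=200, hidden=256):
        super().__init__()
        # One projection per modality into a shared hidden space.
        self.q_proj = nn.Linear(text_dim, hidden)   # question encoding
        self.v_proj = nn.Linear(video_dim, hidden)  # video features
        self.t_proj = nn.Linear(text_dim, hidden)   # transcript encoding
        self.k_proj = nn.Linear(kb_dim, hidden)     # knowledge-base facts
        self.a_proj = nn.Linear(text_dim, hidden)   # candidate answers
        self.fuse = nn.Linear(4 * hidden, hidden)   # fused question context

    def forward(self, q, v, t, k, answers):
        # q, v, t, k: (batch, dim); answers: (batch, n_choices, text_dim)
        ctx = torch.cat([self.q_proj(q), self.v_proj(v),
                         self.t_proj(t), self.k_proj(k)], dim=-1)
        ctx = torch.tanh(self.fuse(ctx))            # (batch, hidden)
        a = self.a_proj(answers)                    # (batch, n_choices, hidden)
        # Dot-product compatibility between context and each answer choice.
        return torch.einsum("bh,bnh->bn", ctx, a)

# Toy usage with random tensors standing in for real encoder outputs.
model = MultiModalMCQScorer()
q, t = torch.randn(2, 768), torch.randn(2, 768)
v, k = torch.randn(2, 512), torch.randn(2, 200)
answers = torch.randn(2, 4, 768)                    # 4 choices per question
print(model(q, v, t, k, answers).argmax(dim=-1))    # predicted choice index
```

In a real system the random tensors would be replaced by pretrained text, video, and knowledge-graph encoders; the point of the sketch is only the fuse-then-score pattern over the four modalities.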
Related papers
- YTCommentQA: Video Question Answerability in Instructional Videos [22.673000779017595]
We present the YTCommentQA dataset, which contains naturally generated questions from YouTube.
The questions are categorized by answerability and by the modality required to answer them: visual, script, or both.
arXiv Detail & Related papers (2024-01-30T14:18:37Z)
- Locate before Answering: Answer Guided Question Localization for Video Question Answering [70.38700123685143]
LocAns integrates a question locator and an answer predictor into an end-to-end model.
It achieves state-of-the-art performance on two modern long-term VideoQA datasets.
arXiv Detail & Related papers (2022-10-05T08:19:16Z)
- WildQA: In-the-Wild Video Question Answering [22.065516207195323]
We propose WILDQA, a video understanding dataset of videos recorded in outdoor settings.
We also introduce the new task of identifying visual support for a given question and answer.
arXiv Detail & Related papers (2022-09-14T13:54:07Z)
- Video Question Answering with Iterative Video-Text Co-Tokenization [77.66445727743508]
We propose a novel multi-stream video encoder for video question answering.
We experimentally evaluate the model on several datasets, including MSRVTT-QA, MSVD-QA, and IVQA.
Our model reduces the required GFLOPs from 150-360 to only 67, producing a highly efficient video question answering model.
arXiv Detail & Related papers (2022-08-01T15:35:38Z)
- Video Question Answering: Datasets, Algorithms and Challenges [99.9179674610955]
Video Question Answering (VideoQA) aims to answer natural language questions according to the given videos.
This paper provides a clear taxonomy and comprehensive analyses of VideoQA, focusing on the datasets, algorithms, and unique challenges.
arXiv Detail & Related papers (2022-03-02T16:34:09Z)
- VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation [124.02278735049235]
The VALUE benchmark aims to cover a broad range of video genres, video lengths, data volumes, and task difficulty levels.
We evaluate various baseline methods with and without large-scale VidL pre-training.
The significant gap between our best model and human performance calls for future study of advanced VidL models.
arXiv Detail & Related papers (2021-06-08T18:34:21Z)
- Just Ask: Learning to Answer Questions from Millions of Narrated Videos [97.44376735445454]
We propose to avoid manual annotation and generate a large-scale training dataset for video question answering.
We leverage a question generation transformer trained on text data and use it to generate question-answer pairs from transcribed video narrations.
We show that our method significantly outperforms the state of the art on MSRVTT-QA, MSVD-QA, ActivityNet-QA, and How2QA (a minimal sketch of this question-generation idea appears after this list).
arXiv Detail & Related papers (2020-12-01T12:59:20Z)
- Video Question Answering on Screencast Tutorials [43.00474548031818]
We introduce a dataset of question, answer, and context triples from tutorial videos for a software product.
A one-shot recognition algorithm is designed to extract visual cues, which helps enhance video question answering performance.
arXiv Detail & Related papers (2020-08-02T19:27:42Z)
- Knowledge-Based Visual Question Answering in Videos [36.23723122336639]
We introduce KnowIT VQA, a video dataset with 24,282 human-generated question-answer pairs about a popular sitcom.
The dataset combines visual, textual and temporal coherence reasoning together with knowledge-based questions.
Our main findings are: (i) incorporating knowledge produces substantial improvements for VQA in videos, and (ii) performance on KnowIT VQA still lags well behind human accuracy.
arXiv Detail & Related papers (2020-04-17T02:06:26Z)
- VIOLIN: A Large-Scale Dataset for Video-and-Language Inference [103.7457132841367]
We introduce a new task, Video-and-Language Inference, for joint multimodal understanding of video and text.
Given a video clip with aligned subtitles as the premise, paired with a natural language hypothesis based on the video content, a model must infer whether the hypothesis is entailed or contradicted by the clip.
A new large-scale dataset, named Violin (VIdeO-and-Language INference), is introduced for this task, which consists of 95,322 video-hypothesis pairs from 15,887 video clips.
arXiv Detail & Related papers (2020-03-25T20:39:05Z)
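As referenced in the Just Ask entry above, here is a minimal sketch of generating question-answer pairs from video transcripts with an off-the-shelf text2text model. The checkpoint name and its highlight-based prompt format are assumptions about a community model; the original work trains its own question-generation transformer on text QA data.

```python
# Minimal sketch: generate a QA pair from a transcript, in the spirit of
# "Just Ask". The checkpoint and its <hl>-highlighting prompt format are
# assumptions about a community model, not the paper's own transformer.
from transformers import pipeline

# Assumed off-the-shelf question-generation checkpoint.
qg = pipeline("text2text-generation", model="valhalla/t5-base-qg-hl")

def qa_pair_from_transcript(transcript: str, answer_span: str):
    """Turn one transcript sentence plus a chosen answer span into a QA pair."""
    # Highlight the answer span so the model knows what to ask about.
    highlighted = transcript.replace(answer_span, f"<hl> {answer_span} <hl>", 1)
    question = qg(f"generate question: {highlighted}",
                  max_new_tokens=32)[0]["generated_text"]
    return question, answer_span

transcript = "The chancellor visited Paris on Monday to discuss trade policy."
print(qa_pair_from_transcript(transcript, "Paris"))
# Expected output is along the lines of:
# ("Where did the chancellor visit on Monday?", "Paris")
```

Run over millions of transcribed narrations, a loop like this yields the kind of large-scale, annotation-free VideoQA training data the Just Ask entry describes; answer-span selection (here done by hand) would itself be automated in practice.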