YTCommentQA: Video Question Answerability in Instructional Videos
- URL: http://arxiv.org/abs/2401.17343v1
- Date: Tue, 30 Jan 2024 14:18:37 GMT
- Title: YTCommentQA: Video Question Answerability in Instructional Videos
- Authors: Saelyne Yang, Sunghyun Park, Yunseok Jang, Moontae Lee
- Abstract summary: We present the YTCommentQA dataset, which contains naturally-generated questions from YouTube.
Each question is categorized by its answerability and the modality required to answer it -- visual, script, or both.
- Score: 22.673000779017595
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instructional videos provide detailed how-to guides for various tasks, with
viewers often posing questions regarding the content. Addressing these
questions is vital for comprehending the content, yet receiving immediate
answers is difficult. While numerous computational models have been developed
for Video Question Answering (Video QA) tasks, they are primarily trained on
questions generated based on video content, aiming to produce answers from
within the content. However, in real-world situations, users may pose questions
that go beyond the video's informational boundaries, highlighting the necessity
to determine if a video can provide the answer. Discerning whether a question
can be answered by video content is challenging due to the multi-modal nature
of videos, where visual and verbal information are intertwined. To bridge this
gap, we present the YTCommentQA dataset, which contains naturally-generated
questions from YouTube, categorized by their answerability and required
modality to answer -- visual, script, or both. Experiments with answerability
classification tasks demonstrate the complexity of YTCommentQA and emphasize
the need to comprehend the combined role of visual and script information in
video reasoning. The dataset is available at
https://github.com/lgresearch/YTCommentQA.
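To make the label structure concrete, here is a minimal Python sketch of how a YTCommentQA-style record might be represented and tallied. The schema below (field names and label strings) is a hypothetical illustration inferred from the abstract, not the dataset's actual format; see the GitHub repository for the real files.

```python
# Minimal sketch of a YTCommentQA-style record and a label tally.
# Field names and label strings are hypothetical, inferred from the
# abstract (answerability plus required modality: visual/script/both).
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class CommentQAExample:
    video_id: str            # YouTube video the question was posted under
    question: str            # naturally-occurring question from the comments
    answerable: bool         # can the video answer this question at all?
    modality: Optional[str]  # "visual", "script", or "both" if answerable

def answerability_breakdown(examples) -> Counter:
    """Count examples per answerability/modality label."""
    return Counter(
        ex.modality if ex.answerable else "unanswerable" for ex in examples
    )

# Toy usage with made-up records:
data = [
    CommentQAExample("vid1", "What oven temperature is used?", True, "script"),
    CommentQAExample("vid1", "What brand is that mixer?", True, "visual"),
    CommentQAExample("vid2", "Can I use almond flour instead?", False, None),
]
print(answerability_breakdown(data))
# Counter({'script': 1, 'visual': 1, 'unanswerable': 1})
```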
Related papers
- VideoQA in the Era of LLMs: An Empirical Study [108.37456450182054]
Video Large Language Models (Video-LLMs) are flourishing and have advanced many video-language tasks.
This work conducts a timely and comprehensive study of Video-LLMs' behavior in VideoQA.
Our analyses demonstrate that Video-LLMs excel in VideoQA; they can correlate contextual cues and generate plausible responses to questions about varied video content.
However, the models falter in handling video temporality, both in reasoning about temporal content ordering and in grounding QA-relevant temporal moments.
arXiv Detail & Related papers (2024-08-08T05:14:07Z)
- MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding [69.04413943858584]
We introduce MoVQA, a long-form movie question-answering dataset.
We also provide a benchmark to assess the diverse cognitive capabilities of multimodal systems.
arXiv Detail & Related papers (2023-12-08T03:33:38Z)
- Video Question Answering with Iterative Video-Text Co-Tokenization [77.66445727743508]
We propose a novel multi-stream video encoder for video question answering.
We experimentally evaluate the model on several datasets, including MSRVTT-QA, MSVD-QA, and iVQA.
Our model reduces the required GFLOPs from 150-360 to only 67, producing a highly efficient video question answering model.
arXiv Detail & Related papers (2022-08-01T15:35:38Z)
- Learning to Answer Visual Questions from Web Videos [89.71617065426146]
We propose to avoid manual annotation and generate a large-scale training dataset for video question answering.
We leverage a question generation transformer trained on text data and use it to generate question-answer pairs from transcribed video narrations.
For a detailed evaluation, we introduce iVQA, a new VideoQA dataset with reduced language bias and high-quality manual annotations.
arXiv Detail & Related papers (2022-05-10T16:34:26Z)
- Video Question Answering: Datasets, Algorithms and Challenges [99.9179674610955]
Video Question Answering (VideoQA) aims to answer natural language questions according to the given videos.
This paper provides a clear taxonomy and comprehensive analysis of VideoQA, focusing on the datasets, algorithms, and unique challenges.
arXiv Detail & Related papers (2022-03-02T16:34:09Z)
- NEWSKVQA: Knowledge-Aware News Video Question Answering [5.720640816755851]
We explore a new frontier in video question answering: answering knowledge-based questions in the context of news videos.
We curate a new dataset of 12K news videos spanning 156 hours, with 1M multiple-choice question-answer pairs covering 8263 unique entities.
We propose a novel approach, NEWSKVQA, which performs multi-modal inference over textual multiple-choice questions, videos, their transcripts, and a knowledge base.
arXiv Detail & Related papers (2022-02-08T17:31:31Z)
- Just Ask: Learning to Answer Questions from Millions of Narrated Videos [97.44376735445454]
We propose to avoid manual annotation and generate a large-scale training dataset for video question answering.
We leverage a question generation transformer trained on text data and use it to generate question-answer pairs from transcribed video narrations.
We show our method to significantly outperform the state of the art on MSRVTT-QA, MSVD-QA, ActivityNet-QA and How2QA.
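The transcript-to-QA generation recipe summarized here (and in the "Learning to Answer Visual Questions from Web Videos" entry above) can be sketched roughly as follows. This is not the authors' released code: the checkpoint name "some-org/t5-qg" is a placeholder, the "<hl>" answer-highlighting format is an assumption borrowed from common T5 question-generation setups, and the answer spans are supplied by hand here rather than extracted automatically as in the paper.

```python
# Rough sketch: generate QA pairs from a video transcript with a
# seq2seq question-generation model. "some-org/t5-qg" is a placeholder
# checkpoint name, and the "<hl> ... <hl>" highlighting convention is an
# assumed input format, not the paper's exact recipe.
from transformers import pipeline

qg = pipeline("text2text-generation", model="some-org/t5-qg")

def qa_pairs_from_transcript(transcript: str, candidate_answers: list[str]):
    """Condition the model on the transcript with one candidate answer
    span highlighted at a time, yielding (question, answer) pairs."""
    pairs = []
    for answer in candidate_answers:
        # Highlight the answer span inside the transcript context.
        prompt = transcript.replace(answer, f"<hl> {answer} <hl>", 1)
        question = qg("generate question: " + prompt)[0]["generated_text"]
        pairs.append((question, answer))
    return pairs

transcript = "Preheat the oven to 180 degrees and bake the bread for 40 minutes."
print(qa_pairs_from_transcript(transcript, ["180 degrees", "40 minutes"]))
```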
arXiv Detail & Related papers (2020-12-01T12:59:20Z)
- Video Question Answering on Screencast Tutorials [43.00474548031818]
We introduce a dataset of question, answer, and context triples from tutorial videos for a software product.
A one-shot recognition algorithm is designed to extract visual cues, which helps enhance video question answering performance.
arXiv Detail & Related papers (2020-08-02T19:27:42Z)
- Knowledge-Based Visual Question Answering in Videos [36.23723122336639]
We introduce KnowIT VQA, a video dataset with 24,282 human-generated question-answer pairs about a popular sitcom.
The dataset combines visual, textual and temporal coherence reasoning together with knowledge-based questions.
Our main findings are: (i) the incorporation of knowledge produces outstanding improvements for VQA in video, and (ii) the performance on KnowIT VQA still lags well behind human accuracy.
arXiv Detail & Related papers (2020-04-17T02:06:26Z)