Transferring Domain-Agnostic Knowledge in Video Question Answering
- URL: http://arxiv.org/abs/2110.13395v1
- Date: Tue, 26 Oct 2021 03:58:31 GMT
- Title: Transferring Domain-Agnostic Knowledge in Video Question Answering
- Authors: Tianran Wu, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima and
Haruo Takemura
- Abstract summary: Video question answering (VideoQA) is designed to answer a given question based on a relevant video clip.
In this paper, we investigate a transfer learning method that introduces domain-agnostic and domain-specific knowledge.
Our experiments show that: (i) domain-agnostic knowledge is transferable and (ii) our proposed transfer learning framework can boost VideoQA performance effectively.
- Score: 27.948768254771537
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video question answering (VideoQA) is designed to answer a given question
based on a relevant video clip. Currently available large-scale datasets have
made it possible to formulate VideoQA as the joint understanding of visual and
language information. However, this training procedure is costly and still
falls short of human performance. In this paper, we investigate a transfer
learning method that introduces domain-agnostic and domain-specific knowledge.
First, we develop a novel transfer learning framework, which finetunes a
pre-trained model using domain-agnostic knowledge as the medium. Second, we
construct a new VideoQA dataset with 21,412 human-generated question-answer
samples to enable a comparable transfer of knowledge. Our experiments show
that: (i) domain-agnostic knowledge is transferable and (ii) our proposed
transfer learning framework can boost VideoQA performance effectively.
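The paper's code is not reproduced here; as a rough illustration of the two-stage idea (finetune on domain-agnostic knowledge first, then on the target domain), a hypothetical PyTorch loop might look like the sketch below. The `ToyVideoQA` model, feature dimensions, and synthetic batches are all stand-ins, not the authors' setup.
```python
# Minimal sketch (not the authors' code): two-stage finetuning in which a
# pre-trained VideoQA model is first adapted on domain-agnostic QA pairs,
# then on domain-specific ones. All names and dimensions are illustrative.
import torch
import torch.nn as nn

class ToyVideoQA(nn.Module):
    """Stand-in for a pre-trained VideoQA model: fuses video and question
    features and scores a fixed set of candidate answers."""
    def __init__(self, video_dim=512, text_dim=300, num_answers=1000):
        super().__init__()
        self.fusion = nn.Linear(video_dim + text_dim, 256)
        self.classifier = nn.Linear(256, num_answers)

    def forward(self, video_feat, question_feat):
        fused = torch.relu(self.fusion(torch.cat([video_feat, question_feat], dim=-1)))
        return self.classifier(fused)

def finetune(model, batches, lr):
    """One finetuning stage over an iterable of (video, question, answer) batches."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for video_feat, question_feat, answer_idx in batches:
        optimizer.zero_grad()
        loss_fn(model(video_feat, question_feat), answer_idx).backward()
        optimizer.step()

def fake_batches(n=3):
    """Synthetic batches standing in for real data loaders."""
    return [(torch.randn(8, 512), torch.randn(8, 300), torch.randint(0, 1000, (8,)))
            for _ in range(n)]

model = ToyVideoQA()  # in practice: load pre-trained weights

# Stage 1: domain-agnostic knowledge as the transfer medium.
finetune(model, fake_batches(), lr=1e-4)
# Stage 2: adapt to the target domain with domain-specific QA pairs.
finetune(model, fake_batches(), lr=5e-5)
```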
Related papers
- Bridged-GNN: Knowledge Bridge Learning for Effective Knowledge Transfer [65.42096702428347]
Graph Neural Networks (GNNs) aggregate information from neighboring nodes.
Knowledge Bridge Learning (KBL) learns a knowledge-enhanced posterior distribution for target domains.
Bridged-GNN includes an Adaptive Knowledge Retrieval module to build Bridged-Graph and a Graph Knowledge Transfer module.
arXiv Detail & Related papers (2023-08-18T12:14:51Z)
- Utilizing Background Knowledge for Robust Reasoning over Traffic Situations [63.45021731775964]
We focus on a complementary research aspect of Intelligent Transportation: traffic understanding.
We scope our study to text-based methods and datasets, given the abundance of commonsense knowledge available in text.
We adopt three knowledge-driven approaches for zero-shot QA over traffic situations.
arXiv Detail & Related papers (2022-12-04T09:17:24Z)
- VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge [48.457788853408616]
We propose a method to generate, select, and encode external commonsense knowledge alongside visual and textual cues.
We show that VLC-BERT is capable of outperforming existing models that utilize static knowledge bases.
arXiv Detail & Related papers (2022-10-24T22:01:17Z)
- Kformer: Knowledge Injection in Transformer Feed-Forward Layers [107.71576133833148]
We propose a novel knowledge fusion model, namely Kformer, which incorporates external knowledge through the feed-forward layer in Transformer.
We empirically find that simply injecting knowledge into the FFN can enhance the pre-trained language model's ability and improve on current knowledge fusion methods.
arXiv Detail & Related papers (2022-01-15T03:00:27Z)
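Kformer's released implementation is not reproduced here; the following is a minimal sketch, assuming (as the abstract suggests) that external knowledge embeddings are appended as extra key/value rows of the Transformer feed-forward layer. The `KnowledgeFFN` module name and all dimensions are illustrative, and the single shared knowledge projection is a simplification.
```python
# Minimal sketch (assumptions, not the Kformer release): external knowledge
# embeddings act as extra "key/value" rows of a Transformer FFN, so the layer
# scores injected knowledge alongside its learned weights.
import torch
import torch.nn as nn

class KnowledgeFFN(nn.Module):
    def __init__(self, d_model=768, d_ff=3072):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff)          # standard FFN "keys"
        self.w2 = nn.Linear(d_ff, d_model)          # standard FFN "values"
        self.k_proj = nn.Linear(d_model, d_model)   # projects knowledge embeddings

    def forward(self, x, knowledge):
        # x: (batch, seq, d_model); knowledge: (batch, n_facts, d_model)
        k = self.k_proj(knowledge)
        # Standard FFN pre-activations plus scores against each knowledge vector.
        h = torch.relu(torch.cat([self.w1(x), x @ k.transpose(1, 2)], dim=-1))
        h_ffn, h_know = h.split([self.w1.out_features, k.size(1)], dim=-1)
        # Mix the FFN output with a knowledge-weighted sum, mirroring the idea
        # of injecting knowledge through the feed-forward computation.
        return self.w2(h_ffn) + h_know @ k

layer = KnowledgeFFN()
out = layer(torch.randn(2, 16, 768), torch.randn(2, 4, 768))
print(out.shape)  # torch.Size([2, 16, 768])
```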
- Unsupervised Domain Adaptation for Video Semantic Segmentation [91.30558794056054]
Unsupervised Domain Adaptation for semantic segmentation has gained immense popularity since it can transfer knowledge from simulated to real data.
In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic Segmentation.
We show that our proposals significantly outperform previous image-based UDA methods both on image-level (mIoU) and video-level (VPQ) evaluation metrics.
arXiv Detail & Related papers (2021-07-23T07:18:20Z)
- Unsupervised Cross-Domain Prerequisite Chain Learning using Variational Graph Autoencoders [2.735701323590668]
We propose unsupervised cross-domain concept prerequisite chain learning using an optimized variational graph autoencoder.
Our model learns to transfer concept prerequisite relations from an information-rich domain to an information-poor domain.
Also, we expand an existing dataset by introducing two new domains: CV and Bioinformatics.
arXiv Detail & Related papers (2021-05-07T21:02:41Z)
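The paper's cross-domain optimizations are not shown here; below is a minimal sketch of the underlying variational graph autoencoder (a graph-convolutional encoder, inner-product edge decoder, and reconstruction-plus-KL loss, following the standard VGAE formulation of Kipf & Welling, 2016) trained on toy random data. All names and sizes are illustrative.
```python
# Minimal VGAE sketch on a random toy graph; edges are reconstructed from
# latent node embeddings via an inner-product decoder.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VGAE(nn.Module):
    def __init__(self, in_dim=32, hid_dim=16, z_dim=8):
        super().__init__()
        self.gc1 = nn.Linear(in_dim, hid_dim, bias=False)
        self.gc_mu = nn.Linear(hid_dim, z_dim, bias=False)
        self.gc_logvar = nn.Linear(hid_dim, z_dim, bias=False)

    def forward(self, adj_norm, x):
        h = F.relu(adj_norm @ self.gc1(x))                 # GCN layer 1
        mu = adj_norm @ self.gc_mu(h)                      # GCN layer 2 (mean)
        logvar = adj_norm @ self.gc_logvar(h)              # GCN layer 2 (log-variance)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        recon = torch.sigmoid(z @ z.t())                   # inner-product edge decoder
        return recon, mu, logvar

# Toy graph: random symmetric 0/1 adjacency with self-loops.
n = 20
x = torch.randn(n, 32)
adj = (torch.rand(n, n) < 0.1).float()
adj = ((adj + adj.t() + torch.eye(n)) > 0).float()
deg_inv_sqrt = adj.sum(1).pow(-0.5)                        # symmetric normalization
adj_norm = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]

model = VGAE()
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(50):
    opt.zero_grad()
    recon, mu, logvar = model(adj_norm, x)
    bce = F.binary_cross_entropy(recon, adj)               # edge reconstruction loss
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    (bce + kl).backward()
    opt.step()
```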
- KRISP: Integrating Implicit and Symbolic Knowledge for Open-Domain Knowledge-Based VQA [107.7091094498848]
One of the most challenging question types in VQA is when answering the question requires outside knowledge not present in the image.
In this work we study open-domain knowledge, the setting when the knowledge required to answer a question is not given/annotated, neither at training nor test time.
We tap into two types of knowledge representations and reasoning. First, implicit knowledge, which can be learned effectively from unsupervised language pre-training and supervised training data with transformer-based models. Second, explicit symbolic knowledge drawn from knowledge bases.
arXiv Detail & Related papers (2020-12-20T20:13:02Z)
- Knowledge-Based Visual Question Answering in Videos [36.23723122336639]
We introduce KnowIT VQA, a video dataset with 24,282 human-generated question-answer pairs about a popular sitcom.
The dataset combines visual, textual and temporal coherence reasoning together with knowledge-based questions.
Our main findings are: (i) the incorporation of knowledge produces outstanding improvements for VQA in video, and (ii) the performance on KnowIT VQA still lags well behind human accuracy.
arXiv Detail & Related papers (2020-04-17T02:06:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.