iReason: Multimodal Commonsense Reasoning using Videos and Natural
Language with Interpretability
- URL: http://arxiv.org/abs/2107.10300v1
- Date: Fri, 25 Jun 2021 02:56:34 GMT
- Title: iReason: Multimodal Commonsense Reasoning using Videos and Natural
Language with Interpretability
- Authors: Aman Chadha and Vinija Jain
- Abstract summary: Causality knowledge is vital to building robust AI systems.
We propose iReason, a framework that infers visual-semantic commonsense knowledge using both videos and natural language captions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Causality knowledge is vital to building robust AI systems. Deep learning
models often perform poorly on tasks that require causal reasoning, which is
typically derived from commonsense knowledge that is not immediately available
in the input but is implicitly inferred by humans. Prior work has uncovered
spurious observational biases that models fall prey to in the absence of causal
reasoning. While language representation models preserve contextual knowledge
within learned embeddings, they do not factor in causal relationships during
training. Blending causal relationships into the input features of an existing
model that performs visual cognition tasks (such as scene understanding, video
captioning, or video question answering) can therefore improve performance,
owing to the insight that causal relationships provide. Recently, several
models have been proposed that mine causal data from either the visual or the
textual modality. However, little research mines causal relationships by
combining the visual and language modalities. While images offer a rich,
easy-to-process resource for mining causality knowledge, videos are denser and
consist of naturally time-ordered events. Textual information, in turn, offers
details that may only be implicit in video. We propose iReason, a framework
that infers visual-semantic commonsense knowledge using both videos and natural
language captions. Furthermore, iReason's architecture integrates a causal
rationalization module to aid interpretability, error analysis, and bias
detection. We demonstrate the effectiveness of iReason through a two-pronged
comparative analysis against language representation models (BERT, GPT-2) as
well as current state-of-the-art multimodal causality models.
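To make the fusion idea above concrete, here is a minimal PyTorch sketch of blending mined causal-relation embeddings with video and caption features before a downstream task head. All module names and dimensions here are illustrative assumptions, not iReason's published interface.

```python
# Minimal sketch: fuse mined causal-relation embeddings with video and
# caption features before a downstream task head (e.g., VideoQA).
# Module names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class CausalFusion(nn.Module):
    def __init__(self, video_dim=512, text_dim=768, causal_dim=128, out_dim=512):
        super().__init__()
        # Project mined causal-relation embeddings into the fusion space.
        self.causal_proj = nn.Linear(causal_dim, out_dim)
        self.fuse = nn.Linear(video_dim + text_dim + out_dim, out_dim)

    def forward(self, video_feats, caption_feats, causal_feats):
        c = self.causal_proj(causal_feats)
        x = torch.cat([video_feats, caption_feats, c], dim=-1)
        return torch.relu(self.fuse(x))  # handed to a task-specific head

# Usage with dummy tensors (batch of 4):
fusion = CausalFusion()
out = fusion(torch.randn(4, 512), torch.randn(4, 768), torch.randn(4, 128))
```

The design choice is the simplest one consistent with the abstract: causal relations enter as an extra input feature stream to an otherwise unchanged visual-cognition model.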
Related papers
- Towards Principled Representation Learning from Videos for Reinforcement Learning [23.877731515619868]
We study pre-training representations for decision-making from video data, focusing on learning the latent state representations of the underlying MDP.
arXiv Detail & Related papers (2024-03-20T17:28:17Z)
- CommonsenseVIS: Visualizing and Understanding Commonsense Reasoning Capabilities of Natural Language Models [30.63276809199399]
We present CommonsenseVIS, a visual explanatory system that utilizes external commonsense knowledge bases to contextualize model behavior for commonsense question-answering.
Our system features multi-level visualization and interactive model probing and editing for different concepts and their underlying relations.
arXiv Detail & Related papers (2023-07-23T17:16:13Z)
- RECKONING: Reasoning through Dynamic Knowledge Encoding [51.076603338764706]
We show that language models can answer questions by reasoning over knowledge provided as part of the context. When that context also contains distracting information, however, the model fails to distinguish the knowledge that is necessary to answer the question. We propose teaching the model to reason more robustly by folding the provided contextual knowledge into the model's parameters (see the sketch after this list).
arXiv Detail & Related papers (2023-05-10T17:54:51Z)
- Causalainer: Causal Explainer for Automatic Video Summarization [77.36225634727221]
In many application scenarios, improper video summarization can have a large impact, making explainability a key concern. A causal explainer, dubbed Causalainer, is proposed to address this issue.
arXiv Detail & Related papers (2023-04-30T11:42:06Z)
- The KITMUS Test: Evaluating Knowledge Integration from Multiple Sources in Natural Language Understanding Systems [87.3207729953778]
We evaluate state-of-the-art coreference resolution models on our dataset.
Several models struggle to reason on-the-fly over knowledge observed both at pretrain time and at inference time.
Still, even the best-performing models seem to have difficulty reliably integrating knowledge presented only at inference time.
arXiv Detail & Related papers (2022-12-15T23:26:54Z)
- Learning Contextual Causality from Time-consecutive Images [84.26437953699444]
Causality knowledge is crucial for many artificial intelligence systems.
In this paper, we investigate the possibility of learning contextual causality from the visual signal.
We first propose a high-quality dataset, Vis-Causal, and then conduct experiments demonstrating that it is possible to automatically discover meaningful causal knowledge from videos.
arXiv Detail & Related papers (2020-12-13T20:24:48Z)
- Language Generation with Multi-Hop Reasoning on Commonsense Knowledge Graph [124.45799297285083]
We argue that exploiting both the structural and semantic information of the knowledge graph facilitates commonsense-aware text generation.
We propose Generation with Multi-Hop Reasoning Flow (GRF), which equips pre-trained models with dynamic multi-hop reasoning over multi-relational paths extracted from the external commonsense knowledge graph.
arXiv Detail & Related papers (2020-09-24T13:55:32Z)
- CausaLM: Causal Model Explanation Through Counterfactual Language Models [33.29636213961804]
CausaLM is a framework for producing causal model explanations using counterfactual language representation models.
We show that language representation models such as BERT can effectively learn a counterfactual representation for a given concept of interest.
A byproduct of our method is a language representation model that is unaffected by the tested concept.
arXiv Detail & Related papers (2020-05-27T15:06:35Z)
- A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation [98.25464306634758]
We propose to utilize commonsense knowledge from external knowledge bases to generate reasonable stories.
We employ multi-task learning, combining the generation objective with a discriminative objective that distinguishes true from fake stories.
Our model can generate more reasonable stories than state-of-the-art baselines, particularly in terms of logic and global coherence.
arXiv Detail & Related papers (2020-01-15T05:42:27Z)
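As referenced in the RECKONING entry above, here is a rough illustration of "folding" contextual knowledge into a model's parameters. RECKONING's actual method is a bi-level, meta-learned procedure; this sketch shows only the inner-loop intuition, and the GPT-2 model choice and hyperparameters are assumptions for demonstration.

```python
# Illustrative sketch only: a few gradient steps on the knowledge text
# encode its facts into a temporary copy of the weights. RECKONING's
# real method wraps an inner loop like this in a meta-learned outer loop.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def fold_knowledge(model, tokenizer, knowledge, steps=3, lr=5e-5):
    adapted = copy.deepcopy(model)  # leave the base model untouched
    adapted.train()
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    batch = tokenizer(knowledge, return_tensors="pt")
    for _ in range(steps):
        # Language-modeling loss on the knowledge text itself.
        loss = adapted(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        opt.step()
        opt.zero_grad()
    return adapted

tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
adapted = fold_knowledge(lm, tok, "Alice is Bob's mother. Bob is Carol's father.")
# In the paper's setup, the adapted model is then queried about these
# facts without them appearing in the prompt.
```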
This list is automatically generated from the titles and abstracts of the papers on this site.