Learning Contextual Causality from Time-consecutive Images
- URL: http://arxiv.org/abs/2012.07138v1
- Date: Sun, 13 Dec 2020 20:24:48 GMT
- Title: Learning Contextual Causality from Time-consecutive Images
- Authors: Hongming Zhang, Yintong Huo, Xinran Zhao, Yangqiu Song, Dan Roth
- Abstract summary: Causality knowledge is crucial for many artificial intelligence systems.
In this paper, we investigate the possibility of learning contextual causality from the visual signal.
We first propose Vis-Causal, a high-quality dataset, and then conduct experiments to demonstrate that it is possible to automatically discover meaningful causal knowledge from videos.
- Score: 84.26437953699444
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Causality knowledge is crucial for many artificial intelligence systems.
Conventional text-based causality knowledge acquisition methods typically require
laborious and expensive human annotations, so their scale is often limited.
Moreover, because no context is provided during annotation, the resulting causality
knowledge records (e.g., ConceptNet) typically do not take context into
consideration. To explore a more scalable way of acquiring causality knowledge, in
this paper we move beyond the textual domain and investigate the possibility of
learning contextual causality from the visual signal. Compared with pure text-based
approaches, learning causality from the visual signal has the following advantages:
(1) causality knowledge is part of commonsense knowledge, which is rarely expressed
in text but rich in videos; (2) most events in videos are naturally time-ordered,
providing a rich resource from which to mine causality knowledge; (3) all the
objects in a video can be used as context to study the contextual property of
causal relations. Specifically, we first propose Vis-Causal, a high-quality dataset,
and then conduct experiments to demonstrate that, with good language and visual
representation models as well as enough training signal, it is possible to
automatically discover meaningful causal knowledge from videos. Further analysis
also shows that the contextual property of causal relations indeed exists; taking it
into consideration might be crucial for using causality knowledge in real
applications, and the visual signal can serve as a good resource for learning such
contextual causality.
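To make the abstract's claim concrete, the following is a minimal illustrative sketch, not the authors' Vis-Causal pipeline: it assumes each of two time-consecutive events has already been encoded into a textual feature (e.g., from a pretrained language model) and a visual feature (e.g., from a pretrained image encoder), and it scores whether the earlier event causes the later one while conditioning on scene objects as context. All names, feature dimensions, and the fusion scheme are assumptions made for illustration.

# Illustrative sketch only; not the released Vis-Causal code.
import torch
import torch.nn as nn

class ContextualCausalityScorer(nn.Module):
    """Scores whether event A causes event B, conditioned on visual context."""

    def __init__(self, text_dim=768, vis_dim=512, hidden_dim=256):
        super().__init__()
        # Project textual and visual features of each event into a shared space.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.vis_proj = nn.Linear(vis_dim, hidden_dim)
        # Classifier over the fused (event A, event B, context) representation.
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim * 3, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def encode_event(self, text_feat, vis_feat):
        # Fuse one event's language and visual signals by simple addition.
        return self.text_proj(text_feat) + self.vis_proj(vis_feat)

    def forward(self, a_text, a_vis, b_text, b_vis, context_vis):
        event_a = self.encode_event(a_text, a_vis)
        event_b = self.encode_event(b_text, b_vis)
        context = self.vis_proj(context_vis)  # scene objects as context
        fused = torch.cat([event_a, event_b, context], dim=-1)
        return torch.sigmoid(self.classifier(fused))  # P(A causes B | context)

# Usage with random placeholder features standing in for real encoder outputs.
scorer = ContextualCausalityScorer()
a_text, b_text = torch.randn(1, 768), torch.randn(1, 768)
a_vis, b_vis, ctx = torch.randn(1, 512), torch.randn(1, 512), torch.randn(1, 512)
print(scorer(a_text, a_vis, b_text, b_vis, ctx))  # causality score in (0, 1)

Concatenating a separate context embedding is what would let the predicted score change as the surrounding scene changes, which is the "contextual" property of causal relations the paper emphasizes.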
Related papers
- SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge [60.76719375410635]
We propose a new benchmark (SOK-Bench) consisting of 44K questions and 10K situations with instance-level annotations depicted in the videos.
The reasoning process is required to understand and apply situated knowledge and general knowledge for problem-solving.
We generate associated question-answer pairs and reasoning processes, followed by manual review for quality assurance.
arXiv Detail & Related papers (2024-05-15T21:55:31Z) - EventGround: Narrative Reasoning by Grounding to Eventuality-centric Knowledge Graphs [41.928535719157054]
We propose an initial comprehensive framework called EventGround to tackle the problem of grounding free-texts to eventuality-centric knowledge graphs.
We provide simple yet effective parsing and partial information extraction methods to tackle these problems.
Our framework, incorporating grounded knowledge, achieves state-of-the-art performance while providing interpretable evidence.
arXiv Detail & Related papers (2024-03-30T01:16:37Z) - Comprehensive Event Representations using Event Knowledge Graphs and Natural Language Processing [0.0]
This work seeks to utilise and build on the growing body of work that uses findings from the field of natural language processing (NLP) to extract knowledge from text and build knowledge graphs.
Specifically, sub-event extraction is used as a way of creating sub-event-aware event representations.
These event representations are enriched through fine-grained location extraction and contextualised through the alignment of historically relevant quotes.
arXiv Detail & Related papers (2023-03-08T18:43:39Z) - Visually Grounded Commonsense Knowledge Acquisition [132.42003872906062]
Large-scale commonsense knowledge bases empower a broad range of AI applications.
Visual perception contains rich commonsense knowledge about real-world entities.
We present CLEVER, which formulates CKE as a distantly supervised multi-instance learning problem.
arXiv Detail & Related papers (2022-11-22T07:00:16Z) - Leveraging Visual Knowledge in Language Tasks: An Empirical Study on Intermediate Pre-training for Cross-modal Knowledge Transfer [61.34424171458634]
We study whether integrating visual knowledge into a language model can fill the gap.
Our experiments show that visual knowledge transfer can improve performance in both low-resource and fully supervised settings.
arXiv Detail & Related papers (2022-03-14T22:02:40Z) - Generated Knowledge Prompting for Commonsense Reasoning [53.88983683513114]
We propose generating knowledge statements directly from a language model with a generic prompt format.
This approach improves performance of both off-the-shelf and finetuned language models on four commonsense reasoning tasks.
Notably, we find that a model's predictions can improve when using its own generated knowledge.
arXiv Detail & Related papers (2021-10-15T21:58:03Z) - iReason: Multimodal Commonsense Reasoning using Videos and Natural Language with Interpretability [0.0]
Causality knowledge is vital to building robust AI systems.
We propose iReason, a framework that infers visual-semantic commonsense knowledge using both videos and natural language captions.
arXiv Detail & Related papers (2021-06-25T02:56:34Z) - Dimensions of Commonsense Knowledge [60.49243784752026]
We survey a wide range of popular commonsense sources with a special focus on their relations.
We consolidate these relations into 13 knowledge dimensions, each abstracting over more specific relations found in sources.
arXiv Detail & Related papers (2021-01-12T17:52:39Z) - KVL-BERT: Knowledge Enhanced Visual-and-Linguistic BERT for Visual Commonsense Reasoning [4.787501955202053]
In the visual commonsense reasoning (VCR) task, a machine must answer correctly and then provide a rationale justifying its answer.
We propose a novel Knowledge Enhanced Visual-and-Linguistic BERT (KVL-BERT for short) model.
Besides taking visual and linguistic contents as input, external commonsense knowledge extracted from ConceptNet is integrated into the multi-layer Transformer.
arXiv Detail & Related papers (2020-12-13T08:22:33Z)