Dynamic Language Binding in Relational Visual Reasoning
- URL: http://arxiv.org/abs/2004.14603v3
- Date: Thu, 18 Feb 2021 03:35:24 GMT
- Title: Dynamic Language Binding in Relational Visual Reasoning
- Authors: Thao Minh Le, Vuong Le, Svetha Venkatesh, Truyen Tran
- Abstract summary: We present Language-binding Object Graph Network, the first neural reasoning method with dynamic relational structures across both visual and textual domains.
Our method outperforms other methods in sophisticated question-answering tasks wherein multiple object relations are involved.
- Score: 67.85579756590478
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Language-binding Object Graph Network, the first neural reasoning
method with dynamic relational structures across both visual and textual
domains with applications in visual question answering. Relaxing the common
assumption made by current models that the object predicates pre-exist and stay
static, passive to the reasoning process, we propose that these dynamic
predicates expand across the domain borders to include pair-wise
visual-linguistic object binding. In our method, these contextualized object
links are actively found within each recurrent reasoning step without relying
on external predicative priors. These dynamic structures reflect the
conditional dual-domain object dependency given the evolving context of the
reasoning through co-attention. Such discovered dynamic graphs facilitate
multi-step knowledge combination and refinements that iteratively deduce the
compact representation of the final answer. The effectiveness of this model is
demonstrated on image question answering demonstrating favorable performance on
major VQA datasets. Our method outperforms other methods in sophisticated
question-answering tasks wherein multiple object relations are involved. The
graph structure effectively assists the progress of training, and therefore the
network learns efficiently compared to other reasoning models.
Related papers
- Conversational Semantic Parsing using Dynamic Context Graphs [68.72121830563906]
We consider the task of conversational semantic parsing over general purpose knowledge graphs (KGs) with millions of entities, and thousands of relation-types.
We focus on models which are capable of interactively mapping user utterances into executable logical forms.
arXiv Detail & Related papers (2023-05-04T16:04:41Z) - SA-VQA: Structured Alignment of Visual and Semantic Representations for
Visual Question Answering [29.96818189046649]
We propose to apply structured alignments, which work with graph representation of visual and textual content.
As demonstrated in our experimental results, such a structured alignment improves reasoning performance.
The proposed model, without any pretraining, outperforms the state-of-the-art methods on GQA dataset, and beats the non-pretrained state-of-the-art methods on VQA-v2 dataset.
arXiv Detail & Related papers (2022-01-25T22:26:09Z) - Hierarchical Object-oriented Spatio-Temporal Reasoning for Video
Question Answering [27.979053252431306]
Video Question Answering (Video QA) is a powerful testbed to develop new AI capabilities.
We propose an object-oriented reasoning approach in that video is abstracted as a dynamic stream of interacting objects.
This mechanism is materialized into a family of general-purpose neural units and their multi-level architecture.
arXiv Detail & Related papers (2021-06-25T05:12:42Z) - Object-Centric Representation Learning for Video Question Answering [27.979053252431306]
Video answering (Video QA) presents a powerful testbed for human-like intelligent behaviors.
The task demands new capabilities to integrate processing, language understanding, binding abstract concepts to concrete visual artifacts.
We propose a new query-guided representation framework to turn a video into a relational graph of objects.
arXiv Detail & Related papers (2021-04-12T02:37:20Z) - Unified Graph Structured Models for Video Understanding [93.72081456202672]
We propose a message passing graph neural network that explicitly models relational-temporal relations.
We show how our method is able to more effectively model relationships between relevant entities in the scene.
arXiv Detail & Related papers (2021-03-29T14:37:35Z) - A Graph-based Interactive Reasoning for Human-Object Interaction
Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z) - ORD: Object Relationship Discovery for Visual Dialogue Generation [60.471670447176656]
We propose an object relationship discovery (ORD) framework to preserve the object interactions for visual dialogue generation.
A hierarchical graph convolutional network (HierGCN) is proposed to retain the object nodes and neighbour relationships locally, and then refines the object-object connections globally.
Experiments have proved that the proposed method can significantly improve the quality of dialogue by utilising the contextual information of visual relationships.
arXiv Detail & Related papers (2020-06-15T12:25:40Z) - A Dependency Syntactic Knowledge Augmented Interactive Architecture for
End-to-End Aspect-based Sentiment Analysis [73.74885246830611]
We propose a novel dependency syntactic knowledge augmented interactive architecture with multi-task learning for end-to-end ABSA.
This model is capable of fully exploiting the syntactic knowledge (dependency relations and types) by leveraging a well-designed Dependency Relation Embedded Graph Convolutional Network (DreGcn)
Extensive experimental results on three benchmark datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-04T14:59:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.