Linguistically Driven Graph Capsule Network for Visual Question
Reasoning
- URL: http://arxiv.org/abs/2003.10065v1
- Date: Mon, 23 Mar 2020 03:34:25 GMT
- Title: Linguistically Driven Graph Capsule Network for Visual Question
Reasoning
- Authors: Qingxing Cao and Xiaodan Liang and Keze Wang and Liang Lin
- Abstract summary: We propose a hierarchical compositional reasoning model called the "Linguistically driven Graph Capsule Network".
The compositional process is guided by the linguistic parse tree. Specifically, we bind each capsule in the lowest layer to bridge the linguistic embedding of a single word in the original question with visual evidence.
Experiments on the CLEVR dataset, the CLEVR compositional generalization test, and the FigureQA dataset demonstrate the effectiveness and compositional generalization ability of our end-to-end model.
- Score: 153.76012414126643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, studies of visual question answering have explored various
architectures of end-to-end networks and achieved promising results on both
natural and synthetic datasets, which require explicitly compositional
reasoning. However, it has been argued that these black-box approaches lack
interpretability of results, and thus cannot perform well on generalization
tasks due to overfitting to dataset biases. In this work, we aim to combine the
benefits of both sides and overcome their limitations to achieve end-to-end
interpretable structural reasoning for general images without the requirement
of layout annotations. Inspired by the property of a capsule network that can
carve a tree structure inside a regular convolutional neural network (CNN), we
propose a hierarchical compositional reasoning model called the "Linguistically
driven Graph Capsule Network", where the compositional process is guided by the
linguistic parse tree. Specifically, we bind each capsule in the lowest layer
to bridge the linguistic embedding of a single word in the original question
with visual evidence and then route them to the same capsule if they are
siblings in the parse tree. This compositional process is achieved by
performing inference on a linguistically driven conditional random field (CRF)
and is performed across multiple graph capsule layers, which results in a
compositional reasoning process inside a CNN. Experiments on the CLEVR dataset,
the CLEVR compositional generalization test, and the FigureQA dataset demonstrate the
effectiveness and compositional generalization ability of our end-to-end model.
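To make the pipeline above concrete, here is a minimal sketch (in PyTorch) of the two steps the abstract describes: binding one lowest-layer capsule to each word's embedding plus its visual evidence, and then merging capsules whose words are siblings in the parse tree into a shared parent capsule. All names (WordCapsuleLayer, route_siblings) and tensor shapes are illustrative assumptions, and the simple attention and averaging below stand in for the paper's CRF-based routing; this is not the authors' implementation.

```python
# Minimal, illustrative sketch of the composition described in the abstract.
# Assumptions (not from the paper): tensor shapes, module/function names, and
# the use of dot-product attention plus a simple mean in place of the paper's
# linguistically driven CRF routing.
import torch
import torch.nn as nn


class WordCapsuleLayer(nn.Module):
    """Lowest layer: bind each word embedding to visual evidence, one capsule per word."""

    def __init__(self, word_dim: int, visual_dim: int, capsule_dim: int):
        super().__init__()
        self.word_proj = nn.Linear(word_dim, capsule_dim)
        self.visual_proj = nn.Linear(visual_dim, capsule_dim)

    def forward(self, word_embeds: torch.Tensor, visual_feats: torch.Tensor) -> torch.Tensor:
        # word_embeds: (num_words, word_dim); visual_feats: (num_regions, visual_dim)
        w = self.word_proj(word_embeds)              # (num_words, capsule_dim)
        v = self.visual_proj(visual_feats)           # (num_regions, capsule_dim)
        attn = torch.softmax(w @ v.t(), dim=-1)      # word-to-region attention
        grounded = attn @ v                          # visual evidence gathered per word
        return w + grounded                          # one capsule per question word


def route_siblings(capsules: torch.Tensor, sibling_groups: list[list[int]]) -> torch.Tensor:
    """Merge capsules whose words are siblings in the parse tree into parent capsules.

    capsules: (num_capsules, capsule_dim); sibling_groups: one index list per
    parent node of the parse tree. A plain mean stands in for CRF inference.
    """
    parents = [capsules[idx].mean(dim=0) for idx in sibling_groups]
    return torch.stack(parents)                      # (num_parents, capsule_dim)


# Toy usage with a hypothetical 6-word question and a 14x14 CNN feature map.
layer = WordCapsuleLayer(word_dim=300, visual_dim=512, capsule_dim=256)
words = torch.randn(6, 300)                          # word embeddings of the question
regions = torch.randn(14 * 14, 512)                  # flattened visual features
caps0 = layer(words, regions)                        # lowest graph capsule layer
caps1 = route_siblings(caps0, [[0, 1, 2], [3, 4], [5]])   # siblings -> parents
caps2 = route_siblings(caps1, [[0, 1, 2]])           # parents -> root capsule
print(caps2.shape)                                   # torch.Size([1, 256])
```

In the paper the sibling structure comes from a linguistic parse of the question and the merge weights come from inference on a conditional random field, with the whole stack trained end-to-end inside a CNN; the sketch only mirrors that bottom-up data flow.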
Related papers
- Hypotheses Tree Building for One-Shot Temporal Sentence Localization [53.82714065005299]
One-shot temporal sentence localization (one-shot TSL) learns to retrieve the query information from the entire video with only one annotated frame.
We propose an effective and novel tree-structure baseline for one-shot TSL, called the Multiple Hypotheses Segment Tree (MHST).
MHST captures the query-aware discriminative frame-wise information under insufficient annotations.
arXiv Detail & Related papers (2023-01-05T01:50:43Z)
- PGX: A Multi-level GNN Explanation Framework Based on Separate Knowledge Distillation Processes [0.2005299372367689]
We propose a multi-level GNN explanation framework based on the observation that GNN learning is a multimodal process over multiple components of graph data.
The complexity of the original problem is relaxed by breaking it into multiple sub-parts represented as a hierarchical structure.
We also aim for personalized explanations as the framework can generate different results based on user preferences.
arXiv Detail & Related papers (2022-08-05T10:14:48Z)
- TREE-G: Decision Trees Contesting Graph Neural Networks [33.364191419692105]
TREE-G modifies standard decision trees by introducing a novel split function specialized for graph data.
We show that TREE-G consistently outperforms other tree-based models and often outperforms other graph-learning algorithms such as Graph Neural Networks (GNNs) and Graph Kernels.
arXiv Detail & Related papers (2022-07-06T15:53:17Z)
- Investigating Neural Architectures by Synthetic Dataset Design [14.317837518705302]
Recent years have seen the emergence of many new neural network structures (architectures and layers).
We sketch a methodology to measure the effect of each structure on a network's ability by designing ad hoc synthetic datasets.
We illustrate our methodology by building three datasets, each designed to evaluate a distinct network property.
arXiv Detail & Related papers (2022-04-23T10:50:52Z)
- Explicit Pairwise Factorized Graph Neural Network for Semi-Supervised Node Classification [59.06717774425588]
We propose the Explicit Pairwise Factorized Graph Neural Network (EPFGNN), which models the whole graph as a partially observed Markov Random Field.
It contains explicit pairwise factors to model output-output relations and uses a GNN backbone to model input-output relations.
We conduct experiments on various datasets, which show that our model can effectively improve performance for semi-supervised node classification on graphs.
arXiv Detail & Related papers (2021-07-27T19:47:53Z)
- Learning the Implicit Semantic Representation on Graph-Structured Data [57.670106959061634]
Existing representation learning methods in graph convolutional networks are mainly designed by describing the neighborhood of each node as a perceptual whole.
We propose Semantic Graph Convolutional Networks (SGCN), which explore the implicit semantics by learning latent semantic paths in graphs.
arXiv Detail & Related papers (2021-01-16T16:18:43Z)
- Can RNNs learn Recursive Nested Subject-Verb Agreements? [4.094098809740732]
Language processing requires the ability to extract nested tree structures.
Recent advances in Recurrent Neural Networks (RNNs) achieve near-human performance in some language tasks.
arXiv Detail & Related papers (2021-01-06T20:47:02Z)
- Hierarchical Graph Capsule Network [78.4325268572233]
We propose the hierarchical graph capsule network (HGCN), which can jointly learn node embeddings and extract graph hierarchies.
To learn the hierarchical representation, HGCN characterizes the part-whole relationship between lower-level capsules (parts) and higher-level capsules (wholes).
arXiv Detail & Related papers (2020-12-16T04:13:26Z)
- Interpretable Neural Computation for Real-World Compositional Visual Question Answering [4.3668650778541895]
We build an interpretable framework for real-world compositional VQA.
In our framework, images and questions are disentangled into scene graphs and programs, and a symbolic program runs on them with full transparency to select the attention regions.
Experiments conducted on the GQA benchmark demonstrate that our framework outperforms the compositional prior arts and achieves competitive accuracy among monolithic ones.
arXiv Detail & Related papers (2020-10-10T05:46:22Z)
- Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting [63.04999833264299]
"Graph Substructure Networks" (GSN) is a topologically-aware message passing scheme based on substructure encoding.
We show that it is strictly more expressive than the Weisfeiler-Leman (WL) graph isomorphism test.
We perform an extensive evaluation on graph classification and regression tasks and obtain state-of-the-art results in diverse real-world settings.
arXiv Detail & Related papers (2020-06-16T15:30:31Z)
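As referenced in the "Improving Graph Neural Network Expressivity via Subgraph Isomorphism Counting" entry above, the core idea of substructure encoding can be illustrated with a small, hedged sketch: count a chosen substructure (triangles here) per node and append the counts to the node features before a standard GNN layer. The library calls (networkx.triangles, torch_geometric's GCNConv and from_networkx) are real, but the wiring is an assumption for illustration only and does not reproduce GSN's actual message-passing scheme.

```python
# Hedged illustration of substructure encoding in the spirit of GSN: append a
# per-node triangle count to the node features before a standard GNN layer.
# The library calls (networkx.triangles, torch_geometric's GCNConv and
# from_networkx) are real; the overall wiring is an assumption, not GSN's code.
import networkx as nx
import torch
from torch_geometric.nn import GCNConv
from torch_geometric.utils import from_networkx


def add_triangle_counts(graph: nx.Graph, x: torch.Tensor) -> torch.Tensor:
    """Concatenate each node's triangle count onto its feature vector."""
    counts = nx.triangles(graph)                      # {node: number of triangles through it}
    tri = torch.tensor([counts[n] for n in graph.nodes()], dtype=torch.float)
    return torch.cat([x, tri.unsqueeze(-1)], dim=-1)  # (num_nodes, feat_dim + 1)


# Toy usage: a 5-cycle with one chord, which creates a single triangle (0, 1, 2).
g = nx.cycle_graph(5)
g.add_edge(0, 2)
x = torch.randn(g.number_of_nodes(), 8)               # random node features
x = add_triangle_counts(g, x)                          # structural feature appended

data = from_networkx(g)                                # provides data.edge_index
conv = GCNConv(in_channels=x.size(-1), out_channels=16)
out = conv(x, data.edge_index)                         # ordinary message passing on enriched features
print(out.shape)                                       # torch.Size([5, 16])
```

Since plain message passing is bounded by the Weisfeiler-Leman test, explicitly injecting substructure counts like this is one simple way to surface structure that aggregation alone cannot distinguish; GSN integrates such structural encodings into the message-passing step itself rather than only into the input features.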
This list is automatically generated from the titles and abstracts of the papers on this site.