IMKGA-SM: Interpretable Multimodal Knowledge Graph Answer Prediction via
Sequence Modeling
- URL: http://arxiv.org/abs/2301.02445v3
- Date: Tue, 10 Jan 2023 12:19:21 GMT
- Title: IMKGA-SM: Interpretable Multimodal Knowledge Graph Answer Prediction via
Sequence Modeling
- Authors: Yilin Wen, Biao Luo and Yuqian Zhao
- Abstract summary: Multimodal knowledge graph link prediction aims to improve the accuracy and efficiency of link prediction tasks for multimodal data.
A new model is developed, namely Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence Modeling (IMKGA-SM).
The model achieves much better performance than SOTA baselines on multimodal link prediction datasets of different sizes.
- Score: 3.867363075280544
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal knowledge graph link prediction aims to improve the accuracy and
efficiency of link prediction tasks for multimodal data. However, given complex
multimodal information and sparse training data, it is usually difficult for
most methods to achieve interpretability and high accuracy simultaneously. To
address this difficulty, a new model is developed in this paper, namely
Interpretable Multimodal Knowledge Graph Answer Prediction via Sequence
Modeling (IMKGA-SM). First, a multi-modal fine-grained fusion method is
proposed, in which VGG16 and Optical Character Recognition (OCR) techniques are
adopted to effectively extract visual features and embedded text from images.
Then, the knowledge graph link prediction task is modelled as an offline
reinforcement learning Markov decision process, which is then abstracted into a
unified sequence framework. An interactive perception-based reward expectation
mechanism and a special causal masking mechanism are designed, which "convert"
the query into an inference path. Then, an autoregressive dynamic gradient
adjustment mechanism is proposed to alleviate the problem of insufficient
multimodal optimization. Finally, two datasets are adopted for experiments, and
popular SOTA baselines are used for comparison. The results show that the
developed IMKGA-SM achieves much better performance than SOTA baselines on
multimodal link prediction datasets of different sizes.
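To make the fusion step concrete, the following is a minimal sketch of how VGG16 visual features and OCR text might be extracted from an entity image and fused. It illustrates the general technique only, not the authors' implementation: the use of pytesseract, the projection dimensions, and the FineGrainedFusion module are all assumptions.

```python
# Illustrative sketch (not the paper's code): VGG16 visual features plus
# OCR text extracted from an entity image, then fused in a shared space.
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image
import pytesseract  # assumed OCR backend; the paper only says "OCR"

vgg16 = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
vgg16.classifier = vgg16.classifier[:-1]  # keep 4096-d fc7 features
vgg16.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def extract_modalities(image_path: str):
    """Return (visual feature, OCR text) for one entity image."""
    image = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        visual = vgg16(preprocess(image).unsqueeze(0))  # shape (1, 4096)
    text = pytesseract.image_to_string(image)  # text embedded in the image
    return visual, text

class FineGrainedFusion(nn.Module):
    """Simple stand-in for fine-grained fusion: project each modality to a
    shared space and combine. Dimensions are illustrative assumptions."""
    def __init__(self, visual_dim=4096, text_dim=768, hidden=256):
        super().__init__()
        self.vis_proj = nn.Linear(visual_dim, hidden)
        self.txt_proj = nn.Linear(text_dim, hidden)
        self.fuse = nn.Linear(2 * hidden, hidden)

    def forward(self, visual, text_emb):
        v = torch.relu(self.vis_proj(visual))
        t = torch.relu(self.txt_proj(text_emb))
        return self.fuse(torch.cat([v, t], dim=-1))
```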
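The sequence-modeling step can likewise be illustrated with a small causal-masked Transformer in which the (head entity, relation) query forms the prefix of a token sequence and the answer path is decoded autoregressively. The tokenization scheme and model sizes below are assumptions, not the paper's configuration.

```python
# Hedged sketch of causal masking over a query/path token sequence:
# position i may attend only to positions <= i, so decoding the answer
# amounts to unrolling an inference path from the query prefix.
import torch
import torch.nn as nn

def causal_mask(seq_len: int) -> torch.Tensor:
    # True above the diagonal = attention blocked
    return torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool),
                      diagonal=1)

class PathDecoder(nn.Module):
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)  # entity/relation ids
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):  # tokens: (batch, seq_len)
        x = self.embed(tokens)
        mask = causal_mask(tokens.size(1)).to(tokens.device)
        return self.head(self.encoder(x, mask=mask))  # next-token logits
```

At inference time, one would feed the query tokens and repeatedly append the argmax (or sampled) next token until an answer entity is emitted, which is what makes the predicted path inspectable.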
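Finally, the dynamic gradient adjustment can be sketched as per-modality gradient scaling applied between the backward pass and the optimizer step. The specific update rule below (scaling a modality's gradients by its loss-improvement ratio) is a hedged guess at the general idea, not the paper's mechanism, and the per-modality `*_branch` attributes are hypothetical.

```python
# Hedged sketch: damp the gradients of a fast-improving modality so a
# lagging modality is not drowned out during joint optimization.
def adjust_modality_gradients(model, losses, prev_losses, eps=1e-8):
    """losses / prev_losses: dicts like {'visual': loss, 'text': loss}.
    Call after loss.backward() and before optimizer.step()."""
    for name, loss in losses.items():
        # ratio < 1 means this modality improved quickly; shrink its grads
        ratio = loss.item() / (prev_losses.get(name, loss.item()) + eps)
        scale = min(ratio, 1.0)
        branch = getattr(model, name + "_branch")  # hypothetical submodule
        for p in branch.parameters():
            if p.grad is not None:
                p.grad.mul_(scale)
```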
Related papers
- Scalable Weibull Graph Attention Autoencoder for Modeling Document Networks [50.42343781348247]
We develop a graph Poisson factor analysis (GPFA) which provides analytic conditional posteriors to improve the inference accuracy.
We also extend GPFA to a multi-stochastic-layer version named graph Poisson gamma belief network (GPGBN) to capture the hierarchical document relationships at multiple semantic levels.
Our models can extract high-quality hierarchical latent document representations and achieve promising performance on various graph analytic tasks.
arXiv Detail & Related papers (2024-10-13T02:22:14Z) - MM-GTUNets: Unified Multi-Modal Graph Deep Learning for Brain Disorders Prediction [8.592259720470697]
We propose MM-GTUNets, an end-to-end graph transformer based multi-modal graph deep learning framework for brain disorders prediction.
We introduce Modality Reward Representation Learning (MRRL) which adaptively constructs population graphs using a reward system.
We also propose Adaptive Cross-Modal Graph Learning (ACMGL), which captures critical modality-specific and modality-shared features.
arXiv Detail & Related papers (2024-06-20T16:14:43Z) - SIG: Efficient Self-Interpretable Graph Neural Network for Continuous-time Dynamic Graphs [34.269958289295516]
We aim to predict future links within the dynamic graph while simultaneously providing causal explanations for these predictions.
To tackle these challenges, we propose a novel causal inference model, namely the Independent and Confounded Causal Model (ICCM).
Our proposed model significantly outperforms existing methods across link prediction accuracy, explanation quality, and robustness to shortcut features.
arXiv Detail & Related papers (2024-05-29T13:09:33Z) - Unleashing Network Potentials for Semantic Scene Completion [50.95486458217653]
This paper proposes a novel semantic scene completion (SSC) framework, the Adversarial Modality Modulation Network (AMMNet).
AMMNet introduces two core modules: a cross-modal modulation enabling the interdependence of gradient flows between modalities, and a customized adversarial training scheme leveraging dynamic gradient competition.
Extensive experimental results demonstrate that AMMNet outperforms state-of-the-art SSC methods by a large margin.
arXiv Detail & Related papers (2024-03-12T11:48:49Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation
Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - Probabilistic MIMO U-Net: Efficient and Accurate Uncertainty Estimation
for Pixel-wise Regression [1.4528189330418977]
Uncertainty estimation in machine learning is paramount for enhancing the reliability and interpretability of predictive models.
We present an adaptation of the Multiple-Input Multiple-Output (MIMO) framework for pixel-wise regression tasks.
arXiv Detail & Related papers (2023-08-14T22:08:28Z) - Mutual Information Regularization for Weakly-supervised RGB-D Salient
Object Detection [33.210575826086654]
We present a weakly-supervised RGB-D salient object detection model.
We focus on effective multimodal representation learning via inter-modal mutual information regularization.
arXiv Detail & Related papers (2023-06-06T12:36:57Z) - Fusion with Hierarchical Graphs for Mulitmodal Emotion Recognition [7.147235324895931]
This paper proposes a novel hierarchical fusion graph convolutional network (HFGCN) model that learns more informative multimodal representations.
Specifically, the proposed model fuses multimodality inputs using a two-stage graph construction approach and encodes the modality dependencies into the conversation representation.
Experiments showed the effectiveness of our proposed model for more accurate automatic emotion recognition (AER), yielding state-of-the-art results on two public datasets.
arXiv Detail & Related papers (2021-09-15T08:21:01Z) - Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z) - A Novel Graph-based Multi-modal Fusion Encoder for Neural Machine
Translation [131.33610549540043]
We propose a novel graph-based multi-modal fusion encoder for NMT.
We first represent the input sentence and image using a unified multi-modal graph.
We then stack multiple graph-based multi-modal fusion layers that iteratively perform semantic interactions to learn node representations.
arXiv Detail & Related papers (2020-07-17T04:06:09Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)