Zero-Shot Relational Learning for Multimodal Knowledge Graphs
- URL: http://arxiv.org/abs/2404.06220v1
- Date: Tue, 9 Apr 2024 11:14:45 GMT
- Title: Zero-Shot Relational Learning for Multimodal Knowledge Graphs
- Authors: Rui Cai, Shichao Pei, Xiangliang Zhang
- Abstract summary: One of the major challenges is inference on newly discovered relations without associated training data.
Existing works fail to leverage multimodal information and leave this problem unexplored.
We propose a novel end-to-end framework, consisting of three components, i.e., multimodal learner, structure consolidator, and relation embedding generator, to integrate diverse multimodal information and knowledge graph structures.
- Score: 31.215889061734295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Relational learning is an essential task in the domain of knowledge representation, particularly in knowledge graph completion (KGC). While relational learning in traditional single-modal settings has been extensively studied, exploring it within a multimodal KGC context presents distinct challenges and opportunities. One of the major challenges is inference on newly discovered relations without any associated training data. This zero-shot relational learning scenario poses unique requirements for multimodal KGC, i.e., utilizing multimodality to facilitate relational learning. However, existing works fail to leverage multimodal information and leave the problem unexplored. In this paper, we propose a novel end-to-end framework, consisting of three components, i.e., multimodal learner, structure consolidator, and relation embedding generator, to integrate diverse multimodal information and knowledge graph structures to facilitate zero-shot relational learning. Evaluation results on two multimodal knowledge graphs demonstrate the superior performance of our proposed method.
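As a rough illustration of how the three components named in the abstract might fit together, here is a minimal, hypothetical PyTorch sketch. The module names, fusion strategy, dimensions, and the DistMult-style scorer are assumptions made for exposition, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): one plausible arrangement of the
# three components described in the abstract. All names and design choices
# here are assumptions.
import torch
import torch.nn as nn


class MultimodalLearner(nn.Module):
    """Fuses structural, visual, and textual features of an entity."""
    def __init__(self, d_struct, d_img, d_txt, d_ent):
        super().__init__()
        self.proj = nn.Linear(d_struct + d_img + d_txt, d_ent)

    def forward(self, x_struct, x_img, x_txt):
        return torch.relu(self.proj(torch.cat([x_struct, x_img, x_txt], dim=-1)))


class StructureConsolidator(nn.Module):
    """Folds an entity's graph neighborhood into its representation (mean pooling here)."""
    def __init__(self, d_ent):
        super().__init__()
        self.update = nn.Linear(2 * d_ent, d_ent)

    def forward(self, ent, neighbors):            # neighbors: (batch, n_neighbors, d_ent)
        context = neighbors.mean(dim=1)
        return torch.relu(self.update(torch.cat([ent, context], dim=-1)))


class RelationEmbeddingGenerator(nn.Module):
    """Synthesizes an embedding for an unseen relation from entity-pair evidence."""
    def __init__(self, d_ent):
        super().__init__()
        self.gen = nn.Sequential(nn.Linear(2 * d_ent, d_ent), nn.Tanh())

    def forward(self, head, tail):
        return self.gen(torch.cat([head, tail], dim=-1))


def score(head, rel, tail):
    """DistMult-style triple score; the paper's actual scoring function may differ."""
    return (head * rel * tail).sum(dim=-1)
```

In a setup like this, the embedding generated for an unseen relation could be plugged into the same scorer used for seen relations to rank candidate tail entities.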
Related papers
- Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification [3.6616868775630587]
We propose M3CoL, a Multimodal Mixup Contrastive Learning approach to capture nuanced shared relations inherent in multimodal data.
Our work highlights the significance of learning shared relations for robust multimodal learning, opening up promising avenues for future research.
arXiv Detail & Related papers (2024-09-26T12:15:13Z)
- PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents [58.35492519636351]
PIN format is built on three foundational principles: knowledge intensity, scalability, and support for diverse training modalities.
We present PIN-14M, an open-source dataset comprising 14 million samples derived from a diverse range of Chinese and English sources.
arXiv Detail & Related papers (2024-06-20T01:43:08Z)
- Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning [51.80447197290866]
Learning high-quality multi-modal entity representations is an important goal of multi-modal knowledge graph (MMKG) representation learning.
Existing methods focus on crafting elegant entity-wise multi-modal fusion strategies.
We introduce a novel framework with Mixture of Modality Knowledge experts (MoMoK) to learn adaptive multi-modal entity representations.
arXiv Detail & Related papers (2024-05-27T06:36:17Z)
- Beyond Unimodal Learning: The Importance of Integrating Multiple Modalities for Lifelong Learning [23.035725779568587]
We study the role and interactions of multiple modalities in mitigating forgetting in deep neural networks (DNNs).
Our findings demonstrate that leveraging multiple views and complementary information from multiple modalities enables the model to learn more accurate and robust representations.
We propose a method for integrating and aligning the information from different modalities by utilizing the relational structural similarities between the data points in each modality; a minimal hypothetical sketch of this alignment idea appears after this list.
arXiv Detail & Related papers (2024-05-04T22:02:58Z)
- Generative Multi-Modal Knowledge Retrieval with Large Language Models [75.70313858231833]
We propose an innovative end-to-end generative framework for multi-modal knowledge retrieval.
Our framework takes advantage of the fact that large language models (LLMs) can effectively serve as virtual knowledge bases.
We demonstrate significant improvements ranging from 3.0% to 14.6% across all evaluation metrics when compared to strong baselines.
arXiv Detail & Related papers (2024-01-16T08:44:29Z)
- Decoupling Common and Unique Representations for Multimodal Self-supervised Learning [22.12729786091061]
We propose Decoupling Common and Unique Representations (DeCUR), a simple yet effective method for multimodal self-supervised learning.
By distinguishing inter- and intra-modal embeddings through multimodal redundancy reduction, DeCUR can integrate complementary information across different modalities.
arXiv Detail & Related papers (2023-09-11T08:35:23Z)
- Factorized Contrastive Learning: Going Beyond Multi-view Redundancy [116.25342513407173]
This paper proposes FactorCL, a new multimodal representation learning method to go beyond multi-view redundancy.
On large-scale real-world datasets, FactorCL captures both shared and unique information and achieves state-of-the-art results.
arXiv Detail & Related papers (2023-06-08T15:17:04Z)
- Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z)
- A Unified Continuous Learning Framework for Multi-modal Knowledge Discovery and Pre-training [73.7507857547549]
We propose to unify knowledge discovery and multi-modal pre-training in a continuous learning framework.
For knowledge discovery, a pre-trained model is used to identify cross-modal links on a graph.
For model pre-training, the knowledge graph is used as the external knowledge to guide the model updating.
arXiv Detail & Related papers (2022-06-11T16:05:06Z)
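The alignment idea summarized above for "Beyond Unimodal Learning" (matching relational structural similarities between data points across modalities) can be illustrated with a short, hypothetical sketch; the function names and the MSE objective are assumptions, not that paper's implementation.

```python
# Hypothetical sketch: align two modalities by matching the pairwise
# cosine-similarity structure computed within each modality.
import torch
import torch.nn.functional as F


def relational_similarity(feats: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine similarities between samples within one modality."""
    z = F.normalize(feats, dim=-1)            # (batch, dim)
    return z @ z.t()                          # (batch, batch)


def structural_alignment_loss(feats_a: torch.Tensor, feats_b: torch.Tensor) -> torch.Tensor:
    """Encourage both modalities to induce the same relational structure over a batch."""
    return F.mse_loss(relational_similarity(feats_a), relational_similarity(feats_b))


# Usage with random stand-ins for image and text embeddings of the same batch.
img = torch.randn(8, 128)
txt = torch.randn(8, 256)                     # modalities may have different widths
loss = structural_alignment_loss(img, txt)
```

Because each similarity matrix is batch-by-batch, the two modalities do not need to share an embedding dimension.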
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.