Multi-modal Contrastive Representation Learning for Entity Alignment
- URL: http://arxiv.org/abs/2209.00891v1
- Date: Fri, 2 Sep 2022 08:59:57 GMT
- Title: Multi-modal Contrastive Representation Learning for Entity Alignment
- Authors: Zhenxi Lin, Ziheng Zhang, Meng Wang, Yinghui Shi, Xian Wu, Yefeng
Zheng
- Abstract summary: Multi-modal entity alignment aims to identify equivalent entities between two different multi-modal knowledge graphs.
We propose MCLEA, a Multi-modal Contrastive Learning based Entity Alignment model.
In particular, MCLEA firstly learns multiple individual representations from multiple modalities, and then performs contrastive learning to jointly model intra-modal and inter-modal interactions.
- Score: 57.92705405276161
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-modal entity alignment aims to identify equivalent entities between two
different multi-modal knowledge graphs, which consist of structural triples and
images associated with entities. Most previous works focus on how to utilize
and encode information from different modalities, while it is not trivial to
leverage multi-modal knowledge in entity alignment because of the modality
heterogeneity. In this paper, we propose MCLEA, a Multi-modal Contrastive
Learning based Entity Alignment model, to obtain effective joint
representations for multi-modal entity alignment. Different from previous
works, MCLEA considers task-oriented modality and models the inter-modal
relationships for each entity representation. In particular, MCLEA firstly
learns multiple individual representations from multiple modalities, and then
performs contrastive learning to jointly model intra-modal and inter-modal
interactions. Extensive experimental results show that MCLEA outperforms
state-of-the-art baselines on public datasets under both supervised and
unsupervised settings.
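To make the two kinds of interaction in the abstract concrete, below is a minimal, self-contained PyTorch sketch: an InfoNCE-style contrastive loss pulls aligned entity pairs together within each modality (intra-modal), while a divergence term encourages different modalities to produce consistent alignment distributions (inter-modal). The function names, temperature, and exact loss forms are illustrative assumptions, not MCLEA's released implementation.

```python
# Illustrative sketch only: intra-modal contrastive loss plus an
# inter-modal agreement term over modality-specific embeddings.
# Names, temperatures, and loss forms are assumptions, not the authors' code.
import torch
import torch.nn.functional as F


def intra_modal_infonce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Contrastive loss within one modality.

    z1[i] and z2[i] are embeddings of the i-th aligned entity pair from the
    two knowledge graphs; other entities in the batch act as negatives.
    """
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                 # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))         # positives lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))


def inter_modal_agreement(logits_a: torch.Tensor, logits_b: torch.Tensor,
                          tau: float = 0.1) -> torch.Tensor:
    """Encourage two modalities to yield similar alignment distributions
    (a KL-based stand-in for the inter-modal interaction term)."""
    p = F.log_softmax(logits_a / tau, dim=-1)
    q = F.softmax(logits_b / tau, dim=-1)
    return F.kl_div(p, q, reduction="batchmean")


if __name__ == "__main__":
    B, d = 8, 64
    # Toy modality-specific embeddings (e.g., structure vs. image)
    # for aligned entity pairs from the two graphs.
    struct_g1, struct_g2 = torch.randn(B, d), torch.randn(B, d)
    visual_g1, visual_g2 = torch.randn(B, d), torch.randn(B, d)

    loss = intra_modal_infonce(struct_g1, struct_g2) \
         + intra_modal_infonce(visual_g1, visual_g2)
    sim_s = F.normalize(struct_g1, dim=-1) @ F.normalize(struct_g2, dim=-1).t()
    sim_v = F.normalize(visual_g1, dim=-1) @ F.normalize(visual_g2, dim=-1).t()
    loss = loss + inter_modal_agreement(sim_s, sim_v)
    print(float(loss))
```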
Related papers
- Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization [49.08348604716746]
Multimodal Summarization with Multimodal Output (MSMO) aims to produce a multimodal summary that integrates both text and relevant images.
In this paper, we propose an Entity-Guided Multimodal Summarization model (EGMS)
Our model, building on BART, utilizes dual multimodal encoders with shared weights to process text-image and entity-image information concurrently.
arXiv Detail & Related papers (2024-08-06T12:45:56Z) - A Framework for Multi-modal Learning: Jointly Modeling Inter- & Intra-Modality Dependencies [42.16496299814368]
We argue that conventional approaches that rely solely on either inter- or intra-modality dependencies may not be optimal in general.
We propose the inter- & intra-modality modeling (I2M2) framework, which captures and integrates both the inter- and intra-modality dependencies.
We evaluate our approach using real-world healthcare and vision-and-language datasets with state-of-the-art models.
arXiv Detail & Related papers (2024-05-27T19:22:41Z) - Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning [51.80447197290866]
Learning high-quality multi-modal entity representations is an important goal of multi-modal knowledge graph (MMKG) representation learning.
Existing methods focus on crafting elegant entity-wise multi-modal fusion strategies.
We introduce a novel framework with Mixture of Modality Knowledge experts (MoMoK) to learn adaptive multi-modal entity representations.
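A hedged sketch of the mixture-of-modality-experts idea mentioned above: each modality gets its own expert encoder, and a learned gate mixes the expert outputs into one adaptive entity representation. Module names, dimensions, and the gating scheme are assumptions for illustration, not the MoMoK code.

```python
# Illustrative mixture-of-modality-experts fusion (assumptions only).
import torch
import torch.nn as nn


class ModalityMoE(nn.Module):
    def __init__(self, dims: dict[str, int], out_dim: int = 128):
        super().__init__()
        # One expert per modality (e.g., "structure", "image", "text").
        self.experts = nn.ModuleDict(
            {m: nn.Sequential(nn.Linear(d, out_dim), nn.ReLU(),
                              nn.Linear(out_dim, out_dim))
             for m, d in dims.items()})
        # The gate scores one mixing weight per modality from the concatenated inputs.
        self.gate = nn.Linear(sum(dims.values()), len(dims))
        self.order = list(dims.keys())

    def forward(self, feats: dict[str, torch.Tensor]) -> torch.Tensor:
        expert_out = torch.stack([self.experts[m](feats[m]) for m in self.order], dim=1)  # (B, M, out_dim)
        weights = torch.softmax(self.gate(torch.cat([feats[m] for m in self.order], dim=-1)), dim=-1)  # (B, M)
        return (weights.unsqueeze(-1) * expert_out).sum(dim=1)  # adaptive fusion -> (B, out_dim)


if __name__ == "__main__":
    model = ModalityMoE({"structure": 64, "image": 512, "text": 300})
    batch = {"structure": torch.randn(4, 64),
             "image": torch.randn(4, 512),
             "text": torch.randn(4, 300)}
    print(model(batch).shape)  # torch.Size([4, 128])
```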
arXiv Detail & Related papers (2024-05-27T06:36:17Z) - Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
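As an illustration of "modality-specific encoders with connectors", the toy sketch below projects features from separate modality encoders into a shared token space that a language-model backbone could consume. The shapes, connector architecture, and hidden size are assumptions, not the Uni-MoE implementation.

```python
# Illustrative connector pattern: per-modality features -> unified token space.
import torch
import torch.nn as nn


class Connector(nn.Module):
    """Projects a modality encoder's output into the unified hidden size."""
    def __init__(self, in_dim: int, hidden: int):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, hidden), nn.GELU(),
                                  nn.Linear(hidden, hidden))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


if __name__ == "__main__":
    hidden = 256                              # assumed unified hidden size
    image_feats = torch.randn(2, 16, 1024)    # e.g., patch features from a vision encoder
    audio_feats = torch.randn(2, 8, 768)      # e.g., frame features from an audio encoder
    to_tokens = {"image": Connector(1024, hidden), "audio": Connector(768, hidden)}
    unified = torch.cat([to_tokens["image"](image_feats),
                         to_tokens["audio"](audio_feats)], dim=1)
    print(unified.shape)                      # torch.Size([2, 24, 256]) -> fed to the backbone
```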
arXiv Detail & Related papers (2024-05-18T12:16:01Z) - Leveraging Intra-modal and Inter-modal Interaction for Multi-Modal Entity Alignment [27.28214706269035]
Multi-modal entity alignment (MMEA) aims to identify equivalent entity pairs across different multi-modal knowledge graphs (MMKGs)
In this paper, we propose a Multi-Grained Interaction framework for Multi-Modal Entity Alignment.
arXiv Detail & Related papers (2024-04-19T08:43:11Z) - NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose a comprehensive framework NativE to achieve MMKGC in the wild.
NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for any modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z) - Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
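The early-fusion, single-stream idea can be pictured with the short sketch below: modality features are projected to a common width, concatenated along the sequence axis, and encoded jointly by one shared Transformer encoder. The toy modalities and shapes are assumptions, not the UmURL implementation.

```python
# Illustrative early-fusion, single-stream encoding (assumptions only).
import torch
import torch.nn as nn

hidden = 128
proj_joint = nn.Linear(3, hidden)       # e.g., skeleton joint coordinates
proj_motion = nn.Linear(3, hidden)      # e.g., frame-to-frame motion
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
    num_layers=2)

joints = torch.randn(2, 50, 3)          # (batch, sequence, coords), toy shapes
motion = torch.randn(2, 50, 3)
tokens = torch.cat([proj_joint(joints), proj_motion(motion)], dim=1)  # single fused stream
print(encoder(tokens).shape)            # torch.Size([2, 100, 128])
```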
arXiv Detail & Related papers (2023-11-06T13:56:57Z) - MESED: A Multi-modal Entity Set Expansion Dataset with Fine-grained Semantic Classes and Hard Negative Entities [25.059177235004952]
We propose Multi-modal Entity Set Expansion (MESE), where models integrate information from multiple modalities to represent entities.
A powerful multi-modal model MultiExpan is proposed which is pre-trained on four multimodal pre-training tasks.
The MESED dataset is the first multi-modal dataset for ESE with large-scale and elaborate manual calibration.
arXiv Detail & Related papers (2023-07-27T14:09:59Z) - Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis [11.368438990334397]
We develop a self-supervised learning strategy to acquire independent unimodal supervisions.
We conduct extensive experiments on three public multimodal baseline datasets.
Our method achieves performance comparable to that obtained with human-annotated unimodal labels.
arXiv Detail & Related papers (2021-02-09T14:05:02Z)