Pseudo-Label Calibration Semi-supervised Multi-Modal Entity Alignment
- URL: http://arxiv.org/abs/2403.01203v1
- Date: Sat, 2 Mar 2024 12:44:59 GMT
- Title: Pseudo-Label Calibration Semi-supervised Multi-Modal Entity Alignment
- Authors: Luyao Wang, Pengnian Qi, Xigang Bao, Chunlai Zhou and Biao Qin
- Abstract summary: Multi-modal entity alignment (MMEA) aims to identify equivalent entities between two multi-modal knowledge graphs for integration.
We introduce Pseudo-label Calibration Multi-modal Entity Alignment (PCMEA), a semi-supervised approach.
We combine pseudo-label calibration with momentum-based contrastive learning to make full use of the labeled and unlabeled data, which improves pseudo-label quality and pulls aligned entities closer.
- Score: 7.147651976133246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal entity alignment (MMEA) aims to identify equivalent entities
between two multi-modal knowledge graphs for integration. Unfortunately, prior
work has focused on improving the interaction and fusion of multi-modal
information while overlooking the influence of modal-specific noise and the
usage of labeled and unlabeled data in semi-supervised settings. In this
work, we introduce Pseudo-label Calibration Multi-modal Entity Alignment
(PCMEA), a semi-supervised approach. Specifically, in order to generate holistic
entity representations, we first devise various embedding modules and attention
mechanisms to extract visual, structural, relational, and attribute features.
Different from the prior direct fusion methods, we next propose to exploit
mutual information maximization to filter the modal-specific noise and to
augment modal-invariant commonality. Then, we combine pseudo-label calibration
with momentum-based contrastive learning to make full use of the labeled and
unlabeled data, which improves pseudo-label quality and pulls aligned
entities closer. Finally, extensive experiments on two MMEA datasets
demonstrate the effectiveness of our PCMEA, which yields state-of-the-art
performance.
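The abstract's semi-supervised loop, a momentum-updated set of target embeddings, an InfoNCE-style contrastive loss over aligned entity pairs, and confidence-filtered pseudo-labels, can be sketched as follows. This is a minimal NumPy illustration under assumed names, not the authors' implementation; the mutual-nearest-neighbor filter and the 0.9 confidence threshold are illustrative choices.

```python
import numpy as np

def info_nce(anchor, positives, temperature=0.1):
    # anchor: (n, d) entity embeddings from KG1; positives: (n, d) aligned
    # embeddings from KG2. Row i of each matrix is assumed to be a true pair.
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                     # (n, n) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                 # diagonal = true pairs

def momentum_update(target, online, m=0.99):
    # Exponential moving average of the momentum (target) embeddings,
    # as in momentum-based contrastive learning.
    return m * target + (1 - m) * online

def calibrated_pseudo_labels(sim, threshold=0.9):
    # Pseudo-label calibration sketch: keep only mutually-nearest,
    # high-confidence pairs from a cross-KG similarity matrix.
    best_j = sim.argmax(axis=1)
    best_i = sim.argmax(axis=0)
    return [(i, j) for i, j in enumerate(best_j)
            if best_i[j] == i and sim[i, j] >= threshold]
```

In a training loop, the calibrated pairs from unlabeled entities would be added to the labeled pairs before the next contrastive update, so the two components reinforce each other as the abstract describes.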
Related papers
- NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose a comprehensive framework NativE to achieve MMKGC in the wild.
NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for any modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z)
- Joint Multimodal Transformer for Emotion Recognition in the Wild [49.735299182004404]
Multimodal emotion recognition (MMER) systems typically outperform unimodal systems.
This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention.
arXiv Detail & Related papers (2024-03-15T17:23:38Z) - CARAT: Contrastive Feature Reconstruction and Aggregation for
Multi-Modal Multi-Label Emotion Recognition [18.75994345925282]
Multi-modal multi-label emotion recognition (MMER) aims to identify relevant emotions from multiple modalities.
The challenge of MMER is how to effectively capture discriminative features for multiple labels from heterogeneous data.
This paper presents ContrAstive feature Reconstruction and AggregaTion (CARAT) for the MMER task.
arXiv Detail & Related papers (2023-12-15T20:58:05Z) - Multi-Modal Knowledge Graph Transformer Framework for Multi-Modal Entity
Alignment [17.592908862768425]
We propose a novel MMEA transformer, called MoAlign, that hierarchically introduces neighbor features, multi-modal attributes, and entity types.
Taking advantage of the transformer's ability to better integrate multiple information, we design a hierarchical modifiable self-attention block in a transformer encoder.
Our approach outperforms strong competitors and achieves excellent entity alignment performance.
arXiv Detail & Related papers (2023-10-10T07:06:06Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation
Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - Correlation-Aware Mutual Learning for Semi-supervised Medical Image
Segmentation [5.045813144375637]
Most existing semi-supervised segmentation methods only focus on extracting information from unlabeled data.
We propose a novel Correlation Aware Mutual Learning framework that leverages labeled data to guide the extraction of information from unlabeled data.
Our approach is based on a mutual learning strategy that incorporates two modules: the Cross-sample Mutual Attention Module (CMA) and the Omni-Correlation Consistency Module (OCC).
arXiv Detail & Related papers (2023-07-12T17:20:05Z) - Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z) - MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality
Hybrid [40.745848169903105]
Multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs.
MMEA algorithms rely on KG-level modality fusion strategies for multi-modal entity representation.
This paper introduces MEAformer, a multi-modal entity alignment transformer approach for meta modality hybrid.
arXiv Detail & Related papers (2022-12-29T20:49:58Z) - Multi-modal Contrastive Representation Learning for Entity Alignment [57.92705405276161]
Multi-modal entity alignment aims to identify equivalent entities between two different multi-modal knowledge graphs.
We propose MCLEA, a Multi-modal Contrastive Learning based Entity Alignment model.
In particular, MCLEA firstly learns multiple individual representations from multiple modalities, and then performs contrastive learning to jointly model intra-modal and inter-modal interactions.
arXiv Detail & Related papers (2022-09-02T08:59:57Z) - Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal
Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input due to the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z) - Self-Supervised Multimodal Domino: in Search of Biomarkers for
Alzheimer's Disease [19.86082635340699]
We propose a taxonomy of all reasonable ways to organize self-supervised representation-learning algorithms.
We first evaluate models on toy multimodal MNIST datasets and then apply them to a multimodal neuroimaging dataset with Alzheimer's disease patients.
Results show that the proposed approach outperforms previous self-supervised encoder-decoder methods.
arXiv Detail & Related papers (2020-12-25T20:28:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.