Rethinking Uncertainly Missing and Ambiguous Visual Modality in
Multi-Modal Entity Alignment
- URL: http://arxiv.org/abs/2307.16210v2
- Date: Tue, 1 Aug 2023 05:35:51 GMT
- Title: Rethinking Uncertainly Missing and Ambiguous Visual Modality in
Multi-Modal Entity Alignment
- Authors: Zhuo Chen, Lingbing Guo, Yin Fang, Yichi Zhang, Jiaoyan Chen, Jeff Z.
Pan, Yangning Li, Huajun Chen, Wen Zhang
- Abstract summary: We present a further analysis of visual modality incompleteness, benchmarking the latest MMEA models on our proposed dataset MMEA-UMVM.
Our research indicates that, in the face of modality incompleteness, models tend to overfit the modality noise and exhibit performance oscillations or declines at high rates of missing modality.
We introduce UMAEA, a robust multi-modal entity alignment approach designed to tackle uncertainly missing and ambiguous visual modalities.
- Score: 38.574204922793626
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a crucial extension of entity alignment (EA), multi-modal entity alignment
(MMEA) aims to identify identical entities across disparate knowledge graphs
(KGs) by exploiting associated visual information. However, existing MMEA
approaches primarily concentrate on the fusion paradigm of multi-modal entity
features, while neglecting the challenges presented by the pervasive phenomenon
of missing and intrinsic ambiguity of visual images. In this paper, we present
a further analysis of visual modality incompleteness, benchmarking the latest
MMEA models on our proposed dataset MMEA-UMVM, where the aligned KGs cover both
bilingual and monolingual settings and both standard (non-iterative) and
iterative training paradigms are used to evaluate model performance. Our
research indicates that, in the face of modality incompleteness, models tend to
overfit the modality noise and exhibit performance oscillations or declines at
high rates of missing modality. This demonstrates that the inclusion of
additional multi-modal data can sometimes adversely affect EA. To address these
challenges, we introduce UMAEA, a robust multi-modal entity alignment approach
designed to
tackle uncertainly missing and ambiguous visual modalities. It consistently
achieves SOTA performance across all 97 benchmark splits, significantly
surpassing existing baselines with limited parameters and time consumption,
while effectively alleviating the identified limitations of other models. Our
code and benchmark data are available at https://github.com/zjukg/UMAEA.
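To make the evaluation setup concrete, here is a minimal, self-contained Python sketch of how one might simulate an uncertainly missing visual modality at a given rate and measure its effect on alignment accuracy. This is not the authors' MMEA-UMVM or UMAEA code; the function names, the random-masking scheme, and the naive late fusion are illustrative assumptions.

```python
import numpy as np


def mask_visual_modality(visual_emb: np.ndarray, missing_rate: float, seed: int = 0) -> np.ndarray:
    """Zero out the image embedding of a random subset of entities.

    Simulates an uncertainly missing visual modality: each entity keeps its
    visual feature with probability (1 - missing_rate). (Illustrative only.)
    """
    rng = np.random.default_rng(seed)
    keep = rng.random(visual_emb.shape[0]) >= missing_rate
    return visual_emb * keep[:, None]


def fuse(struct_emb: np.ndarray, visual_emb: np.ndarray, w_visual: float = 1.0) -> np.ndarray:
    """Naive late fusion: L2-normalize each modality, then concatenate."""
    def l2norm(x):
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    return np.concatenate([l2norm(struct_emb), w_visual * l2norm(visual_emb)], axis=1)


def hits_at_1(src: np.ndarray, tgt: np.ndarray) -> float:
    """Hits@1 when row i of `src` is aligned with row i of `tgt`."""
    sim = src @ tgt.T                              # dot-product similarity
    return float((sim.argmax(axis=1) == np.arange(len(src))).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n, d = 500, 64
    # Toy data: structural embeddings of the two KGs carry no alignment signal;
    # the shared visual signal is what lets aligned entities be matched.
    struct_a, struct_b = rng.normal(size=(n, d)), rng.normal(size=(n, d))
    shared_visual = rng.normal(size=(n, d))
    for rate in (0.0, 0.4, 0.8):
        vis_a = mask_visual_modality(shared_visual + 0.1 * rng.normal(size=(n, d)), rate, seed=1)
        vis_b = mask_visual_modality(shared_visual + 0.1 * rng.normal(size=(n, d)), rate, seed=2)
        score = hits_at_1(fuse(struct_a, vis_a), fuse(struct_b, vis_b))
        print(f"missing_rate={rate:.1f}  Hits@1={score:.3f}")
```

In this toy setting the structural embeddings carry no alignment signal, so Hits@1 degrades toward chance as the missing rate grows; the paper's observation is that real MMEA models can degrade even more sharply because they overfit the noise that stands in for absent images.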
Related papers
- MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
We propose MITA, a Meet-In-The-Middle based approach that introduces energy-based optimization to encourage mutual adaptation of the model and the data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z)
- Towards Robust Multimodal Sentiment Analysis with Incomplete Data [20.75292807497547]
We present an innovative Language-dominated Noise-resistant Learning Network (LNLN) to achieve robust Multimodal Sentiment Analysis (MSA).
LNLN features a dominant modality correction (DMC) module and a dominant modality based multimodal learning (DMML) module, which enhance the model's robustness across various noise scenarios.
arXiv Detail & Related papers (2024-09-30T07:14:31Z)
- IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment [17.570243718626994]
Multi-modal entity alignment (MMEA) aims to identify equivalent entities between multi-modal knowledge graphs (MMKGs).
We devise multi-modal variational encoders to generate modal-specific entity representations as probability distributions.
We also propose four modal-specific information bottleneck regularizers, limiting the misleading clues in refining modal-specific entity representations.
arXiv Detail & Related papers (2024-07-27T17:12:37Z)
- Progressively Modality Freezing for Multi-Modal Entity Alignment [27.77877721548588]
We propose a novel strategy of progressive modality freezing, called PMF, that focuses on alignment-relevant features.
Notably, our approach introduces a pioneering cross-modal association loss to foster modal consistency.
Empirical evaluations across nine datasets confirm PMF's superiority.
arXiv Detail & Related papers (2024-07-23T04:22:30Z)
- A Novel Energy based Model Mechanism for Multi-modal Aspect-Based Sentiment Analysis [85.77557381023617]
We propose a novel framework called DQPSA for multi-modal sentiment analysis.
The PDQ module uses the prompt as both a visual query and a language query to extract prompt-aware visual information.
The EPE module models the boundary pairing of the analysis target from the perspective of an Energy-based Model.
arXiv Detail & Related papers (2023-12-13T12:00:46Z)
- Cross-Attention is Not Enough: Incongruity-Aware Dynamic Hierarchical Fusion for Multimodal Affect Recognition [69.32305810128994]
Incongruity between modalities poses a challenge for multimodal fusion, especially in affect recognition.
We propose the Hierarchical Crossmodal Transformer with Dynamic Modality Gating (HCT-DMG), a lightweight incongruity-aware model.
HCT-DMG: 1) outperforms previous multimodal models with a reduced size of approximately 0.8M parameters; 2) recognizes hard samples where incongruity makes affect recognition difficult; 3) mitigates the incongruity at the latent level in crossmodal attention.
arXiv Detail & Related papers (2023-05-23T01:24:15Z)
- MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid [40.745848169903105]
Multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs.
MMEA algorithms rely on KG-level modality fusion strategies for multi-modal entity representation.
This paper introduces MEAformer, a multi-modal entity alignment transformer approach for meta modality hybrid.
arXiv Detail & Related papers (2022-12-29T20:49:58Z)
- Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities [76.08541852988536]
We propose to use invariant features for a missing modality imagination network (IF-MMIN).
We show that the proposed model outperforms all baselines and invariantly improves the overall emotion recognition performance under uncertain missing-modality conditions.
arXiv Detail & Related papers (2022-10-27T12:16:25Z)
- Self-attention fusion for audiovisual emotion recognition with incomplete data [103.70855797025689]
We consider the problem of multimodal data analysis with a use case of audiovisual emotion recognition.
We propose an architecture capable of learning from raw data and describe three variants of it with distinct modality fusion mechanisms.
arXiv Detail & Related papers (2022-01-26T18:04:29Z)
- On the Limitations of Multimodal VAEs [9.449650062296824]
Multimodal variational autoencoders (VAEs) have shown promise as efficient generative models for weakly-supervised data.
Despite their advantage of weak supervision, they exhibit a gap in generative quality compared to unimodal VAEs.
arXiv Detail & Related papers (2021-10-08T13:28:28Z)