Learning with Dual-level Noisy Correspondence for Multi-modal Entity Alignment
- URL: http://arxiv.org/abs/2510.18240v1
- Date: Tue, 21 Oct 2025 03:00:11 GMT
- Title: Learning with Dual-level Noisy Correspondence for Multi-modal Entity Alignment
- Authors: Haobin Li, Yijie Lin, Peng Hu, Mouxing Yang, Xi Peng
- Abstract summary: Multi-modal entity alignment (MMEA) aims to identify equivalent entities across heterogeneous multi-modal knowledge graphs (MMKGs). Existing methods assume that both intra-entity and inter-graph correspondences are faultless, which is often violated in real-world MMKGs. We propose a robust MMEA framework termed RULE to address the Dual-level Noisy Correspondence (DNC) problem.
- Score: 32.22464813030617
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal entity alignment (MMEA) aims to identify equivalent entities across heterogeneous multi-modal knowledge graphs (MMKGs), where each entity is described by attributes from various modalities. Existing methods typically assume that both intra-entity and inter-graph correspondences are faultless, which is often violated in real-world MMKGs due to the reliance on expert annotations. In this paper, we reveal and study a highly practical yet under-explored problem in MMEA, termed Dual-level Noisy Correspondence (DNC). DNC refers to misalignments in both intra-entity (entity-attribute) and inter-graph (entity-entity and attribute-attribute) correspondences. To address the DNC problem, we propose a robust MMEA framework termed RULE. RULE first estimates the reliability of both intra-entity and inter-graph correspondences via a dedicated two-fold principle. Leveraging the estimated reliabilities, RULE mitigates the negative impact of intra-entity noise during attribute fusion and prevents overfitting to noisy inter-graph correspondences during inter-graph discrepancy elimination. Beyond the training-time designs, RULE further incorporates a correspondence reasoning module that uncovers the underlying attribute-attribute connection across graphs, guaranteeing more accurate equivalent entity identification. Extensive experiments on five benchmarks verify the effectiveness of our method against DNC in comparison with seven state-of-the-art methods. The code is available at https://github.com/XLearning-SCU/RULE
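The abstract's idea of down-weighting unreliable attributes during fusion can be illustrated with a toy sketch. This is not RULE's actual estimator, which the abstract does not specify. The reliability score used here (mean cosine agreement with the entity's other attributes, passed through a softmax) and the names `reliability_weights`, `fuse`, and `temperature` are assumptions made purely for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def reliability_weights(attr_embs, temperature=0.1):
    """Hypothetical reliability: softmax over each attribute's mean agreement
    with the other attributes of the same entity."""
    if len(attr_embs) == 1:
        return [1.0]
    scores = []
    for i, emb in enumerate(attr_embs):
        others = [o for j, o in enumerate(attr_embs) if j != i]
        scores.append(sum(cosine(emb, o) for o in others) / len(others))
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(attr_embs, weights):
    """Reliability-weighted sum of attribute embeddings."""
    dim = len(attr_embs[0])
    return [sum(w * e[d] for w, e in zip(weights, attr_embs)) for d in range(dim)]

# Toy entity: two attributes that agree and one that points the other way (noisy).
attrs = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]]
w = reliability_weights(attrs)
fused = fuse(attrs, w)
print(w, fused)
```

In this toy case the third attribute disagrees with the other two, so it receives the smallest weight and contributes little to the fused embedding.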
Related papers
- CKDA: Cross-modality Knowledge Disentanglement and Alignment for Visible-Infrared Lifelong Person Re-identification [77.07028925223383]
Lifelong person Re-IDentification aims to match the same person employing continuously collected individual data from different scenarios. To achieve continuous all-day person matching across day and night, Visible-Infrared Lifelong person Re-IDentification (VI-LReID) focuses on sequential training on data from visible and infrared modalities. Existing methods typically exploit cross-modal knowledge distillation to alleviate the catastrophic forgetting of old knowledge.
arXiv Detail & Related papers (2025-11-19T01:30:29Z)
- Cross-modal Full-mode Fine-grained Alignment for Text-to-Image Person Retrieval [54.90229711181207]
Text-to-Image Person Retrieval (TIPR) aims to retrieve the most relevant person images based on a given text query. The key challenge in TIPR lies in achieving effective alignment between textual and visual modalities. We propose FMFA, a cross-modal Full-Mode Fine-grained Alignment framework.
arXiv Detail & Related papers (2025-09-17T07:12:05Z)
- Robust Brain Tumor Segmentation with Incomplete MRI Modalities Using Hölder Divergence and Mutual Information-Enhanced Knowledge Transfer [10.66488607852885]
We propose a robust single-modality parallel processing framework that achieves high segmentation accuracy even with incomplete modalities. Our model maintains modality-specific features while dynamically adjusting network parameters based on the available inputs. By using these divergence- and information-based loss functions, the framework effectively quantifies discrepancies between predictions and ground-truth labels.
arXiv Detail & Related papers (2025-07-02T00:18:07Z)
- Localizing Factual Inconsistencies in Attributable Text Generation [74.11403803488643]
We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation. We show that QASemConsistency yields factual consistency scores that correlate well with human judgments.
arXiv Detail & Related papers (2024-10-09T22:53:48Z)
- Disentangled Noisy Correspondence Learning [56.06801962154915]
Cross-modal retrieval is crucial in understanding latent correspondences across modalities.
DisNCL is a novel information-theoretic framework for feature Disentanglement in Noisy Correspondence Learning.
arXiv Detail & Related papers (2024-08-10T09:49:55Z)
- Unsupervised Robust Cross-Lingual Entity Alignment via Neighbor Triple Matching with Entity and Relation Texts [17.477542644785483]
Cross-lingual entity alignment (EA) enables the integration of multiple knowledge graphs (KGs) across different languages. EA pipeline that jointly performs entity-level and Relation-level Alignment by neighbor triple matching strategy.
arXiv Detail & Related papers (2024-07-22T12:25:48Z)
- REPAIR: Rank Correlation and Noisy Pair Half-replacing with Memory for Noisy Correspondence [36.274879585424635]
The presence of noise in acquired data invariably leads to performance degradation in crossmodal matching.
We propose a framework as Rank corrElation and noisy hAlf wIth memoRy to tackle the mismatched data pair issue.
arXiv Detail & Related papers (2024-03-13T04:01:20Z)
- Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
Multimodal entity linking task aims at resolving ambiguous mentions to a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
arXiv Detail & Related papers (2023-07-19T02:11:19Z)
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
- VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias [17.107961913114778]
Multimodal misinformation is a growing problem on social media platforms.
In this study, we investigate and identify the presence of unimodal bias in widely-used MMD benchmarks.
We introduce a new method -- termed Crossmodal HArd Synthetic MisAlignment (CHASMA) -- for generating realistic synthetic training data.
arXiv Detail & Related papers (2023-04-27T12:28:29Z)
- Attribute-Consistent Knowledge Graph Representation Learning for Multi-Modal Entity Alignment [14.658282035561792]
We propose a novel attribute-consistent knowledge graph representation learning framework for MMEA (ACK-MMEA).
Our approach achieves excellent performance compared to its competitors.
arXiv Detail & Related papers (2023-04-04T06:39:36Z)
- Graph Matching with Bi-level Noisy Correspondence [43.071988798418886]
Bi-level Noisy Correspondence (BNC) refers to node-level noisy correspondence (NNC) and edge-level noisy correspondence (ENC).
arXiv Detail & Related papers (2022-12-08T05:42:00Z)