Related papers: Learning with Dual-level Noisy Correspondence for Multi-modal Entity Alignment

URL: http://arxiv.org/abs/2510.18240v1
Date: Tue, 21 Oct 2025 03:00:11 GMT
Title: Learning with Dual-level Noisy Correspondence for Multi-modal Entity Alignment
Authors: Haobin Li, Yijie Lin, Peng Hu, Mouxing Yang, Xi Peng
Abstract summary: Multi-modal entity alignment (MMEA) aims to identify equivalent entities across heterogeneous multi-modal knowledge graphs (MMKGs). Existing methods assume that both intra-entity and inter-graph correspondences are faultless, an assumption often violated in real-world MMKGs. We propose a robust MMEA framework termed RULE to address the Dual-level Noisy Correspondence (DNC) problem.
Score: 32.22464813030617
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-modal entity alignment (MMEA) aims to identify equivalent entities across heterogeneous multi-modal knowledge graphs (MMKGs), where each entity is described by attributes from various modalities. Existing methods typically assume that both intra-entity and inter-graph correspondences are faultless, which is often violated in real-world MMKGs due to the reliance on expert annotations. In this paper, we reveal and study a highly practical yet under-explored problem in MMEA, termed Dual-level Noisy Correspondence (DNC). DNC refers to misalignments in both intra-entity (entity-attribute) and inter-graph (entity-entity and attribute-attribute) correspondences. To address the DNC problem, we propose a robust MMEA framework termed RULE. RULE first estimates the reliability of both intra-entity and inter-graph correspondences via a dedicated two-fold principle. Leveraging the estimated reliabilities, RULE mitigates the negative impact of intra-entity noise during attribute fusion and prevents overfitting to noisy inter-graph correspondences during inter-graph discrepancy elimination. Beyond the training-time designs, RULE further incorporates a correspondence reasoning module that uncovers the underlying attribute-attribute connection across graphs, guaranteeing more accurate equivalent entity identification. Extensive experiments on five benchmarks verify the effectiveness of our method against the DNC compared with seven state-of-the-art methods. The code is available at XLearning-SCU/RULE (https://github.com/XLearning-SCU/RULE).
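The abstract describes RULE only at a high level and does not say how the reliabilities are estimated or exactly how they enter training. As a rough illustration of the general idea of down-weighting unreliable entity-attribute correspondences during attribute fusion, the sketch below fuses per-modality embeddings with reliability weights and then matches entities across two graphs by nearest-neighbor cosine similarity. It is not the RULE implementation (that lives in the XLearning-SCU/RULE repository); the functions `fuse_attributes` and `align`, the hand-set reliability scores, and the toy embeddings are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse_attributes(embeddings: Dict[str, Sequence[float]],
                    reliability: Dict[str, float]) -> Vector:
    """Hypothetical reliability-weighted fusion of per-modality attribute embeddings.

    `reliability[m]` in [0, 1] stands in for an estimated entity-attribute
    correspondence reliability; a likely-noisy attribute (low score) contributes
    less to the fused entity embedding. Missing scores count as 0.
    """
    dim = len(next(iter(embeddings.values())))
    total = sum(reliability.get(m, 0.0) for m in embeddings)
    fused = [0.0] * dim
    for modality, emb in embeddings.items():
        # Fall back to uniform weights if every reliability is zero.
        w = reliability.get(modality, 0.0) / total if total > 0 else 1.0 / len(embeddings)
        for i, x in enumerate(l2_normalize(emb)):
            fused[i] += w * x
    return l2_normalize(fused)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Inputs are assumed unit-length, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


def align(source: Dict[str, Vector], target: Dict[str, Vector]) -> Dict[str, str]:
    """Greedy nearest-neighbor alignment: map each source entity to its most similar target."""
    return {s: max(target, key=lambda t: cosine(vs, target[t])) for s, vs in source.items()}


# Hypothetical toy graphs: e1's image attribute looks mismatched, so it gets low reliability.
g1 = {
    "e1": fuse_attributes({"text": [1.0, 0.0], "image": [0.0, 1.0]},
                          {"text": 0.9, "image": 0.1}),
    "e2": fuse_attributes({"text": [0.0, 1.0], "image": [0.0, 1.0]},
                          {"text": 0.8, "image": 0.8}),
}
g2 = {
    "f1": fuse_attributes({"text": [1.0, 0.1]}, {"text": 1.0}),
    "f2": fuse_attributes({"text": [0.1, 1.0]}, {"text": 1.0}),
}
print(align(g1, g2))  # {'e1': 'f1', 'e2': 'f2'}
```

In this toy setup, e1's image attribute points toward f2 but carries a low reliability, so e1 still aligns with f1. Fusing e1 with equal weights (0.5 each) would instead make it equally similar to f1 and f2. Real MMEA systems learn the embeddings and would estimate the reliabilities rather than hand-set them.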

Related papers

CKDA: Cross-modality Knowledge Disentanglement and Alignment for Visible-Infrared Lifelong Person Re-identification [77.07028925223383]
Lifelong person Re-IDentification aims to match the same person employing continuously collected individual data from different scenarios. To achieve continuous all-day person matching across day and night, Visible-Infrared Lifelong person Re-IDentification (VI-LReID) focuses on sequential training on data from visible and infrared modalities. Existing methods typically exploit cross-modal knowledge distillation to alleviate the catastrophic forgetting of old knowledge.
arXiv (2025-11-19T01:30:29Z)
Cross-modal Full-mode Fine-grained Alignment for Text-to-Image Person Retrieval [54.90229711181207]
Text-to-Image Person Retrieval (TIPR) aims to retrieve the most relevant person images based on a given text query. The key challenge in TIPR lies in achieving effective alignment between textual and visual modalities. We propose FMFA, a cross-modal Full-Mode Fine-grained Alignment framework.
arXiv (2025-09-17T07:12:05Z)
Robust Brain Tumor Segmentation with Incomplete MRI Modalities Using Hölder Divergence and Mutual Information-Enhanced Knowledge Transfer [10.66488607852885]
We propose a robust single-modality parallel processing framework that achieves high segmentation accuracy even with incomplete modalities. Our model maintains modality-specific features while dynamically adjusting network parameters based on the available inputs. By using divergence- and information-based loss functions (Hölder divergence and mutual information), the framework effectively quantifies discrepancies between predictions and ground-truth labels.
arXiv (2025-07-02T00:18:07Z)
Localizing Factual Inconsistencies in Attributable Text Generation [74.11403803488643]
We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation. We show that QASemConsistency yields factual consistency scores that correlate well with human judgments.
arXiv (2024-10-09T22:53:48Z)
Disentangled Noisy Correspondence Learning [56.06801962154915]
Cross-modal retrieval is crucial in understanding latent correspondences across modalities. DisNCL is a novel information-theoretic framework for feature Disentanglement in Noisy Correspondence Learning.
arXiv (2024-08-10T09:49:55Z)
Unsupervised Robust Cross-Lingual Entity Alignment via Neighbor Triple Matching with Entity and Relation Texts [17.477542644785483]
Cross-lingual entity alignment (EA) enables the integration of multiple knowledge graphs (KGs) across different languages. We propose an EA pipeline that jointly performs entity-level and Relation-level Alignment through a neighbor triple matching strategy.
arXiv (2024-07-22T12:25:48Z)
REPAIR: Rank Correlation and Noisy Pair Half-replacing with Memory for Noisy Correspondence [36.274879585424635]
The presence of noise in acquired data invariably leads to performance degradation in cross-modal matching. We propose a framework termed Rank corrElation and noisy Pair hAlf-replacing wIth memoRy (REPAIR) to tackle the mismatched data pair issue.
arXiv (2024-03-13T04:01:20Z)
Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
The multimodal entity linking (MEL) task aims to resolve ambiguous mentions to entities in a multimodal knowledge graph. We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
arXiv (2023-07-19T02:11:19Z)
Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach that mines Cross-Modal Semantics to guide the fusion and decoding of multimodal features. Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) a coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv (2023-05-17T14:30:11Z)
VERITE: A Robust Benchmark for Multimodal Misinformation Detection Accounting for Unimodal Bias [17.107961913114778]
Multimodal misinformation is a growing problem on social media platforms. In this study, we investigate and identify the presence of unimodal bias in widely used multimodal misinformation detection (MMD) benchmarks. We introduce a new method, termed Crossmodal HArd Synthetic MisAlignment (CHASMA), for generating realistic synthetic training data.
arXiv (2023-04-27T12:28:29Z)
Attribute-Consistent Knowledge Graph Representation Learning for Multi-Modal Entity Alignment [14.658282035561792]
We propose a novel attribute-consistent knowledge graph representation learning framework for MMEA (ACK-MMEA). Our approach achieves excellent performance compared to its competitors.
arXiv (2023-04-04T06:39:36Z)
Graph Matching with Bi-level Noisy Correspondence [43.071988798418886]
Bi-level Noisy Correspondence (BNC) refers to node-level noisy correspondence (NNC) and edge-level noisy correspondence (ENC).
arXiv (2022-12-08T05:42:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.