Related papers: Combating Confirmation Bias: A Unified Pseudo-Labeling Framework for Entity Alignment

URL: http://arxiv.org/abs/2307.02075v4
Date: Wed, 02 Jul 2025 01:04:31 GMT
Title: Combating Confirmation Bias: A Unified Pseudo-Labeling Framework for Entity Alignment
Authors: Qijie Ding, Jie Yin, Daokun Zhang, Junbin Gao
Abstract summary: We propose a Unified Pseudo-Labeling framework for Entity Alignment (UPL-EA). UPL-EA explicitly eliminates pseudo-labeling errors to boost the accuracy of entity alignment. Our results and in-depth analyses demonstrate the superiority of UPL-EA over 15 competitive baselines.
Score: 30.407534668054286
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Entity alignment (EA) aims at identifying equivalent entity pairs across different knowledge graphs (KGs) that refer to the same real-world identity. To circumvent the shortage of seed alignments provided for training, recent EA models utilize pseudo-labeling strategies to iteratively add unaligned entity pairs predicted with high confidence to the seed alignments for model training. However, the adverse impact of confirmation bias during pseudo-labeling has been largely overlooked, thus hindering entity alignment performance. To systematically combat confirmation bias for pseudo-labeling-based entity alignment, we propose a Unified Pseudo-Labeling framework for Entity Alignment (UPL-EA) that explicitly eliminates pseudo-labeling errors to boost the accuracy of entity alignment. UPL-EA consists of two complementary components: (1) Optimal Transport (OT)-based pseudo-labeling uses discrete OT modeling as an effective means to determine entity correspondences and reduce erroneous matches across two KGs. An effective criterion is derived to infer pseudo-labeled alignments that satisfy one-to-one correspondences; (2) Parallel pseudo-label ensembling refines pseudo-labeled alignments by combining predictions over multiple models independently trained in parallel. The ensembled pseudo-labeled alignments are thereafter used to augment seed alignments to reinforce subsequent model training for alignment inference. The effectiveness of UPL-EA in eliminating pseudo-labeling errors is both theoretically supported and experimentally validated. Our extensive results and in-depth analyses demonstrate the superiority of UPL-EA over 15 competitive baselines and its utility as a general pseudo-labeling framework for entity alignment.
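
The two components described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not spell out the OT formulation or the selection criterion, so the code below swaps in a simple mutual-best-match rule as a stand-in that also yields one-to-one pairs, and it ensembles by keeping only pairs on which every model agrees. The function names, the `threshold` parameter, and the toy similarity matrices are all hypothetical.

```python
from typing import List, Set, Tuple

Pair = Tuple[int, int]  # (entity index in KG 1, entity index in KG 2)


def mutual_best_pairs(sim: List[List[float]], threshold: float = 0.0) -> Set[Pair]:
    """Select one-to-one pseudo-labels from a similarity matrix.

    Stand-in for the OT-based criterion (an assumption, not the paper's rule):
    keep (i, j) only if j is the best column for row i AND i is the best row
    for column j, and the similarity is at least `threshold`.
    """
    if not sim or not sim[0]:
        return set()
    n_rows, n_cols = len(sim), len(sim[0])
    best_col = [max(range(n_cols), key=lambda j: sim[i][j]) for i in range(n_rows)]
    best_row = [max(range(n_rows), key=lambda i: sim[i][j]) for j in range(n_cols)]
    return {
        (i, j)
        for i, j in enumerate(best_col)
        if best_row[j] == i and sim[i][j] >= threshold
    }


def ensemble_pairs(per_model: List[Set[Pair]]) -> Set[Pair]:
    """Keep only the pairs that every independently trained model selected."""
    if not per_model:
        return set()
    return set.intersection(*per_model)


# Toy similarity matrices from two hypothetical models (3 entities per KG).
sim_a = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.3], [0.1, 0.4, 0.5]]
sim_b = [[0.9, 0.1, 0.0], [0.2, 0.3, 0.8], [0.1, 0.4, 0.5]]

pairs_a = mutual_best_pairs(sim_a)
pairs_b = mutual_best_pairs(sim_b)
agreed = ensemble_pairs([pairs_a, pairs_b])
print(sorted(agreed))
```

In this sketch the surviving pairs would be appended to the seed alignments before the next training round. Intersection is the strictest possible ensembling rule; a majority vote would retain more pairs at the cost of admitting more errors, and the abstract does not say which combination rule UPL-EA uses.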

Related papers

Prototype-Guided Pseudo-Labeling with Neighborhood-Aware Consistency for Unsupervised Adaptation [12.829638461740759]
In unsupervised adaptation for vision-language models such as CLIP, pseudo-labels from zero-shot predictions often exhibit significant noise. We propose a novel adaptive pseudo-labeling framework that enhances CLIP's adaptation performance by integrating prototype consistency and neighborhood-based consistency. Our method achieves state-of-the-art performance in unsupervised adaptation scenarios, delivering more accurate pseudo-labels while maintaining computational efficiency.
arXiv Detail & Related papers (2025-07-22T19:08:24Z)
Drawing the Same Bounding Box Twice? Coping Noisy Annotations in Object Detection with Repeated Labels [6.872072177648135]
We propose a novel localization algorithm that adapts well-established ground truth estimation methods. Our algorithm also shows superior performance during training on the TexBiG dataset.
arXiv Detail & Related papers (2023-09-18T13:08:44Z)
Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition [49.42732949233184]
When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition. Taking noisy labels as ground-truth in the loss function results in suboptimal performance. We propose a novel framework named alternative pseudo-labeling to tackle the issue of noisy pseudo-labels.
arXiv Detail & Related papers (2023-08-12T12:13:52Z)
Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data. This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z)
Ambiguity-Resistant Semi-Supervised Learning for Dense Object Detection [98.66771688028426]
We propose Ambiguity-Resistant Semi-supervised Learning (ARSL) for one-stage detectors. Joint-Confidence Estimation (JCE) is proposed to quantify the classification and localization quality of pseudo labels. ARSL effectively mitigates the ambiguities and achieves state-of-the-art SSOD performance on MS COCO and PASCAL VOC.
arXiv Detail & Related papers (2023-03-27T07:46:58Z)
Exploiting Completeness and Uncertainty of Pseudo Labels for Weakly Supervised Video Anomaly Detection [149.23913018423022]
Weakly supervised video anomaly detection aims to identify abnormal events in videos using only video-level labels. Two-stage self-training methods have achieved significant improvements by self-generating pseudo labels. We propose an enhancement framework by exploiting completeness and uncertainty properties for effective self-training.
arXiv Detail & Related papers (2022-12-08T05:53:53Z)
Neighbour Consistency Guided Pseudo-Label Refinement for Unsupervised Person Re-Identification [80.98291772215154]
Unsupervised person re-identification (ReID) aims at learning discriminative identity features for person retrieval without any annotations. Recent advances accomplish this task by leveraging clustering-based pseudo labels. We propose a Neighbour Consistency guided Pseudo Label Refinement framework.
arXiv Detail & Related papers (2022-11-30T09:39:57Z)
Conflict-Aware Pseudo Labeling via Optimal Transport for Entity Alignment [6.39671030369729]
We propose a novel Conflict-aware Pseudo Labeling via Optimal Transport model (CPL-OT) for entity alignment. CPL-OT is composed of two key components: entity embedding learning with global-local aggregation and iterative conflict-aware pseudo labeling. Experiments on benchmark datasets validate the superiority of CPL-OT over state-of-the-art baselines.
arXiv Detail & Related papers (2022-09-05T09:14:01Z)
CLS: Cross Labeling Supervision for Semi-Supervised Learning [9.929229055862491]
Cross Labeling Supervision (CLS) is a framework that generalizes the typical pseudo-labeling process. CLS allows the creation of both pseudo and complementary labels to support both positive and negative learning.
arXiv Detail & Related papers (2022-02-17T08:09:40Z)
Multi-Label Gold Asymmetric Loss Correction with Single-Label Regulators [6.129273021888717]
We propose a novel Gold Asymmetric Loss Correction with Single-Label Regulators (GALC-SLR) that is robust against noisy labels. GALC-SLR estimates the noise confusion matrix using single-label samples, then constructs an asymmetric loss correction from the estimated confusion matrix to avoid overfitting to the noisy labels. Empirical results show that our method outperforms the state-of-the-art original asymmetric loss multi-label classifier under all corruption levels.
arXiv Detail & Related papers (2021-08-04T12:57:29Z)
Distribution-Aware Semantics-Oriented Pseudo-label for Imbalanced Semi-Supervised Learning [80.05441565830726]
This paper addresses imbalanced semi-supervised learning, where heavily biased pseudo-labels can harm the model performance. We propose a general pseudo-labeling framework to address the bias motivated by this observation. We term the novel pseudo-labeling framework for imbalanced SSL as Distribution-Aware Semantics-Oriented (DASO) Pseudo-label.
arXiv Detail & Related papers (2021-06-10T11:58:25Z)
Group-aware Label Transfer for Domain Adaptive Person Re-identification [179.816105255584]
Unsupervised Domain Adaptive (UDA) person re-identification (ReID) aims at adapting the model trained on a labeled source-domain dataset to a target-domain dataset without any further annotations. Most successful UDA-ReID approaches combine clustering-based pseudo-label prediction with representation learning and perform the two steps in an alternating fashion. We propose a Group-aware Label Transfer (GLT) algorithm, which enables the online interaction and mutual promotion of pseudo-label prediction and representation learning.
arXiv Detail & Related papers (2021-03-23T07:57:39Z)
Cycle Self-Training for Domain Adaptation [85.14659717421533]
Cycle Self-Training (CST) is a principled self-training algorithm that enforces pseudo-labels to generalize across domains. CST recovers target ground truth, while both invariant feature learning and vanilla self-training fail. Empirical results indicate that CST significantly improves over prior state-of-the-art methods on standard UDA benchmarks.
arXiv Detail & Related papers (2021-03-05T10:04:25Z)
Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification [137.9939571408506]
We estimate and exploit the credibility of the pseudo-label assigned to each sample to alleviate the influence of noisy labels. Our uncertainty-guided optimization brings significant improvement and achieves state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2020-12-16T04:09:04Z)
MatchGAN: A Self-Supervised Semi-Supervised Conditional Generative Adversarial Network [51.84251358009803]
We present a novel self-supervised learning approach for conditional generative adversarial networks (GANs) under a semi-supervised setting. We perform augmentation by randomly sampling sensible labels from the label space of the few labelled examples available. Our method surpasses the baseline with only 20% of the labelled examples used to train the baseline.
arXiv Detail & Related papers (2020-06-11T17:14:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.