Towards Universal Dense Blocking for Entity Resolution
- URL: http://arxiv.org/abs/2404.14831v2
- Date: Thu, 25 Apr 2024 06:37:51 GMT
- Title: Towards Universal Dense Blocking for Entity Resolution
- Authors: Tianshu Wang, Hongyu Lin, Xianpei Han, Xiaoyang Chen, Boxi Cao, Le Sun,
- Abstract summary: We propose UniBlocker, a dense blocker that is pre-trained on a domain-independent, easily-obtainable corpus.
By conducting domain-independent pre-training, UniBlocker can be adapted to various downstream blocking scenarios without requiring domain-specific fine-tuning.
Our experiments show that the proposed UniBlocker, without any domain-specific learning, significantly outperforms previous self- and unsupervised dense blocking methods.
- Score: 49.06313308481536
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Blocking is a critical step in entity resolution, and the emergence of neural network-based representation models has led to the development of dense blocking as a promising approach for exploring deep semantics in blocking. However, previous advanced self-supervised dense blocking approaches require domain-specific training on the target domain, which limits the benefits and rapid adaptation of these methods. To address this issue, we propose UniBlocker, a dense blocker that is pre-trained on a domain-independent, easily-obtainable tabular corpus using self-supervised contrastive learning. By conducting domain-independent pre-training, UniBlocker can be adapted to various downstream blocking scenarios without requiring domain-specific fine-tuning. To evaluate the universality of our entity blocker, we also construct a new benchmark covering a wide range of blocking tasks from multiple domains and scenarios. Our experiments show that the proposed UniBlocker, without any domain-specific learning, significantly outperforms previous self- and unsupervised dense blocking methods and is comparable and complementary to the state-of-the-art sparse blocking methods.
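The abstract's dense blocking step can be illustrated with a minimal sketch. It assumes each record is a plain string and that some encoder maps it to a unit-length vector. The `embed` function below is a hypothetical toy stand-in (hashed character trigrams), not UniBlocker's pre-trained contrastive model. `dense_block` and `pair_completeness` are likewise illustrative names, not an API from the paper. Each record on one side keeps its top-k nearest neighbours on the other side, ranked by cosine similarity, as candidate pairs. Pair completeness then measures what fraction of the true matches survive blocking.

```python
import math
import zlib

DIM = 256  # hypothetical embedding size for the toy encoder


def embed(text: str) -> list[float]:
    """Toy stand-in for a pre-trained dense encoder (an assumption, not UniBlocker).

    Hashes lower-cased character trigrams into DIM buckets with CRC32
    (deterministic, unlike Python's salted str hash) and L2-normalizes.
    """
    padded = f"  {text.lower()} "
    vec = [0.0] * DIM
    for i in range(len(padded) - 2):
        vec[zlib.crc32(padded[i:i + 3].encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def dense_block(left: list[str], right: list[str], k: int = 1) -> set[tuple[int, int]]:
    """Return candidate pairs (i, j): for each right[j], its k nearest left[i].

    Vectors are unit length, so the dot product equals cosine similarity.
    This is a brute-force search; a real system would use an approximate-NN index.
    """
    left_vecs = [embed(r) for r in left]
    candidates = set()
    for j, record in enumerate(right):
        query = embed(record)
        scored = sorted(
            ((sum(a * b for a, b in zip(query, lv)), i) for i, lv in enumerate(left_vecs)),
            reverse=True,
        )
        for _, i in scored[:k]:
            candidates.add((i, j))
    return candidates


def pair_completeness(candidates: set[tuple[int, int]], gold: set[tuple[int, int]]) -> float:
    """Fraction of true matching pairs that survive blocking (blocking recall)."""
    return len(candidates & gold) / len(gold)


if __name__ == "__main__":
    left = ["Apple iPhone 13 128GB Blue", "Samsung Galaxy S21 Ultra", "Sony WH-1000XM4 Headphones"]
    right = ["iphone 13 (128 GB) - blue", "Sony WH1000XM4 wireless headphones", "Galaxy S21 Ultra by Samsung"]
    gold = {(0, 0), (2, 1), (1, 2)}
    cands = dense_block(left, right, k=1)
    print(sorted(cands), pair_completeness(cands, gold))
```

Raising `k` trades more candidate pairs (more work for the downstream matcher) for higher pair completeness. With `k` equal to the size of `left`, blocking degenerates into the full cross product.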
Related papers
- Improved Block Merging for 3D Point Cloud Instance Segmentation [6.632158868486343]
The proposed work improves over the state-of-the-art by allowing wrongly labelled points of already processed blocks to be corrected through label propagation.
Our experiments show that the proposed block merging algorithm significantly and consistently improves the obtained accuracy for all evaluation metrics employed in literature.
Date: 2024-07-09T16:06:34Z
- Block Sparse Bayesian Learning: A Diversified Scheme [16.61484758008309]
We introduce a novel prior called Diversified Block Sparse Prior to characterize the widespread block sparsity phenomenon in real-world data.
By allowing diversification on intra-block variance and inter-block correlation matrices, we effectively address the sensitivity issue of existing block sparse learning methods to pre-defined block information.
Date: 2024-02-07T08:18:06Z
- ShallowBlocker: Improving Set Similarity Joins for Blocking [1.8492669447784602]
We propose a hands-off blocking method based on classical string similarity measures: ShallowBlocker.
It uses a novel hybrid set similarity join combining absolute similarity, relative similarity, and local cardinality conditions with a new effective pre-candidate filter that replaces the size filter.
We show that the method achieves state-of-the-art pair effectiveness on both unsupervised and supervised blocking in a scalable way.
Date: 2023-12-26T00:31:43Z
- Model Barrier: A Compact Un-Transferable Isolation Domain for Model Intellectual Property Protection [52.08301776698373]
We propose a novel approach called Compact Un-Transferable Isolation Domain (CUTI-domain).
CUTI-domain acts as a barrier to block illegal transfers from authorized to unauthorized domains.
We show that CUTI-domain can be easily implemented as a plug-and-play module with different backbones.
Date: 2023-03-20T13:07:11Z
- Decompose to Adapt: Cross-domain Object Detection via Feature Disentanglement [79.2994130944482]
We design a Domain Disentanglement Faster-RCNN (DDF) to eliminate the source-specific information in the features for detection task learning.
Our DDF method facilitates the feature disentanglement at the global and local stages, with a Global Triplet Disentanglement (GTD) module and an Instance Similarity Disentanglement (ISD) module.
By outperforming state-of-the-art methods on four benchmark UDA object detection tasks, our DDF method is demonstrated to be effective with wide applicability.
Date: 2022-01-06T05:43:01Z
- Generalizable Representation Learning for Mixture Domain Face Anti-Spoofing [53.82826073959756]
Face anti-spoofing based on domain generalization (DG) has drawn growing attention due to its robustness to unseen scenarios.
We propose domain dynamic adjustment meta-learning (D2AM) without using domain labels.
Date: 2021-05-06T06:04:59Z
- Stochastic Block-ADMM for Training Deep Networks [16.369102155752824]
We propose Block-ADMM as an approach to train deep neural networks in batch and online settings.
Our method works by splitting neural networks into an arbitrary number of blocks and utilizing auxiliary variables to connect these blocks.
We prove the convergence of our proposed method and justify its capabilities through experiments in supervised and weakly-supervised settings.
Date: 2021-05-01T19:56:13Z
- Decentralized Swarm Collision Avoidance for Quadrotors via End-to-End Reinforcement Learning [28.592704336574158]
We draw biological inspiration from flocks of starlings and apply the insight to end-to-end learned decentralized collision avoidance.
We propose a new, scalable observation model following a biomimetic topological interaction rule.
Our learned policies are tested in simulation and subsequently transferred to real-world drones to validate their real-world applicability.
Date: 2021-04-30T11:19:03Z
- Attentive WaveBlock: Complementarity-enhanced Mutual Networks for Unsupervised Domain Adaptation in Person Re-identification and Beyond [97.25179345878443]
This paper proposes a novel lightweight module, the Attentive WaveBlock (AWB).
AWB can be integrated into the dual networks of mutual learning to enhance the complementarity and further depress noise in the pseudo-labels.
Experiments demonstrate that the proposed method achieves state-of-the-art performance with significant improvements on multiple UDA person re-identification tasks.
Date: 2020-06-11T15:40:40Z
- Contradictory Structure Learning for Semi-supervised Domain Adaptation [67.89665267469053]
Current adversarial adaptation methods attempt to align the cross-domain features.
Two challenges remain unsolved: 1) the conditional distribution mismatch and 2) the bias of the decision boundary towards the source domain.
We propose a novel framework for semi-supervised domain adaptation by unifying the learning of opposite structures.
Date: 2020-02-06T22:58:20Z