ClusterEA: Scalable Entity Alignment with Stochastic Training and
Normalized Mini-batch Similarities
- URL: http://arxiv.org/abs/2205.10312v1
- Date: Fri, 20 May 2022 17:29:50 GMT
- Title: ClusterEA: Scalable Entity Alignment with Stochastic Training and
Normalized Mini-batch Similarities
- Authors: Yunjun Gao, Xiaoze Liu, Junyang Wu, Tianyi Li, Pengfei Wang, Lu Chen
- Abstract summary: ClusterEA is capable of scaling up EA models and enhancing their results by leveraging normalization methods on mini-batches.
It first trains a large-scale Siamese GNN for EA in a stochastic fashion to produce entity embeddings.
Based on the embeddings, a novel ClusterSampler strategy is proposed for sampling highly overlapped mini-batches.
- Score: 26.724014626196322
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Entity alignment (EA) aims at finding equivalent entities in different
knowledge graphs (KGs). Embedding-based approaches have dominated the EA task
in recent years. Those methods face problems that come from the geometric
properties of embedding vectors, including hubness and isolation. To solve
these geometric problems, many normalization approaches have been adopted for
EA. However, the increasing scale of KGs makes it hard for EA models to apply
these normalization processes, limiting their usage in real-world
applications. To tackle this challenge, we present ClusterEA, a general
framework that is capable of scaling up EA models and enhancing their results
by leveraging normalization methods on mini-batches with a high entity
equivalent rate. ClusterEA contains three components to align entities between
large-scale KGs, including stochastic training, ClusterSampler, and
SparseFusion. It first trains a large-scale Siamese GNN for EA in a stochastic
fashion to produce entity embeddings. Based on the embeddings, a novel
ClusterSampler strategy is proposed for sampling highly overlapped
mini-batches. Finally, ClusterEA incorporates SparseFusion, which normalizes
local and global similarity and then fuses all similarity matrices to obtain
the final similarity matrix. Extensive experiments with real-life datasets on
EA benchmarks offer insight into the proposed framework, and suggest that it is
capable of outperforming the state-of-the-art scalable EA framework by up to 8
times in terms of Hits@1.
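The normalize-then-fuse idea in the abstract can be illustrated with CSLS (Cross-domain Similarity Local Scaling), a standard hubness-reducing normalization for embedding similarity matrices. This is a generic sketch, not ClusterEA's exact SparseFusion procedure; all function names and the uniform averaging are illustrative assumptions:

```python
import numpy as np

def csls_normalize(sim: np.ndarray, k: int = 2) -> np.ndarray:
    """CSLS: deflate 'hub' entities by subtracting each entity's mean
    similarity to its k nearest neighbors, on both sides of the matrix."""
    r_src = np.sort(sim, axis=1)[:, -k:].mean(axis=1, keepdims=True)  # per source row
    r_tgt = np.sort(sim, axis=0)[-k:, :].mean(axis=0, keepdims=True)  # per target column
    return 2 * sim - r_src - r_tgt

def fuse(sims, k: int = 2) -> np.ndarray:
    """Normalize each similarity matrix, then average them into one."""
    return sum(csls_normalize(s, k) for s in sims) / len(sims)

rng = np.random.default_rng(0)
local_sim = rng.random((5, 5))   # e.g., intra-mini-batch (local) similarities
global_sim = rng.random((5, 5))  # e.g., whole-graph (global) similarities
fused = fuse([local_sim, global_sim])
print(fused.shape)  # (5, 5)
```

Subtracting each entity's mean neighborhood similarity penalizes entities that are close to everything (hubs) and boosts isolated ones, which is exactly the geometric pathology the abstract describes.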
Related papers
- Incorporating Arbitrary Matrix Group Equivariance into KANs [69.30866522377694]
Kolmogorov-Arnold Networks (KANs) have seen great success in scientific domains.
However, spline functions may not respect symmetry in tasks, which is crucial prior knowledge in machine learning.
We propose Equivariant Kolmogorov-Arnold Networks (EKAN) to broaden their applicability to more fields.
arXiv Detail & Related papers (2024-10-01T06:34:58Z) - Fast Semisupervised Unmixing Using Nonconvex Optimization [80.11512905623417]
We introduce a novel nonconvex model for semisupervised/library-based unmixing.
We demonstrate the efficacy of alternating optimization methods for sparse unmixing.
arXiv Detail & Related papers (2024-01-23T10:07:41Z) - SIGMA: Scale-Invariant Global Sparse Shape Matching [50.385414715675076]
We propose a novel mixed-integer programming (MIP) formulation for generating precise sparse correspondences for non-rigid shapes.
We show state-of-the-art results for sparse non-rigid matching on several challenging 3D datasets.
arXiv Detail & Related papers (2023-08-16T14:25:30Z) - Toward Practical Entity Alignment Method Design: Insights from New
Highly Heterogeneous Knowledge Graph Datasets [32.68422342604253]
We study the performance of entity alignment (EA) methods in practical settings, specifically focusing on the alignment of highly heterogeneous KGs (HHKGs).
Our findings reveal that, in aligning HHKGs, valuable structure information can hardly be exploited through message-passing and aggregation mechanisms.
These findings shed light on the potential problems associated with the conventional application of GNN-based methods as a panacea for all EA datasets.
arXiv Detail & Related papers (2023-04-07T04:10:26Z) - LightEA: A Scalable, Robust, and Interpretable Entity Alignment
Framework via Three-view Label Propagation [27.483109233276632]
We argue that existing GNN-based EA methods inherit the inborn defects from their neural network lineage: weak scalability and poor interpretability.
We propose a non-neural EA framework -- LightEA, consisting of three efficient components: (i) Random Orthogonal Label Generation, (ii) Three-view Label Propagation, and (iii) Sparse Sinkhorn Iteration.
According to the extensive experiments on public datasets, LightEA has impressive scalability, robustness, and interpretability.
arXiv Detail & Related papers (2022-10-19T10:07:08Z) - High-quality Task Division for Large-scale Entity Alignment [28.001266850114643]
DivEA achieves higher EA performance than alternative state-of-the-art solutions.
We devise a counterpart discovery method that exploits the locality principle of the EA task and the power of trained EA models.
arXiv Detail & Related papers (2022-08-22T14:46:38Z) - ActiveEA: Active Learning for Neural Entity Alignment [31.212894129845093]
Entity alignment (EA) aims to match equivalent entities across different Knowledge Graphs (KGs).
Current mainstream methods -- neural EA models -- rely on training with seed alignment, i.e., a set of pre-aligned entity pairs.
We devise a novel Active Learning (AL) framework for neural EA, aiming to create highly informative seed alignment.
arXiv Detail & Related papers (2021-10-13T03:38:04Z) - Are Negative Samples Necessary in Entity Alignment? An Approach with
High Performance, Scalability and Robustness [26.04006507181558]
We propose a novel EA method with three new components to enable high Performance, high Scalability, and high Robustness.
We conduct detailed experiments on several public datasets to examine the effectiveness and efficiency of our proposed method.
arXiv Detail & Related papers (2021-08-11T15:20:41Z) - Semantic Correspondence with Transformers [68.37049687360705]
We propose Cost Aggregation with Transformers (CATs) to find dense correspondences between semantically similar images.
We include appearance affinity modelling to disambiguate the initial correlation maps and multi-level aggregation.
We conduct experiments to demonstrate the effectiveness of the proposed model over the latest methods and provide extensive ablation studies.
arXiv Detail & Related papers (2021-06-04T14:39:03Z) - Energy-Efficient and Federated Meta-Learning via Projected Stochastic
Gradient Ascent [79.58680275615752]
We propose an energy-efficient federated meta-learning framework.
We assume each task is owned by a separate agent, so a limited number of tasks is used to train a meta-model.
arXiv Detail & Related papers (2021-05-31T08:15:44Z) - Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
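For reference, the standard Cauchy-Schwarz divergence between densities \(p\) and \(q\), on which such an objective builds, is

```latex
D_{\mathrm{CS}}(p \,\|\, q)
  = -\log \frac{\int p(x)\, q(x)\, dx}
                {\sqrt{\int p(x)^2\, dx \,\int q(x)^2\, dx}}
```

For mixtures of Gaussians each integral reduces to a sum of pairwise Gaussian overlap integrals, each with a closed form, which is why this divergence can be computed analytically for GMMs.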
arXiv Detail & Related papers (2021-01-06T17:36:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.