An Industry Evaluation of Embedding-based Entity Alignment
- URL: http://arxiv.org/abs/2010.11522v2
- Date: Sat, 7 Nov 2020 12:25:10 GMT
- Title: An Industry Evaluation of Embedding-based Entity Alignment
- Authors: Ziheng Zhang, Jiaoyan Chen, Xi Chen, Hualuo Liu, Yuejia Xiang, Bo Liu, Yefeng Zheng
- Abstract summary: Embedding-based entity alignment has been widely investigated in recent years, but most proposed methods still rely on an ideal supervised learning setting.
We evaluate those state-of-the-art methods in an industrial context, where the impact of seed mappings with different sizes and different biases is explored.
- Score: 38.76701634692796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Embedding-based entity alignment has been widely investigated in recent
years, but most proposed methods still rely on an ideal supervised learning
setting with a large number of unbiased seed mappings for training and
validation, which significantly limits their usage. In this study, we evaluate
those state-of-the-art methods in an industrial context, where the impact of
seed mappings with different sizes and different biases is explored. Besides
the popular benchmarks from DBpedia and Wikidata, we contribute and evaluate a
new industrial benchmark that is extracted from two heterogeneous knowledge
graphs (KGs) under deployment for medical applications. The experimental
results enable the analysis of the advantages and disadvantages of these
alignment methods and the further discussion of suitable strategies for their
industrial deployment.
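The evaluation setting described in the abstract can be illustrated with a minimal sketch, assuming entity embeddings for both KGs are already available. The toy vectors, entity names, and the helpers `cosine`, `hits_at_k`, and `split_seeds` are all hypothetical and are not taken from the paper; Hits@k is used here only as a commonly reported alignment measure, and the random split models an unbiased seed sample of a chosen size.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hits_at_k(source_emb, target_emb, test_pairs, k=1):
    """Fraction of test pairs whose true target is among the k most similar targets."""
    hits = 0
    for src, true_tgt in test_pairs:
        ranked = sorted(
            target_emb,
            key=lambda tgt: cosine(source_emb[src], target_emb[tgt]),
            reverse=True,
        )
        if true_tgt in ranked[:k]:
            hits += 1
    return hits / len(test_pairs)


def split_seeds(mappings, seed_ratio, rng):
    """Randomly keep seed_ratio of the known mappings as seeds; the rest are test pairs."""
    shuffled = list(mappings)
    rng.shuffle(shuffled)
    n_seed = int(len(shuffled) * seed_ratio)
    return shuffled[:n_seed], shuffled[n_seed:]


# Hand-made toy embeddings (assumption: a real system would learn these from the KGs).
source_emb = {"a1": [1.0, 0.0], "a2": [0.0, 1.0], "a3": [1.0, 1.0]}
target_emb = {"b1": [0.9, 0.1], "b2": [0.1, 0.9], "b3": [0.7, 0.7]}
mappings = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]

seeds, test_pairs = split_seeds(mappings, seed_ratio=0.34, rng=random.Random(0))
print(len(seeds), "seed mapping(s);", "Hits@1 on held-out pairs:",
      hits_at_k(source_emb, target_emb, test_pairs, k=1))
```

In this sketch the embeddings are fixed, so the seed mappings do not influence the score; in a trained alignment model the seeds would drive training, and varying `seed_ratio` or replacing the random shuffle with a biased selection would be one way to probe the effects the abstract refers to.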
Related papers
- Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-Label Classification [120.37051160567277]
This paper proposes a novel measure named Top-K Pairwise Ranking (TKPR).
A series of analyses show that TKPR is compatible with existing ranking-based measures.
On the other hand, we establish a sharp generalization bound for the proposed framework based on a novel technique named data-dependent contraction.
arXiv Detail & Related papers (2024-07-09T09:36:37Z)
- Experimental Analysis of Large-scale Learnable Vector Storage Compression [42.52474894105165]
Learnable embedding vector is one of the most important applications in machine learning.
The high dimensionality of sparse data in recommendation tasks and the huge volume of corpus in retrieval-related tasks lead to a large memory consumption of the embedding table.
Recent research has proposed various methods to compress the embeddings at the cost of a slight decrease in model quality or the introduction of other overheads.
arXiv Detail & Related papers (2023-11-27T07:11:47Z)
- Plugin estimators for selective classification with out-of-distribution detection [67.28226919253214]
Real-world classifiers can benefit from abstaining from predicting on samples where they have low confidence.
These settings have been the subject of extensive but disjoint study in the selective classification (SC) and out-of-distribution (OOD) detection literature.
Recent work on selective classification with OOD detection has argued for the unified study of these problems.
We propose new plugin estimators for SCOD that are theoretically grounded, effective, and generalise existing approaches.
arXiv Detail & Related papers (2023-01-29T07:45:17Z)
- OpenOOD: Benchmarking Generalized Out-of-Distribution Detection [60.13300701826931]
Out-of-distribution (OOD) detection is vital to safety-critical machine learning applications.
The field currently lacks a unified, strictly formulated, and comprehensive benchmark.
We build a unified, well-structured codebase called OpenOOD, which implements over 30 methods developed in relevant fields.
arXiv Detail & Related papers (2022-10-13T17:59:57Z)
- On the role of benchmarking data sets and simulations in method comparison studies [0.0]
This paper investigates differences and similarities between simulation studies and benchmarking studies.
We borrow ideas from different contexts such as mixed methods research and Clinical Scenario Evaluation.
arXiv Detail & Related papers (2022-08-02T13:47:53Z)
- Knowledge Graph Embedding Methods for Entity Alignment: An Experimental Review [7.241438112282638]
We conduct the first meta-level analysis of popular embedding methods for entity alignment.
Our analysis reveals statistically significant correlations of different embedding methods with various meta-features extracted from KGs.
We rank them in a statistically significant way according to their effectiveness across all real-world KGs of our testbed.
arXiv Detail & Related papers (2022-03-17T12:11:58Z)
- f-Domain-Adversarial Learning: Theory and Algorithms [82.97698406515667]
Unsupervised domain adaptation is used in many machine learning applications where, during training, a model has access to unlabeled data in the target domain.
We derive a novel generalization bound for domain adaptation that exploits a new measure of discrepancy between distributions based on a variational characterization of f-divergences.
arXiv Detail & Related papers (2021-06-21T18:21:09Z)
- A Benchmarking Study of Embedding-based Entity Alignment for Knowledge Graphs [30.296238600596997]
Entity alignment seeks to find entities in different knowledge graphs that refer to the same real-world object.
Recent advancement in KG embedding impels the advent of embedding-based entity alignment.
We survey 23 recent embedding-based entity alignment approaches and categorize them based on their techniques and characteristics.
arXiv Detail & Related papers (2020-03-10T05:32:06Z)
- A Survey on Causal Inference [64.45536158710014]
Causal inference is a critical research topic across many domains, such as statistics, computer science, education, public policy and economics.
Various causal effect estimation methods for observational data have sprung up.
arXiv Detail & Related papers (2020-02-05T21:35:29Z)