Deep Transfer Learning for Multi-source Entity Linkage via Domain
Adaptation
- URL: http://arxiv.org/abs/2110.14509v1
- Date: Wed, 27 Oct 2021 15:20:41 GMT
- Title: Deep Transfer Learning for Multi-source Entity Linkage via Domain
Adaptation
- Authors: Di Jin, Bunyamin Sisman, Hao Wei, Xin Luna Dong, Danai Koutra
- Abstract summary: Multi-source entity linkage is critical in high-impact applications such as data cleaning and user stitching.
AdaMEL is a deep transfer learning framework that learns generic high-level knowledge to perform multi-source entity linkage.
Our framework achieves state-of-the-art results with 8.21% improvement on average over methods based on supervised learning.
- Score: 63.24594955429465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-source entity linkage focuses on integrating knowledge from multiple
sources by linking the records that represent the same real-world entity. This
is critical in high-impact applications such as data cleaning and user
stitching. The state-of-the-art entity linkage pipelines mainly depend on
supervised learning that requires abundant amounts of training data. However,
collecting well-labeled training data becomes expensive when the data from many
sources arrives incrementally over time. Moreover, the trained models can
easily overfit to specific data sources, and thus fail to generalize to new
sources due to significant differences in data and label distributions. To
address these challenges, we present AdaMEL, a deep transfer learning framework
that learns generic high-level knowledge to perform multi-source entity
linkage. AdaMEL models the attribute importance that is used to match entities
through an attribute-level self-attention mechanism, and leverages the massive
unlabeled data from new data sources through domain adaptation to make it
generic and data-source agnostic. In addition, AdaMEL is capable of
incorporating an additional set of labeled data to more accurately integrate
data sources with different attribute importance. Extensive experiments show
that our framework achieves state-of-the-art results with 8.21% improvement on
average over methods based on supervised learning. It is also more stable
across different sets of data sources and requires less runtime.
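The idea of attribute importance mentioned in the abstract can be illustrated with a minimal sketch. This is not the AdaMEL implementation: the function names, the token-overlap similarity, and the hand-set logits are all assumptions made for illustration, and a fixed softmax weighting stands in for the learned attribute-level attention described in the paper.

```python
import math


def softmax(logits):
    """Turn raw per-attribute logits into weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_jaccard(a, b):
    """Toy similarity: Jaccard overlap of lowercase whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_score(rec_a, rec_b, attr_logits):
    """Weight per-attribute similarities by softmax-normalized importance.

    attr_logits is a hypothetical, hand-set mapping from attribute name to
    logit; in a learned model these values would come from training.
    """
    attrs = sorted(attr_logits)
    weights = softmax([attr_logits[k] for k in attrs])
    sims = [token_jaccard(rec_a.get(k, ""), rec_b.get(k, "")) for k in attrs]
    importance = dict(zip(attrs, weights))
    score = sum(w * s for w, s in zip(weights, sims))
    return score, importance


# Two records from different (hypothetical) sources describing one album.
a = {"name": "Abbey Road", "artist": "The Beatles"}
b = {"name": "Abbey Road", "artist": "Beatles"}
score, importance = match_score(a, b, {"name": 1.0, "artist": 0.0})
print(score, importance)
```

In this toy setup, raising the logit of an attribute makes disagreement on that attribute cost more, which is the intuition behind letting different data sources have different attribute importance.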
Related papers
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- Interpretable Multi-Source Data Fusion Through Latent Variable Gaussian Process [8.207427766052044]
The proposed approach is demonstrated on and analyzed through two mathematical (representative parabola problem, 2D Ackley function) and two materials science (design of FeCrAl and SmCoFe alloys) case studies.
It is observed that compared to using single-source and source unaware ML models, the proposed multi-source data fusion framework can provide better predictions for sparse-data problems, interpretability regarding the sources, and enhanced modeling capabilities by taking advantage of the correlations and relationships among different sources.
arXiv Detail & Related papers (2024-02-06T16:54:59Z)
- Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance.
Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z)
- InSRL: A Multi-view Learning Framework Fusing Multiple Information Sources for Distantly-supervised Relation Extraction [19.176183245280267]
We introduce two widely-existing sources in knowledge bases, namely entity descriptions and multi-grained entity types.
An end-to-end multi-view learning framework is proposed for relation extraction via Intact Space Representation Learning (InSRL).
arXiv Detail & Related papers (2020-12-17T02:49:46Z)
- LEAPME: Learning-based Property Matching with Embeddings [5.2078071454435815]
We present a new machine learning-based property matching approach called LEAPME (LEArning-based Property Matching with Embeddings).
The approach heavily makes use of word embeddings to better utilize the domain-specific semantics of both property names and instance values.
Our comparative evaluation against five baselines for several multi-source datasets with real-world data shows the high effectiveness of LEAPME.
arXiv Detail & Related papers (2020-10-05T12:42:39Z)
- Adversarial Knowledge Transfer from Unlabeled Data [62.97253639100014]
We present a novel Adversarial Knowledge Transfer framework for transferring knowledge from internet-scale unlabeled data to improve the performance of a classifier.
An important novel aspect of our method is that the unlabeled source data can be of different classes from those of the labeled target data, and there is no need to define a separate pretext task.
arXiv Detail & Related papers (2020-08-13T08:04:27Z)
- Dual-Teacher: Integrating Intra-domain and Inter-domain Teachers for Annotation-efficient Cardiac Segmentation [65.81546955181781]
We propose a novel semi-supervised domain adaptation approach, namely Dual-Teacher.
The student model learns the knowledge of unlabeled target data and labeled source data by two teacher models.
We demonstrate that our approach is able to concurrently utilize unlabeled data and cross-modality data with superior performance.
arXiv Detail & Related papers (2020-07-13T10:00:44Z)
- Multi-Center Federated Learning [62.57229809407692]
This paper proposes a novel multi-center aggregation mechanism for federated learning.
It learns multiple global models from the non-IID user data and simultaneously derives the optimal matching between users and centers.
Our experimental results on benchmark datasets show that our method outperforms several popular federated learning methods.
arXiv Detail & Related papers (2020-05-03T09:14:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.