Related papers: Deep Transfer Learning for Multi-source Entity Linkage via Domain Adaptation

URL: http://arxiv.org/abs/2110.14509v1
Date: Wed, 27 Oct 2021 15:20:41 GMT
Title: Deep Transfer Learning for Multi-source Entity Linkage via Domain Adaptation
Authors: Di Jin, Bunyamin Sisman, Hao Wei, Xin Luna Dong, Danai Koutra
Abstract summary: Multi-source entity linkage is critical in high-impact applications such as data cleaning and user stitching. AdaMEL is a deep transfer learning framework that learns generic high-level knowledge to perform multi-source entity linkage. The framework achieves state-of-the-art results, with an 8.21% improvement on average over methods based on supervised learning.
Score: 63.24594955429465
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multi-source entity linkage focuses on integrating knowledge from multiple sources by linking the records that represent the same real-world entity. This is critical in high-impact applications such as data cleaning and user stitching. The state-of-the-art entity linkage pipelines mainly depend on supervised learning, which requires large amounts of training data. However, collecting well-labeled training data becomes expensive when the data from many sources arrives incrementally over time. Moreover, the trained models can easily overfit to specific data sources, and thus fail to generalize to new sources due to significant differences in data and label distributions. To address these challenges, we present AdaMEL, a deep transfer learning framework that learns generic high-level knowledge to perform multi-source entity linkage. AdaMEL models the attribute importance that is used to match entities through an attribute-level self-attention mechanism, and leverages the massive unlabeled data from new data sources through domain adaptation to make it generic and data-source agnostic. In addition, AdaMEL is capable of incorporating an additional set of labeled data to more accurately integrate data sources with different attribute importance. Extensive experiments show that our framework achieves state-of-the-art results, with an 8.21% improvement on average over methods based on supervised learning. It is also more stable across different sets of data sources and requires less runtime.
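
To make the idea of "attribute importance" in the abstract more concrete, here is a minimal sketch of attention-style weighting over per-attribute similarities for a pair of records. It is not AdaMEL's architecture: the function names, the token-overlap similarity, the example records, and the fixed `attr_logits` values are all hypothetical stand-ins for quantities that a real model would learn, and the domain adaptation step is not shown.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def token_jaccard(a, b):
    """Jaccard overlap of lowercase whitespace tokens (0.0 if both are empty)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_score(rec_a, rec_b, attr_logits):
    """Importance-weighted average of per-attribute similarities.

    attr_logits is a hypothetical stand-in for learned attention scores:
    a higher logit means the attribute counts more toward the match.
    """
    attrs = sorted(attr_logits)
    weights = softmax([attr_logits[k] for k in attrs])
    sims = [token_jaccard(rec_a.get(k, ""), rec_b.get(k, "")) for k in attrs]
    score = sum(w * s for w, s in zip(weights, sims))
    return score, dict(zip(attrs, weights))


# Two records from different (hypothetical) sources describing the same song.
rec_a = {"title": "Hey Jude", "artist": "The Beatles", "source_id": "a-17"}
rec_b = {"title": "Hey Jude", "artist": "Beatles", "source_id": "b-903"}

# Assumed importance scores: title matters most, source-specific IDs least.
attr_logits = {"title": 2.0, "artist": 1.0, "source_id": -2.0}

score, weights = match_score(rec_a, rec_b, attr_logits)
print(f"match score: {score:.3f}")
print({k: round(w, 3) for k, w in weights.items()})
```

Because the source-specific `source_id` attribute receives a low weight, its mismatch barely lowers the score. This is the intuition behind down-weighting attributes that do not transfer across data sources.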

Related papers

Similarity-Based Domain Adaptation with LLMs [13.692329347889212]
Unsupervised domain adaptation leverages abundant labeled data from various source domains to generalize to unlabeled target data. This paper introduces a simple framework that utilizes the generalization capabilities of Large Language Models (LLMs) for target data annotation. The framework achieves a 2.44% accuracy improvement over the SOTA method.
arXiv Detail & Related papers (2025-03-07T09:51:07Z)
Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data. We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation. Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
Learning To Rank Resources with GNN [7.337247167823921]
We propose a graph neural network (GNN) based approach to learning-to-rank that is capable of modeling resource-query and resource-resource relationships. Our method outperforms the state-of-the-art by 6.4% to 42% on various performance metrics.
arXiv Detail & Related papers (2023-04-17T02:01:45Z)
Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance. Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models. In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z)
InSRL: A Multi-view Learning Framework Fusing Multiple Information Sources for Distantly-supervised Relation Extraction [19.176183245280267]
We introduce two widely available sources in knowledge bases, namely entity descriptions and multi-grained entity types. An end-to-end multi-view learning framework is proposed for relation extraction via Intact Space Representation Learning (InSRL).
arXiv Detail & Related papers (2020-12-17T02:49:46Z)
LEAPME: Learning-based Property Matching with Embeddings [5.2078071454435815]
We present a new machine learning-based property matching approach called LEAPME (LEArning-based Property Matching with Embeddings). The approach makes heavy use of word embeddings to better utilize the domain-specific semantics of both property names and instance values. Our comparative evaluation against five baselines on several multi-source datasets with real-world data shows the high effectiveness of LEAPME.
arXiv Detail & Related papers (2020-10-05T12:42:39Z)
Adversarial Knowledge Transfer from Unlabeled Data [62.97253639100014]
We present a novel Adversarial Knowledge Transfer framework for transferring knowledge from internet-scale unlabeled data to improve the performance of a classifier. An important novel aspect of our method is that the unlabeled source data can be of different classes from those of the labeled target data, and there is no need to define a separate pretext task.
arXiv Detail & Related papers (2020-08-13T08:04:27Z)
Dual-Teacher: Integrating Intra-domain and Inter-domain Teachers for Annotation-efficient Cardiac Segmentation [65.81546955181781]
We propose a novel semi-supervised domain adaptation approach, namely Dual-Teacher. The student model learns from unlabeled target data and labeled source data through two teacher models. We demonstrate that our approach is able to concurrently utilize unlabeled data and cross-modality data with superior performance.
arXiv Detail & Related papers (2020-07-13T10:00:44Z)
Multi-Center Federated Learning [62.57229809407692]
This paper proposes a novel multi-center aggregation mechanism for federated learning. It learns multiple global models from the non-IID user data and simultaneously derives the optimal matching between users and centers. Our experimental results on benchmark datasets show that our method outperforms several popular federated learning methods.
arXiv Detail & Related papers (2020-05-03T09:14:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.

This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.