LEAPME: Learning-based Property Matching with Embeddings
- URL: http://arxiv.org/abs/2010.01951v1
- Date: Mon, 5 Oct 2020 12:42:39 GMT
- Title: LEAPME: Learning-based Property Matching with Embeddings
- Authors: Daniel Ayala, Inma Hernández, David Ruiz, Erhard Rahm
- Abstract summary: We present a new machine learning-based property matching approach called LEAPME (LEArning-based Property Matching with Embeddings).
The approach makes heavy use of word embeddings to better capture the domain-specific semantics of both property names and instance values.
Our comparative evaluation against five baselines for several multi-source datasets with real-world data shows the high effectiveness of LEAPME.
- Score: 5.2078071454435815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data integration tasks such as the creation and extension of knowledge graphs
involve the fusion of heterogeneous entities from many sources. Matching and
fusion of such entities also require their properties (attributes) to be
matched and combined. However, previous schema matching approaches mostly focus on two
sources only and often rely on simple similarity measurements. They thus face
problems in challenging use cases such as the integration of heterogeneous
product entities from many sources.
We therefore present a new machine learning-based property matching approach
called LEAPME (LEArning-based Property Matching with Embeddings) that utilizes
numerous features of both property names and instance values. The approach
makes heavy use of word embeddings to better capture the domain-specific
semantics of both property names and instance values. The use of supervised
machine learning helps exploit the predictive power of word embeddings.
Our comparative evaluation against five baselines for several multi-source
datasets with real-world data shows the high effectiveness of LEAPME. We also
show that our approach is effective even when training data from another domain
(transfer learning) is used.
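The abstract describes matching properties by computing features over both property names and instance values, many of them based on word embeddings, and feeding those features to a supervised classifier. The following is a minimal sketch of that general idea under stated assumptions, not LEAPME's actual feature set or model: the toy `EMBEDDINGS` table, the three pairwise features, the training pairs, and the hand-rolled logistic regression (standing in for whatever classifier the paper uses) are all hypothetical choices made for illustration.

```python
import math
from difflib import SequenceMatcher

# Hypothetical 3-dimensional toy embeddings (illustration only); a real
# system would load pre-trained word vectors instead.
EMBEDDINGS = {
    "price": [0.9, 0.1, 0.0],
    "cost": [0.85, 0.15, 0.05],
    "usd": [0.8, 0.0, 0.2],
    "weight": [0.0, 0.9, 0.1],
    "mass": [0.05, 0.85, 0.1],
    "kg": [0.1, 0.8, 0.2],
    "color": [0.0, 0.1, 0.9],
    "colour": [0.0, 0.15, 0.88],
    "red": [0.1, 0.0, 0.95],
}
DIM = 3


def embed(text):
    """Average the vectors of known lowercase tokens; zero vector if none are known."""
    vecs = [EMBEDDINGS[t] for t in text.lower().split() if t in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def pair_features(prop_a, prop_b):
    """Hypothetical features for a pair of properties, each given as (name, [values])."""
    (name_a, values_a), (name_b, values_b) = prop_a, prop_b
    return [
        cosine(embed(name_a), embed(name_b)),  # embedding similarity of names
        cosine(embed(" ".join(values_a)), embed(" ".join(values_b))),  # ... of values
        SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio(),  # string similarity of names
    ]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """Estimated probability that the two properties match."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Plain stochastic gradient descent on the logistic (cross-entropy) loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = predict(w, b, xi) - yi  # gradient of the loss w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


# Hypothetical labelled property pairs (1 = same property, 0 = different).
TRAINING_PAIRS = [
    (("price", ["19.99 usd"]), ("cost", ["25 usd"]), 1),
    (("weight", ["2 kg"]), ("mass", ["3 kg"]), 1),
    (("color", ["red"]), ("colour", ["red"]), 1),
    (("price", ["19.99 usd"]), ("weight", ["2 kg"]), 0),
    (("cost", ["25 usd"]), ("colour", ["red"]), 0),
    (("mass", ["3 kg"]), ("color", ["red"]), 0),
]
X = [pair_features(pa, pb) for pa, pb, _ in TRAINING_PAIRS]
y = [label for _, _, label in TRAINING_PAIRS]
w, b = train_logreg(X, y)

print(predict(w, b, pair_features(("Price", ["5 usd"]), ("cost", ["7 usd"]))))
print(predict(w, b, pair_features(("price", ["5 usd"]), ("weight", ["1 kg"]))))
```

In this toy setup, "price"/"cost" share almost no characters, so the string-similarity feature alone would miss the match; the embedding features are what separate matching from non-matching pairs, which mirrors the abstract's motivation for relying on embeddings rather than simple similarity measures.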
Related papers
- Cross-Domain Few-Shot Relation Extraction via Representation Learning and Domain Adaptation [1.1602089225841632]
Few-shot relation extraction aims to recognize novel relations with few labeled sentences in each relation.
Previous metric-based few-shot relation extraction algorithms identify relations by using a trained metric function to compare prototypes, generated from the embeddings of the few labeled sentences, with the embeddings of the query sentences.
We suggest learning more interpretable and efficient prototypes from prior knowledge and the intrinsic semantics of relations to extract new relations in various domains more effectively.
Date: 2022-12-05T19:34:52Z
- Can I see an Example? Active Learning the Long Tail of Attributes and Relations [64.50739983632006]
We introduce a novel incremental active learning framework that asks for attributes and relations in visual scenes.
While conventional active learning methods ask for labels of specific examples, we flip this framing to allow agents to ask for examples from specific categories.
Using this framing, we introduce an active sampling method that asks for examples from the tail of the data distribution and show that it outperforms classical active learning methods on Visual Genome.
Date: 2022-03-11T19:28:19Z
- Deep Transfer Learning for Multi-source Entity Linkage via Domain Adaptation [63.24594955429465]
Multi-source entity linkage is critical in high-impact applications such as data cleaning and user stitching.
AdaMEL is a deep transfer learning framework that learns generic high-level knowledge to perform multi-source entity linkage.
Our framework achieves state-of-the-art results, with an average improvement of 8.21% over methods based on supervised learning.
Date: 2021-10-27T15:20:41Z
- Interpretable and Low-Resource Entity Matching via Decoupling Feature Learning from Decision Making [22.755892575582788]
Entity Matching aims at recognizing entity records that denote the same real-world object.
We propose a novel EM framework that consists of Heterogeneous Information Fusion (HIF) and Key Attribute Tree (KAT) Induction.
Our method is highly efficient and outperforms SOTA EM models in most cases.
Date: 2021-06-08T08:27:31Z
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve model generalization in few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We set new state-of-the-art results in both few-shot and training-free settings.
Date: 2020-12-29T23:43:16Z
- Learning to Combine: Knowledge Aggregation for Multi-Source Domain Adaptation [56.694330303488435]
We propose a Learning to Combine for Multi-Source Domain Adaptation (LtC-MSDA) framework.
In a nutshell, a knowledge graph is constructed over the prototypes of the various domains to propagate information among semantically adjacent representations.
Our approach outperforms existing methods by a remarkable margin.
Date: 2020-07-17T07:52:44Z
- Relation-Guided Representation Learning [53.60351496449232]
We propose a new representation learning method that explicitly models and leverages sample relations.
Our framework preserves the relations between samples well.
By seeking to embed samples into a subspace, we show that our method can address the large-scale and out-of-sample problems.
Date: 2020-07-11T10:57:45Z
- Inferential Text Generation with Multiple Knowledge Sources and Meta-Learning [117.23425857240679]
We study the problem of generating inferential texts of events for a variety of commonsense relations, such as if-else relations.
Existing approaches typically use limited evidence from training examples and learn for each relation individually.
In this work, we use multiple knowledge sources as fuel for the model.
Date: 2020-04-07T01:49:18Z