It's AI Match: A Two-Step Approach for Schema Matching Using Embeddings
- URL: http://arxiv.org/abs/2203.04366v1
- Date: Tue, 8 Mar 2022 19:42:28 GMT
- Title: It's AI Match: A Two-Step Approach for Schema Matching Using Embeddings
- Authors: Benjamin Hättasch, Michael Truong-Ngoc, Andreas Schmidt, Carsten Binnig
- Abstract summary: We propose a novel end-to-end approach for schema matching based on neural embeddings.
Our results show that our approach is able to determine correspondences in a robust and reliable way.
- Score: 10.732163031244646
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Since data is often stored in different sources, it needs to be integrated to
gather a global view that is required in order to create value and derive
knowledge from it. A critical step in data integration is schema matching which
aims to find semantic correspondences between elements of two schemata. In
order to reduce the manual effort involved in schema matching, many solutions
for the automatic determination of schema correspondences have already been
developed.
In this paper, we propose a novel end-to-end approach for schema matching
based on neural embeddings. The main idea is to use a two-step approach
consisting of a table matching step followed by an attribute matching step. In
both steps we use embeddings on different levels either representing the whole
table or single attributes. Our results show that our approach determines
correspondences in a robust and reliable way and, compared to traditional
schema matching approaches, can find non-trivial correspondences.
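The two-step idea in the abstract (match tables first, then match attributes only within the matched tables) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the character n-gram `embed` function is a toy stand-in for the neural embeddings the paper uses, and the names `embed`, `cosine`, and `match_schemas` as well as the example schemata are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Toy embedding: character n-gram counts (stand-in for a neural model)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def table_embedding(name, attributes):
    """Represent a whole table by its name together with its attribute names."""
    return embed(name + " " + " ".join(attributes))


def match_schemas(source, target):
    """Map every (table, attribute) in source to its best counterpart in target."""
    result = {}
    for s_table, s_attrs in source.items():
        # Step 1: table matching on table-level embeddings.
        s_vec = table_embedding(s_table, s_attrs)
        t_table = max(
            target,
            key=lambda t: cosine(s_vec, table_embedding(t, target[t])),
        )
        # Step 2: attribute matching, restricted to the matched table.
        for s_attr in s_attrs:
            a_vec = embed(s_attr)
            t_attr = max(target[t_table], key=lambda a: cosine(a_vec, embed(a)))
            result[(s_table, s_attr)] = (t_table, t_attr)
    return result


# Hypothetical example schemata: table name -> list of attribute names.
source = {"orders": ["order_id", "total"]}
target = {
    "customer": ["name", "email"],
    "order": ["order_id", "total_amount"],
}
print(match_schemas(source, target))
```

Restricting step 2 to the table chosen in step 1 keeps the attribute search space small, at the cost that a wrong table match cannot be corrected later. A string-based toy embedding like this one only finds lexically similar names; the non-trivial correspondences mentioned in the abstract would require semantic embeddings.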
Related papers
- Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z)
- Schema Matching with Large Language Models: an Experimental Study [0.580553237364985]
We investigate the use of off-the-shelf Large Language Models (LLMs) for schema matching.
Our objective is to identify semantic correspondences between elements of two relational schemas using only names and descriptions.
arXiv Detail & Related papers (2024-07-16T15:33:00Z)
- ReMatch: Retrieval Enhanced Schema Matching with LLMs [0.874967598360817]
We present a novel method, named ReMatch, for matching schemas using retrieval-enhanced Large Language Models (LLMs).
Our experimental results on large real-world schemas demonstrate that ReMatch is an effective matcher.
arXiv Detail & Related papers (2024-03-03T17:14:40Z)
- Inductive Meta-path Learning for Schema-complex Heterogeneous Information Networks [46.325577161493726]
Heterogeneous Information Networks (HINs) are information networks with multiple types of nodes and edges.
The concept of meta-path, i.e., a sequence of entity types and relation types connecting two entities, is proposed to provide the meta-level explainable semantics for various HIN tasks.
arXiv Detail & Related papers (2023-07-08T09:10:43Z)
- Schema-adaptable Knowledge Graph Construction [47.772335354080795]
Conventional Knowledge Graph Construction (KGC) approaches typically follow the static information extraction paradigm with a closed set of pre-defined schema.
We propose a new task called schema-adaptable KGC, which aims to continually extract entity, relation, and event based on a dynamically changing schema graph without re-training.
arXiv Detail & Related papers (2023-05-15T15:06:20Z)
- Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on Poincaré distance metric.
Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences.
Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z)
- Improving Multi-task Generalization Ability for Neural Text Matching via Prompt Learning [54.66399120084227]
Recent state-of-the-art neural text matching models (PLMs) are hard to generalize to different tasks.
We adopt a specialization-generalization training strategy and refer to it as Match-Prompt.
In the specialization stage, descriptions of different matching tasks are mapped to only a few prompt tokens.
In the generalization stage, the text matching model explores the essential matching signals by being trained on diverse matching tasks.
arXiv Detail & Related papers (2022-04-06T11:01:08Z)
- Contextualizing Meta-Learning via Learning to Decompose [125.76658595408607]
We propose Learning to Decompose Network (LeadNet) to contextualize the meta-learned "support-to-target" strategy.
LeadNet learns to automatically select the strategy associated with the right via incorporating the change of comparison across contexts with polysemous embeddings.
arXiv Detail & Related papers (2021-06-15T13:10:56Z)
- Automated Metadata Harmonization Using Entity Resolution & Contextual Embedding [0.0]
We demonstrate automation of this step with the help of Cognitive Database's Db2Vec embedding approach.
Apart from matching schemas, we demonstrate that it can also infer the correct ontological structure of the target data model.
arXiv Detail & Related papers (2020-10-17T02:14:15Z)