Automated Metadata Harmonization Using Entity Resolution & Contextual
Embedding
- URL: http://arxiv.org/abs/2010.11827v2
- Date: Tue, 1 Dec 2020 16:23:05 GMT
- Title: Automated Metadata Harmonization Using Entity Resolution & Contextual
Embedding
- Authors: Kunal Sawarkar, Meenkakshi Kodati
- Abstract summary: We demonstrate automation of this step with the help of Cognitive Database's Db2Vec embedding approach.
Apart from matching schemas, we demonstrate that it can also infer the correct ontological structure of the target data model.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ML data curation process typically involves heterogeneous and
federated source systems with varied schema structures, requiring the curation
process to standardize metadata from different schemas into an interoperable
schema. This manual process of metadata harmonization and cataloging slows the
ML-Ops lifecycle. We demonstrate automation of this step with the help of
entity resolution methods and by using Cognitive Database's Db2Vec embedding
approach to capture hidden inter-column and intra-column relationships, which
detect similarity of metadata and then predict the mapping of metadata columns
from source schemas to any standardized schema. Apart from matching schemas, we
demonstrate that it can also infer the correct ontological structure of the
target data model.
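A minimal sketch of the column-matching idea in the abstract, assuming plain string similarity as a stand-in for the paper's entity resolution step. It does not reproduce Db2Vec or any method from the paper; the function names, the example column names, and the 0.6 threshold are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a column name and treat '_' and '-' as spaces."""
    return name.lower().replace("_", " ").replace("-", " ").strip()


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_columns(source_cols, target_cols, threshold=0.6):
    """Map each source column to its most similar target column.

    A source column maps to None when even its best candidate scores
    below the threshold (an assumed cut-off, not taken from the paper).
    """
    mapping = {}
    for src in source_cols:
        best = max(target_cols, key=lambda tgt: similarity(src, tgt))
        mapping[src] = best if similarity(src, best) >= threshold else None
    return mapping


# Hypothetical source and standardized (target) schemas.
source = ["cust_name", "Cust-Email", "zip"]
target = ["customer_name", "customer_email", "postal_code"]
result = match_columns(source, target)
print(result)
```

Name similarity alone cannot link columns such as `zip` and `postal_code`, which share a meaning but almost no characters. That is the kind of gap an embedding approach over column contents, as the abstract describes, is meant to close.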
Related papers
- Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z)
- Compound Schema Registry [0.0]
We propose the use of generalized schema evolution (GSE) facilitated by a compound AI system.
This system employs Large Language Models (LLMs) to interpret the semantics of schema changes.
Our approach includes developing a task-specific language, Schema Transformation Language (STL), to generate schema mappings as an intermediate representation.
arXiv Detail & Related papers (2024-06-17T05:50:46Z)
- ReMatch: Retrieval Enhanced Schema Matching with LLMs [0.874967598360817]
We present a novel method, named ReMatch, for matching schemas using retrieval-enhanced Large Language Models (LLMs).
Our experimental results on large real-world schemas demonstrate that ReMatch is an effective matcher.
arXiv Detail & Related papers (2024-03-03T17:14:40Z)
- Schema-adaptable Knowledge Graph Construction [47.772335354080795]
Conventional Knowledge Graph Construction (KGC) approaches typically follow the static information extraction paradigm with a closed set of pre-defined schema.
We propose a new task called schema-adaptable KGC, which aims to continually extract entity, relation, and event based on a dynamically changing schema graph without re-training.
arXiv Detail & Related papers (2023-05-15T15:06:20Z)
- Universal Information Extraction as Unified Semantic Matching [54.19974454019611]
We decouple information extraction into two abilities, structuring and conceptualizing, which are shared by different tasks and schemas.
Based on this paradigm, we propose to universally model various IE tasks with the Unified Semantic Matching (USM) framework.
In this way, USM can jointly encode schema and input text, uniformly extract substructures in parallel, and controllably decode target structures on demand.
arXiv Detail & Related papers (2023-01-09T11:51:31Z)
- Metadata Representations for Queryable ML Model Zoos [73.24799582702326]
Machine learning (ML) practitioners and organizations are building model zoos of pre-trained models, containing metadata describing properties of the models.
The metadata is currently not standardised; its expressivity is limited; and there is no way to store and query it.
In this paper, we advocate for standardized ML model metadata representation and management, proposing a toolkit to help practitioners manage and query that metadata.
arXiv Detail & Related papers (2022-07-19T15:04:14Z)
- Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on the Poincaré distance metric.
Compared with commonly-used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences.
Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z)
- It's AI Match: A Two-Step Approach for Schema Matching Using Embeddings [10.732163031244646]
We propose a novel end-to-end approach for schema matching based on neural embeddings.
Our results show that our approach is able to determine correspondences in a robust and reliable way.
arXiv Detail & Related papers (2022-03-08T19:42:28Z)
- Mapping Patterns for Virtual Knowledge Graphs [71.61234136161742]
Virtual Knowledge Graphs (VKG) constitute one of the most promising paradigms for integrating and accessing legacy data sources.
We build on well-established methodologies and patterns studied in data management, data analysis, and conceptual modeling.
We validate our catalog on the considered VKG scenarios, showing it covers the vast majority of patterns present therein.
arXiv Detail & Related papers (2020-12-03T13:54:52Z)
- Survive the Schema Changes: Integration of Unmanaged Data Using Deep Learning [2.6464841907587004]
We propose to use deep learning to automatically deal with schema changes through a super cell representation and automatic injection of perturbations to the training data.
Our experimental results demonstrate that our proposed approach is effective for two real-world data integration scenarios.
arXiv Detail & Related papers (2020-10-15T08:10:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.