Related papers: Automated Metadata Harmonization Using Entity Resolution & Contextual Embedding

URL: http://arxiv.org/abs/2010.11827v2
Date: Tue, 1 Dec 2020 16:23:05 GMT
Title: Automated Metadata Harmonization Using Entity Resolution & Contextual Embedding
Authors: Kunal Sawarkar, Meenakshi Kodati
Abstract summary: We demonstrate automation of this step with the help of Cognitive Database's Db2Vec embedding approach. Apart from matching schemas, we demonstrate that it can also infer the correct ontological structure of the target data model.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The ML data curation process typically consists of heterogeneous and federated source systems with varied schema structures, requiring the curation process to standardize metadata from different schemas to an interoperable schema. This manual process of Metadata Harmonization and cataloging slows the efficiency of the ML-Ops lifecycle. We demonstrate automation of this step with the help of entity resolution methods and also by using Cognitive Database's Db2Vec embedding approach to capture hidden inter-column and intra-column relationships, which detect similarity of metadata and then predict metadata columns from source schemas to any standardized schema. Apart from matching schemas, we demonstrate that it can also infer the correct ontological structure of the target data model.

Related papers

AI-assisted JSON Schema Creation and Mapping [0.0]
We present a hybrid approach that combines large language models (LLMs) with deterministic techniques to enable creation, modification, and schema mapping based on natural language inputs by the user. This work significantly lowers the barrier to structured data modeling and data integration for non-experts.
arXiv Detail & Related papers (2025-08-07T09:27:10Z)
SchemaAgent: A Multi-Agents Framework for Generating Relational Database Schema [35.57815867567431]
Existing efforts are mostly based on customized rules or conventional deep learning models, often producing relational schema. We propose a unified LLM-based multi-agent framework for the automated generation of high-quality database schema. We incorporate dedicated roles for reflection and inspection, alongside an innovative error detection and correction mechanism to identify and rectify issues across various phases.
arXiv Detail & Related papers (2025-03-31T09:39:19Z)
Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring. Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations. Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z)
Compound Schema Registry [0.0]
We propose the use of generalized schema evolution (GSE) facilitated by a compound AI system. This system employs Large Language Models (LLMs) to interpret the semantics of schema changes. Our approach includes developing a task-specific language, Schema Transformation Language (STL), to generate schema mappings as an intermediate representation.
arXiv Detail & Related papers (2024-06-17T05:50:46Z)
ReMatch: Retrieval Enhanced Schema Matching with LLMs [0.874967598360817]
We present a novel method, named ReMatch, for matching schemas using retrieval-enhanced Large Language Models (LLMs). Our experimental results on large real-world schemas demonstrate that ReMatch is an effective matcher.
arXiv Detail & Related papers (2024-03-03T17:14:40Z)
Schema-adaptable Knowledge Graph Construction [47.772335354080795]
Conventional Knowledge Graph Construction (KGC) approaches typically follow the static information extraction paradigm with a closed set of pre-defined schema. We propose a new task called schema-adaptable KGC, which aims to continually extract entity, relation, and event based on a dynamically changing schema graph without re-training.
arXiv Detail & Related papers (2023-05-15T15:06:20Z)
Universal Information Extraction as Unified Semantic Matching [54.19974454019611]
We decouple information extraction into two abilities, structuring and conceptualizing, which are shared by different tasks and schemas. Based on this paradigm, we propose to universally model various IE tasks with Unified Semantic Matching framework. In this way, USM can jointly encode schema and input text, uniformly extract substructures in parallel, and controllably decode target structures on demand.
arXiv Detail & Related papers (2023-01-09T11:51:31Z)
Metadata Representations for Queryable ML Model Zoos [73.24799582702326]
Machine learning (ML) practitioners and organizations are building model zoos of pre-trained models, containing metadata describing properties of the models. The metadata is currently not standardized; its expressivity is limited; and there is no way to store and query it. In this paper, we advocate for standardized ML model metadata representation and management, proposing a toolkit to help practitioners manage and query that metadata.
arXiv Detail & Related papers (2022-07-19T15:04:14Z)
Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing [66.55478402233399]
We propose a framework to elicit relational structures via a probing procedure based on the Poincaré distance metric. Compared with commonly used rule-based methods for schema linking, we found that probing relations can robustly capture semantic correspondences. Our framework sets new state-of-the-art performance on three benchmarks.
arXiv Detail & Related papers (2022-06-28T14:05:25Z)
It's AI Match: A Two-Step Approach for Schema Matching Using Embeddings [10.732163031244646]
We propose a novel end-to-end approach for schema matching based on neural embeddings. Our results show that our approach is able to determine correspondences in a robust and reliable way.
arXiv Detail & Related papers (2022-03-08T19:42:28Z)
Mapping Patterns for Virtual Knowledge Graphs [71.61234136161742]
Virtual Knowledge Graphs (VKG) constitute one of the most promising paradigms for integrating and accessing legacy data sources. We build on well-established methodologies and patterns studied in data management, data analysis, and conceptual modeling. We validate our catalog on the considered VKG scenarios, showing it covers the vast majority of patterns present therein.
arXiv Detail & Related papers (2020-12-03T13:54:52Z)
Survive the Schema Changes: Integration of Unmanaged Data Using Deep Learning [2.6464841907587004]
We propose to use deep learning to automatically deal with schema changes through a super cell representation and automatic injection of perturbations to the training data. Our experimental results demonstrate that our proposed approach is effective for two real-world data integration scenarios.
arXiv Detail & Related papers (2020-10-15T08:10:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.