Knowledge Graph-based Retrieval-Augmented Generation for Schema Matching
- URL: http://arxiv.org/abs/2501.08686v1
- Date: Wed, 15 Jan 2025 09:32:37 GMT
- Title: Knowledge Graph-based Retrieval-Augmented Generation for Schema Matching
- Authors: Chuangtao Ma, Sriom Chakrabarti, Arijit Khan, Bálint Molnár
- Abstract summary: We propose a Knowledge Graph-based Retrieval-Augmented Generation model for LLM-based schema matching. In particular, KG-RAG4SM introduces novel vector-based, graph traversal-based, and query-based graph retrievals. We show that KG-RAG4SM outperforms the state-of-the-art (SOTA) methods by 35.89% and 30.50% in terms of precision and F1 score on the MIMIC dataset.
- Score: 3.7548609506798485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Traditional similarity-based schema matching methods are incapable of resolving semantic ambiguities and conflicts in domain-specific complex mapping scenarios due to missing commonsense and domain-specific knowledge. The hallucination problem of large language models (LLMs) also makes it challenging for LLM-based schema matching to address these issues. Therefore, we propose a Knowledge Graph-based Retrieval-Augmented Generation model for Schema Matching, referred to as KG-RAG4SM. In particular, KG-RAG4SM introduces novel vector-based, graph traversal-based, and query-based graph retrievals, as well as a hybrid approach and ranking schemes that identify the most relevant subgraphs from external large knowledge graphs (KGs). We show that KG-based retrieval-augmented LLMs generate more accurate results for complex matching cases without any re-training. Our experimental results show that KG-RAG4SM outperforms the LLM-based state-of-the-art (SOTA) methods (e.g., Jellyfish-8B) by 35.89% and 30.50% in terms of precision and F1 score, respectively, on the MIMIC dataset; KG-RAG4SM with GPT-4o-mini outperforms the pre-trained language model (PLM)-based SOTA methods (e.g., SMAT) by 69.20% and 21.97% in terms of precision and F1 score, respectively, on the Synthea dataset. The results also demonstrate that our approach is more efficient in end-to-end schema matching and scales to retrieval from large KGs. Our case studies on a dataset from a real-world schema matching scenario show that our solution effectively mitigates the hallucination problem of LLMs for schema matching.
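To make the pipeline concrete, here is a minimal Python sketch of the KG-RAG idea for schema matching: retrieve the KG triples most relevant to an attribute pair, then assemble a matching prompt for an LLM. The toy bag-of-words embedding, the two-triple in-memory KG, and all function names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a KG-RAG pipeline for schema matching (hypothetical names;
# KG-RAG4SM's actual retrieval, hybrid ranking, and embeddings are more sophisticated).
from collections import Counter
import math

def embed(text):
    # Toy bag-of-words embedding; the paper uses learned vector embeddings.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# External KG as (head, relation, tail) triples.
kg = [
    ("admission time", "synonym of", "admit time"),
    ("discharge disposition", "related to", "discharge location"),
]

def retrieve_triples(source_attr, target_attr, k=2):
    # Vector-based retrieval: rank triples by similarity to the attribute pair.
    query = embed(source_attr + " " + target_attr)
    ranked = sorted(kg, key=lambda t: cosine(query, embed(" ".join(t))), reverse=True)
    return ranked[:k]

def build_prompt(source_attr, target_attr):
    facts = "; ".join(f"{h} {r} {t}" for h, r, t in retrieve_triples(source_attr, target_attr))
    return (f"Given the knowledge: {facts}. "
            f"Do the attributes '{source_attr}' and '{target_attr}' match? Answer yes or no.")

print(build_prompt("admit time", "admission time"))  # sent to an LLM; no re-training needed
```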
Related papers
- RAKG: Document-level Retrieval Augmented Knowledge Graph Construction [10.013667560362565]
This paper focuses on the task of automatic document-level knowledge graph construction.
It proposes the Document-level Retrieval Augmented Knowledge Graph Construction (RAKG) framework.
arXiv Detail & Related papers (2025-04-14T02:47:23Z)
- Talking to GDELT Through Knowledge Graphs [0.6461717749486492]
We study various Retrieval Augmented Generation (RAG) approaches to understand the strengths and weaknesses of each in a question-answering analysis.
To retrieve information from the text corpus we implement a traditional vector store RAG as well as state-of-the-art large language model (LLM) based approaches.
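A minimal sketch of the traditional vector-store RAG baseline described above; the hashing-trick embedding and the toy GDELT chunks are stand-ins for a learned encoder and a real index.

```python
# Hedged sketch of a vector-store RAG loop (toy embeddings so it runs anywhere).
import numpy as np

def embed(text, dim=64):
    # Hashing-trick embedding so the sketch runs without a model download.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

chunks = [
    "GDELT records events extracted from world news coverage.",
    "Each event row carries actors, an event code, and a tone score.",
]
index = np.stack([embed(c) for c in chunks])   # the "vector store"

def retrieve(question, k=1):
    scores = index @ embed(question)           # cosine similarity on unit vectors
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

question = "What does a GDELT event contain?"
prompt = "Context:\n" + "\n".join(retrieve(question)) + f"\n\nQuestion: {question}"
print(prompt)  # this prompt would be sent to an LLM for a grounded answer
```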
arXiv Detail & Related papers (2025-03-10T17:48:10Z)
- Graph-Augmented Reasoning: Evolving Step-by-Step Knowledge Graph Retrieval for LLM Reasoning [55.6623318085391]
Recent large language model (LLM) reasoning suffers from limited domain knowledge, susceptibility to hallucinations, and constrained reasoning depth.
This paper presents the first investigation into integrating step-wise knowledge graph retrieval with step-wise reasoning.
We propose KG-RAR, a framework centered on process-oriented knowledge graph construction, a hierarchical retrieval strategy, and a universal post-retrieval processing and reward model.
arXiv Detail & Related papers (2025-03-03T15:20:41Z)
- Grounding LLM Reasoning with Knowledge Graphs [4.279373869671241]
We propose integrating reasoning strategies with Knowledge Graphs to anchor every step or "thought" of the reasoning chains in KG data.
We evaluate both agentic and automated search methods across several reasoning strategies, including Chain-of-Thought (CoT), Tree-of-Thought (ToT), and Graph-of-Thought (GoT).
Our experiments demonstrate that this approach consistently outperforms baseline models.
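The anchoring idea can be reduced to a membership check: a reasoning step survives only if its claimed fact appears in the KG. This toy sketch is an assumption about the mechanism, not the paper's agentic or automated search.

```python
# Toy grounding check: keep a reasoning step only if its claimed triple
# exists in the KG. The KG contents and this exact mechanism are assumptions.
kg = {("Paris", "capital_of", "France"), ("France", "member_of", "EU")}

def grounded(step):
    return step in kg

chain = [("Paris", "capital_of", "France"), ("France", "member_of", "EU")]
if all(grounded(step) for step in chain):
    print("every thought is anchored in the KG; accept the chain")
else:
    print("a step failed KG verification; search for an alternative path")
```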
arXiv Detail & Related papers (2025-02-18T19:20:46Z)
- KG-CF: Knowledge Graph Completion with Context Filtering under the Guidance of Large Language Models [55.39134076436266]
KG-CF is a framework tailored for ranking-based knowledge graph completion tasks. KG-CF leverages LLMs' reasoning abilities to filter out irrelevant contexts, achieving superior results on real-world datasets.
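A hedged sketch of the filtering step: an LLM relevance judgment (mocked here with a keyword test) discards contexts unrelated to a candidate triple before ranking.

```python
# Mocked LLM relevance judge filtering contexts for a candidate triple.
candidate = ("aspirin", "treats", "headache")
contexts = [
    "Aspirin is commonly used to relieve headaches and mild pain.",
    "The company reported strong quarterly earnings growth.",
]

def llm_says_relevant(triple, ctx):
    # Stand-in for an LLM call; KG-CF uses the model's reasoning instead.
    head, _, tail = triple
    return head in ctx.lower() and tail in ctx.lower()

kept = [c for c in contexts if llm_says_relevant(candidate, c)]
print(kept)  # only the on-topic context is passed to the downstream ranker
```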
arXiv Detail & Related papers (2025-01-06T01:52:15Z)
- Magneto: Combining Small and Large Language Models for Schema Matching [8.387623375871055]
Small language models (SLMs) require training data and large language models (LLMs) often incur high computational costs. We present Magneto, a cost-effective and accurate solution for schema matching.
arXiv Detail & Related papers (2024-12-11T08:35:56Z)
- Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, consisting of candidate generation, refinement, and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
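The three-stage program can be sketched as plain function composition; the heuristics below stand in for the LLM calls Matchmaker would make, and all names are hypothetical.

```python
# Hedged sketch of a compositional matching program: candidate generation,
# refinement, and confidence scoring (LLM calls are mocked with heuristics).
def generate_candidates(source_attr, target_schema):
    # Stage 1: propose plausible target matches (an LLM would do this).
    return [t for t in target_schema if source_attr.split("_")[0] in t]

def refine(source_attr, candidates):
    # Stage 2: keep candidates that survive a closer comparison.
    return [c for c in candidates if abs(len(c) - len(source_attr)) < 10]

def score(source_attr, candidate):
    # Stage 3: confidence via token overlap; an LLM would self-assess instead.
    s, c = set(source_attr.split("_")), set(candidate.split("_"))
    return len(s & c) / len(s | c)

target_schema = ["patient_id", "patient_birth_date", "visit_id"]
cands = refine("patient_id", generate_candidates("patient_id", target_schema))
best = max(cands, key=lambda c: score("patient_id", c))
print(best, round(score("patient_id", best), 2))  # zero-shot: no labeled demonstrations
```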
arXiv Detail & Related papers (2024-10-31T16:34:03Z)
- Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation [9.844598565914055]
Large Language Models (LLMs) demonstrate strong reasoning abilities but face limitations such as hallucinations and outdated knowledge.
We introduce SubgraphRAG, which extends the Knowledge Graph (KG)-based Retrieval-Augmented Generation (RAG) framework by retrieving subgraphs.
Our approach innovatively integrates a lightweight multilayer perceptron with a parallel triple-scoring mechanism for efficient and flexible subgraph retrieval.
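A rough sketch of the parallel triple-scoring idea: one batched forward pass through a small MLP ranks every candidate triple against the query. Weights are random here; SubgraphRAG trains this component.

```python
# Illustrative parallel triple scoring with a lightweight MLP (random weights;
# the real retriever is trained and uses learned query/triple embeddings).
import numpy as np

rng = np.random.default_rng(0)
dim = 16
W1, b1 = rng.normal(size=(dim * 2, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 1)), np.zeros(1)

def mlp_scores(query_vec, triple_vecs):
    # Score all candidate triples in one batched forward pass.
    x = np.hstack([np.repeat(query_vec[None, :], len(triple_vecs), axis=0), triple_vecs])
    h = np.maximum(x @ W1 + b1, 0.0)           # ReLU hidden layer
    return (h @ W2 + b2).ravel()

query = rng.normal(size=dim)
triples = rng.normal(size=(100, dim))          # embeddings of 100 candidate KG triples
top_k = np.argsort(mlp_scores(query, triples))[::-1][:5]
print(top_k)                                   # retrieved subgraph = top-5 triples
```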
arXiv Detail & Related papers (2024-10-28T04:39:32Z)
- Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency [59.6772484292295]
Knowledge graphs (KGs) generated by large language models (LLMs) are increasingly valuable for Retrieval-Augmented Generation (RAG) applications.
Existing KG extraction methods rely on prompt-based approaches, which are inefficient for processing large-scale corpora.
We propose SynthKG, a multi-step, document-level KG synthesis workflow based on LLMs.
We also design a novel graph-based retrieval framework for RAG.
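The document-level workflow can be sketched as chunk, extract, merge; the regex extractor below is a deliberately crude stand-in for SynthKG's LLM-based extraction.

```python
# Sketch of a multi-step, document-level KG synthesis loop (names hypothetical;
# the per-chunk extractor is a regex stand-in for an LLM prompt).
import re

def chunk(doc, size=80):
    return [doc[i:i + size] for i in range(0, len(doc), size)]

def extract_triples(text):
    # Stand-in extractor for "X is a Y" patterns.
    return [(s.strip(), "is_a", o.strip())
            for s, o in re.findall(r"(\w[\w ]*?) is an? ([\w ]+?)[.,]", text)]

doc = "Aspirin is a drug. Paris is a city. The graph is a data structure."
kg = set()
for c in chunk(doc):
    kg.update(extract_triples(c))   # merge chunk-level triples into one KG
print(sorted(kg))
```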
arXiv Detail & Related papers (2024-10-22T00:47:54Z)
- Fact Finder -- Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs [2.7386111894524]
We introduce a hybrid system that augments Large Language Models with domain-specific knowledge graphs (KGs).
We evaluate our system on a curated dataset of 69 samples, achieving a precision of 78% in retrieving correct KG nodes.
Our findings indicate that the hybrid system surpasses a standalone LLM in accuracy and completeness.
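One plausible reading of such a hybrid system is query-based KG retrieval: map a question to a graph pattern and match it against the KG. The pattern matcher below is an assumption, not the paper's pipeline, and all entities and relations are made up.

```python
# Toy query-based KG retrieval: the question becomes an (entity, relation, ?)
# pattern matched against the KG.
kg = [
    ("metformin", "indicated_for", "type 2 diabetes"),
    ("metformin", "drug_class", "biguanide"),
]

def match(entity, relation):
    return [t for h, r, t in kg if h == entity and r == relation]

# "What is metformin indicated for?" -> pattern (metformin, indicated_for, ?)
nodes = match("metformin", "indicated_for")
print(nodes)  # retrieved KG nodes are handed to the LLM as grounded context
```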
arXiv Detail & Related papers (2024-08-06T07:45:05Z)
- DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph [70.79413606968814]
We introduce Dynamic Evaluation of LLMs via Adaptive Reasoning Graph Evolvement (DARG) to dynamically extend current benchmarks with controlled complexity and diversity.
Specifically, we first extract the reasoning graphs of data points in current benchmarks and then perturb the reasoning graphs to generate novel testing data.
Such newly generated test samples can have different levels of complexity while maintaining linguistic diversity similar to the original benchmarks.
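A toy version of the perturb-and-regenerate loop: edit the reasoning graph's values, then recompute the gold label from the graph so the new item stays internally consistent. The names and the arithmetic task are illustrative.

```python
# Toy sketch of benchmark evolution via reasoning-graph perturbation.
import random

graph = {"a": 3, "b": 4, "answer": lambda v: v["a"] + v["b"]}

def perturb(g, seed=0):
    random.seed(seed)
    new = dict(g)
    new["a"] = g["a"] + random.randint(1, 9)   # controlled numeric perturbation
    new["b"] = g["b"] + random.randint(1, 9)
    return new

new_graph = perturb(graph)
question = f"What is {new_graph['a']} plus {new_graph['b']}?"
gold = new_graph["answer"](new_graph)          # recompute the label from the graph
print(question, "->", gold)                    # a novel test item, same structure
```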
arXiv Detail & Related papers (2024-06-25T04:27:53Z)
- Hierarchical Memory Learning for Fine-Grained Scene Graph Generation [49.39355372599507]
This paper proposes a novel Hierarchical Memory Learning (HML) framework that trains the model progressively, from simple to complex.
After autonomously partitioning predicates into coarse and fine sets, the model is first trained on the coarse predicates and then learns the fine predicates.
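The coarse-to-fine schedule amounts to two training phases over a predicate hierarchy; this skeleton only shows the ordering, and the partition here is made up.

```python
# Skeleton of the coarse-to-fine curriculum: phase 1 on coarse predicates,
# phase 2 on their fine-grained refinements.
coarse_to_fine = {"on": ["standing on", "sitting on"], "near": ["beside", "behind"]}

def train_phase(predicates, phase):
    # Placeholder for an actual training loop over the given label set.
    print(f"phase {phase}: fitting predicates {sorted(predicates)}")

train_phase(coarse_to_fine.keys(), 1)                              # coarse first
train_phase([f for fs in coarse_to_fine.values() for f in fs], 2)  # then fine
```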
arXiv Detail & Related papers (2022-03-14T08:01:14Z)
- Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function [106.69643619725652]
We develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results.
We report state-of-the-art results for text classification on several benchmark datasets.
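For reference, a minimal PyTorch BiLSTM classifier trained with cross-entropy; the hyperparameters are illustrative, not the paper's.

```python
# Minimal BiLSTM text classifier with cross-entropy loss.
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab=10000, emb=100, hidden=128, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, classes)   # concat of both directions

    def forward(self, tokens):
        h, _ = self.lstm(self.embed(tokens))
        return self.fc(h[:, -1, :])                # last time step's features

model = BiLSTMClassifier()
tokens = torch.randint(0, 10000, (8, 32))          # batch of 8 length-32 sequences
labels = torch.randint(0, 2, (8,))
loss = nn.CrossEntropyLoss()(model(tokens), labels)
loss.backward()                                    # standard supervised update step
print(float(loss))
```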
arXiv Detail & Related papers (2020-09-08T21:55:22Z)