Related papers: Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency

Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency

URL: http://arxiv.org/abs/2410.16597v1
Date: Tue, 22 Oct 2024 00:47:54 GMT
Title: Distill-SynthKG: Distilling Knowledge Graph Synthesis Workflow for Improved Coverage and Efficiency
Authors: Prafulla Kumar Choubey, Xin Su, Man Luo, Xiangyu Peng, Caiming Xiong, Tiep Le, Shachar Rosenman, Vasudev Lal, Phil Mui, Ricky Ho, Phillip Howard, Chien-Sheng Wu,
Abstract summary: Knowledge graphs (KGs) generated by large language models (LLMs) are increasingly valuable for Retrieval-Augmented Generation (RAG) applications. Existing KG extraction methods rely on prompt-based approaches, which are inefficient for processing large-scale corpora. We propose SynthKG, a multi-step, document-level synthesis KG workflow based on LLMs. We also design a novel graph-based retrieval framework for RAG.
Score: 59.6772484292295
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Knowledge graphs (KGs) generated by large language models (LLMs) are becoming increasingly valuable for Retrieval-Augmented Generation (RAG) applications that require knowledge-intensive reasoning. However, existing KG extraction methods predominantly rely on prompt-based approaches, which are inefficient for processing large-scale corpora. These approaches often suffer from information loss, particularly with long documents, due to the lack of specialized design for KG construction. Additionally, there is a gap in evaluation datasets and methodologies for ontology-free KG construction. To overcome these limitations, we propose SynthKG, a multi-step, document-level ontology-free KG synthesis workflow based on LLMs. By fine-tuning a smaller LLM on the synthesized document-KG pairs, we streamline the multi-step process into a single-step KG generation approach called Distill-SynthKG, substantially reducing the number of LLM inference calls. Furthermore, we re-purpose existing question-answering datasets to establish KG evaluation datasets and introduce new evaluation metrics. Using KGs produced by Distill-SynthKG, we also design a novel graph-based retrieval framework for RAG. Experimental results demonstrate that Distill-SynthKG not only surpasses all baseline models in KG quality -- including models up to eight times larger -- but also consistently excels in retrieval and question-answering tasks. Our proposed graph retrieval framework also outperforms all KG-retrieval methods across multiple benchmark datasets. We release the SynthKG dataset and Distill-SynthKG model publicly to support further research and development.

Related papers

Talking to GDELT Through Knowledge Graphs [0.6461717749486492]
We study various Retrieval Augmented Regeneration (RAG) approaches to gain an understanding of the strengths and weaknesses of each approach in a question-answering analysis. To retrieve information from the text corpus we implement a traditional vector store RAG as well as state-of-the-art large language model (LLM) based approaches.
arXiv Detail & Related papers (2025-03-10T17:48:10Z)
GLTW: Joint Improved Graph Transformer and LLM via Three-Word Language for Knowledge Graph Completion [52.026016846945424]
We propose a new method called GLTW, which encodes the structural information of KGs and merges it with Large Language Models. Specifically, we introduce an improved Graph Transformer (iGT) that effectively encodes subgraphs with both local and global structural information. Also, we develop a subgraph-based multi-classification training objective, using all entities within KG as classification objects, to boost learning efficiency.
arXiv Detail & Related papers (2025-02-17T06:02:59Z)
Exploiting Large Language Models Capabilities for Question Answer-Driven Knowledge Graph Completion Across Static and Temporal Domains [8.472388165833292]
This paper introduces a new generative completion framework called Generative Subgraph-based KGC (GS-KGC) GS-KGC employs a question-answering format to directly generate target entities, addressing the challenge of questions having multiple possible answers. Our method generates negative samples using known facts to facilitate the discovery of new information.
arXiv Detail & Related papers (2024-08-20T13:13:41Z)
Generate-on-Graph: Treat LLM as both Agent and KG in Incomplete Knowledge Graph Question Answering [87.67177556994525]
We propose a training-free method called Generate-on-Graph (GoG) to generate new factual triples while exploring Knowledge Graphs (KGs) GoG performs reasoning through a Thinking-Searching-Generating framework, which treats LLM as both Agent and KG in IKGQA.
arXiv Detail & Related papers (2024-04-23T04:47:22Z)
KG-Agent: An Efficient Autonomous Agent Framework for Complex Reasoning over Knowledge Graph [134.8631016845467]
We propose an autonomous LLM-based agent framework, called KG-Agent. In KG-Agent, we integrate the LLM, multifunctional toolbox, KG-based executor, and knowledge memory. To guarantee the effectiveness, we leverage program language to formulate the multi-hop reasoning process over the KG.
arXiv Detail & Related papers (2024-02-17T02:07:49Z)
KG-GPT: A General Framework for Reasoning on Knowledge Graphs Using Large Language Models [18.20425100517317]
We propose KG-GPT, a framework leveraging large language models for tasks employing knowledge graphs. KG-GPT comprises three steps: Sentence, Graph Retrieval, and Inference, each aimed at partitioning sentences, retrieving relevant graph components, and deriving logical conclusions. We evaluate KG-GPT using KG-based fact verification and KGQA benchmarks, with the model showing competitive and robust performance, even outperforming several fully-supervised models.
arXiv Detail & Related papers (2023-10-17T12:51:35Z)
PyGraft: Configurable Generation of Synthetic Schemas and Knowledge Graphs at Your Fingertips [3.5923669681271257]
PyGraft is a Python-based tool that generates customized, domain-agnostic schemas and KGs. We aim to empower the generation of a more diverse array of KGs for benchmarking novel approaches in areas such as graph-based machine learning (ML) In ML, this should foster a more holistic evaluation of model performance and generalization capability, thereby going beyond the limited collection of available benchmarks.
arXiv Detail & Related papers (2023-09-07T13:00:09Z)
An Open-Source Knowledge Graph Ecosystem for the Life Sciences [5.665519167428707]
PheKnowLator is a semantic ecosystem for automating the construction of ontologically grounded knowledge graphs. The ecosystem includes KG construction resources, analysis tools, and benchmarks. PheKnowLator enables fully customizable KGs without compromising performance or usability.
arXiv Detail & Related papers (2023-07-11T18:55:09Z)
Collective Knowledge Graph Completion with Mutual Knowledge Distillation [11.922522192224145]
We study the problem of multi-KG completion, where we focus on maximizing the collective knowledge from different KGs. We propose a novel method called CKGC-CKD that uses relation-aware graph convolutional network encoder models on both individual KGs and a large fused KG. Experimental results on multilingual datasets have shown that our method outperforms all state-of-the-art models in the KGC task.
arXiv Detail & Related papers (2023-05-25T09:49:40Z)
KGxBoard: Explainable and Interactive Leaderboard for Evaluation of Knowledge Graph Completion Models [76.01814380927507]
KGxBoard is an interactive framework for performing fine-grained evaluation on meaningful subsets of the data. In our experiments, we highlight the findings with the use of KGxBoard, which would have been impossible to detect with standard averaged single-score metrics.
arXiv Detail & Related papers (2022-08-23T15:11:45Z)
Explainable Sparse Knowledge Graph Completion via High-order Graph Reasoning Network [111.67744771462873]
This paper proposes a novel explainable model for sparse Knowledge Graphs (KGs) It combines high-order reasoning into a graph convolutional network, namely HoGRN. It can not only improve the generalization ability to mitigate the information insufficiency issue but also provide interpretability.
arXiv Detail & Related papers (2022-07-14T10:16:56Z)
Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study [62.376800537374024]
We study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction. We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model and provides a substantial boost in performance.
arXiv Detail & Related papers (2021-06-17T17:55:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.