LLM-based SPARQL Query Generation from Natural Language over Federated Knowledge Graphs
- URL: http://arxiv.org/abs/2410.06062v3
- Date: Mon, 21 Oct 2024 09:13:48 GMT
- Title: LLM-based SPARQL Query Generation from Natural Language over Federated Knowledge Graphs
- Authors: Vincent Emonet, Jerven Bolleman, Severine Duvaud, Tarcisio Mendes de Farias, Ana Claudia Sima,
- Abstract summary: We introduce a Retrieval-Augmented Generation (RAG) system for translating user questions into accurate SPARQL queries over bioinformatics knowledge graphs (KGs).
To enhance accuracy and reduce hallucinations in query generation, our system utilises metadata from the KGs, including query examples and schema information, and incorporates a validation step to correct generated queries.
The system is available online at chat.expasy.org.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a Retrieval-Augmented Generation (RAG) system for translating user questions into accurate federated SPARQL queries over bioinformatics knowledge graphs (KGs) leveraging Large Language Models (LLMs). To enhance accuracy and reduce hallucinations in query generation, our system utilises metadata from the KGs, including query examples and schema information, and incorporates a validation step to correct generated queries. The system is available online at chat.expasy.org.
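The abstract names two ingredients: retrieving query examples to ground generation, and validating the generated query against schema information. The sketch below illustrates that idea only and is not the authors' implementation. It assumes a tiny in-memory example store, swaps in plain token overlap for whatever retrieval method the real system uses, and reduces validation to checking that every `up:` term in a query appears in a known-terms set. All names (`EXAMPLES`, `KNOWN_TERMS`, `retrieve_examples`, `find_unknown_terms`) and the sample data are hypothetical.

```python
import re

# Hypothetical question/query pairs standing in for KG-provided query examples.
EXAMPLES = [
    ("List human proteins",
     "SELECT ?protein WHERE { ?protein a up:Protein ; up:organism taxon:9606 . }"),
    ("Which genes encode a protein",
     "SELECT ?gene WHERE { ?protein up:encodedBy ?gene . }"),
]

# Hypothetical schema information: the terms a query is allowed to use.
KNOWN_TERMS = {"up:Protein", "up:organism", "up:encodedBy"}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_examples(question, examples, k=1):
    """Return the k examples whose questions share the most tokens with `question`.

    Token overlap is a stand-in chosen for this sketch, not the paper's method.
    """
    question_tokens = tokenize(question)
    ranked = sorted(
        examples,
        key=lambda example: len(question_tokens & tokenize(example[0])),
        reverse=True,
    )
    return ranked[:k]


def find_unknown_terms(query, known_terms, prefix="up:"):
    """Return the sorted prefixed terms used in `query` that are not in `known_terms`."""
    used = set(re.findall(rf"\b{re.escape(prefix)}\w+", query))
    return sorted(used - known_terms)


# A generated query that uses a term missing from the schema information.
generated = "SELECT ?p WHERE { ?p a up:Protein ; up:hasGene ?g . }"
unknown = find_unknown_terms(generated, KNOWN_TERMS)
best_example = retrieve_examples("List all human proteins", EXAMPLES)[0]
```

In a fuller pipeline, a non-empty `unknown` list could be sent back to the LLM together with the retrieved example as a correction prompt; how the real system phrases and applies that correction is not described in the text above.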
Related papers
- Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval [9.860751439256754]
Large language models (LLMs) are susceptible to hallucinations and out-of-distribution errors when producing KG elements.
This has led to increased research aimed at detecting and mitigating such errors.
In this paper, we introduce PGMR, a modular framework that incorporates a non-parametric memory module to retrieve KG elements.
arXiv Detail & Related papers (2025-02-19T02:08:13Z) - Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation [81.18701211912779]
We introduce an Adaptive Multi-Aspect Retrieval-augmented over KGs (Amar) framework.
This method retrieves knowledge including entities, relations, and subgraphs, and converts each piece of retrieved text into prompt embeddings.
Our method has achieved state-of-the-art performance on two common datasets.
arXiv Detail & Related papers (2024-12-24T16:38:04Z) - RAGONITE: Iterative Retrieval on Induced Databases and Verbalized RDF for Conversational QA over KGs with RAG [6.4032082023113475]
SPARQL is brittle for complex intents and conversational questions.
We propose a novel two-pronged system where we fuse: (i) SPARQL results over a database automatically derived from the knowledge graph, and (ii) text-search results over verbalizations of KG facts.
Our pipeline supports iterative retrieval: when the results of any branch are found to be unsatisfactory, the system can automatically opt for further rounds.
We demonstrate the superiority of our proposed system over several baselines on a knowledge graph of BMW automobiles.
arXiv Detail & Related papers (2024-12-23T16:16:30Z) - Towards Evaluating Large Language Models for Graph Query Generation [49.49881799107061]
Large Language Models (LLMs) are revolutionizing the landscape of Generative Artificial Intelligence (GenAI).
This paper presents a comparative study addressing the challenge of using open-access LLMs to generate queries in a powerful language for interacting with graph databases.
Our empirical analysis of query generation accuracy reveals that Claude Sonnet 3.5 outperforms its counterparts in this specific domain.
arXiv Detail & Related papers (2024-11-13T09:11:56Z) - Effective Instruction Parsing Plugin for Complex Logical Query Answering on Knowledge Graphs [51.33342412699939]
Knowledge Graph Query Embedding (KGQE) aims to embed First-Order Logic (FOL) queries in a low-dimensional KG space for complex reasoning over incomplete KGs.
Recent studies integrate various external information (such as entity types and relation context) to better capture the logical semantics of FOL queries.
We propose an effective Query Instruction Parsing (QIPP) that captures latent query patterns from code-like query instructions.
arXiv Detail & Related papers (2024-10-27T03:18:52Z) - Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z) - A large collection of bioinformatics question-query pairs over federated knowledge graphs: methodology and applications [0.0838491111002084]
We introduce a large collection of human-written natural language questions and their corresponding SPARQL queries over federated bioinformatics knowledge graphs.
We propose a methodology to uniformly represent the examples with minimal metadata, based on existing standards.
arXiv Detail & Related papers (2024-10-08T13:08:07Z) - Chatbot-Based Ontology Interaction Using Large Language Models and Domain-Specific Standards [41.19948826527649]
Large Language Models (LLMs) are employed to enhance SPARQL query generation.
The system converts user inquiries into accurate SPARQL queries.
Additional information from established domain-specific standards is integrated into the interface.
arXiv Detail & Related papers (2024-07-22T11:58:36Z) - SPARQL Generation: an analysis on fine-tuning OpenLLaMA for Question Answering over a Life Science Knowledge Graph [0.0]
We evaluate strategies for fine-tuning the OpenLlama LLM for question answering over life science knowledge graphs.
We propose an end-to-end data augmentation approach for extending a set of existing queries over a given knowledge graph.
We also investigate the role of semantic "clues" in the queries, such as meaningful variable names and inline comments.
arXiv Detail & Related papers (2024-02-07T07:24:01Z) - An In-Context Schema Understanding Method for Knowledge Base Question Answering [70.87993081445127]
Large Language Models (LLMs) have shown strong capabilities in language understanding and can be used to solve this task.
Existing methods bypass this challenge by initially employing LLMs to generate drafts of logic forms without schema-specific details.
We propose a simple In-Context Schema Understanding (ICSU) method that enables LLMs to directly understand schemas by leveraging in-context learning.
arXiv Detail & Related papers (2023-10-22T04:19:17Z) - Entailment Tree Explanations via Iterative Retrieval-Generation Reasoner [56.08919422452905]
We propose an architecture called Iterative Retrieval-Generation Reasoner (IRGR).
Our model is able to explain a given hypothesis by systematically generating a step-by-step explanation from textual premises.
We outperform existing benchmarks on premise retrieval and entailment tree generation, with around 300% gain in overall correctness.
arXiv Detail & Related papers (2022-05-18T21:52:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.