SPARQL Generation: an analysis on fine-tuning OpenLLaMA for Question Answering over a Life Science Knowledge Graph
- URL: http://arxiv.org/abs/2402.04627v1
- Date: Wed, 7 Feb 2024 07:24:01 GMT
- Title: SPARQL Generation: an analysis on fine-tuning OpenLLaMA for Question Answering over a Life Science Knowledge Graph
- Authors: Julio C. Rangel, Tarcisio Mendes de Farias, Ana Claudia Sima and Norio Kobayashi
- Abstract summary: We evaluate strategies for fine-tuning the OpenLlama LLM for question answering over life science knowledge graphs.
We propose an end-to-end data augmentation approach for extending a set of existing queries over a given knowledge graph.
We also investigate the role of semantic "clues" in the queries, such as meaningful variable names and inline comments.
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The recent success of Large Language Models (LLM) in a wide range of Natural
Language Processing applications opens the path towards novel Question
Answering Systems over Knowledge Graphs leveraging LLMs. However, one of the
main obstacles preventing their implementation is the scarcity of training data
for the task of translating questions into corresponding SPARQL queries,
particularly in the case of domain-specific KGs. To overcome this challenge, in
this study, we evaluate several strategies for fine-tuning the OpenLlama LLM
for question answering over life science knowledge graphs. In particular, we
propose an end-to-end data augmentation approach for extending a set of
existing queries over a given knowledge graph towards a larger dataset of
semantically enriched question-to-SPARQL query pairs, enabling fine-tuning even
for datasets where these pairs are scarce. In this context, we also investigate
the role of semantic "clues" in the queries, such as meaningful variable names
and inline comments. Finally, we evaluate our approach over the real-world Bgee
gene expression knowledge graph and we show that semantic clues can improve
model performance by up to 33% compared to a baseline with random variable
names and no comments included.
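The baseline described in the abstract (random variable names, no comments) can be illustrated with a short Python sketch that derives such a variant from an annotated SPARQL query. This is not the authors' implementation: the helper `strip_semantic_clues`, the six-letter random names, the seeded RNG, and the `ex:` vocabulary in the example query are all hypothetical, and the regular-expression tokenizer is a simplification of the SPARQL 1.1 grammar (it handles IRIs, single-line string literals, `#` comments and `?`/`$` variables, but not triple-quoted literals or escaped `\#` in prefixed names).

```python
import random
import re
import string

# Simplified SPARQL tokens (assumption: no triple-quoted literals, no escaped "\#"
# inside prefixed names). IRIs and string literals are matched first so that a '#'
# or '?' inside them is never mistaken for a comment or a variable.
IRI = r'<[^<>"{}|^`\\\x00-\x20]*>'
STRING = r'"(?:[^"\\\n]|\\.)*"' + "|" + r"'(?:[^'\\\n]|\\.)*'"
COMMENT = r"#[^\n]*"
VAR = r"[?$][A-Za-z_][A-Za-z0-9_]*"
TOKEN_RE = re.compile(
    f"(?P<iri>{IRI})|(?P<str>{STRING})|(?P<comment>{COMMENT})|(?P<var>{VAR})"
)


def strip_semantic_clues(query: str, seed: int = 0) -> str:
    """Hypothetical helper: remove comments and rename every variable to a random name."""
    rng = random.Random(seed)  # seeded so the baseline is reproducible
    mapping: dict[str, str] = {}

    def fresh_name() -> str:
        # Six random lowercase letters, never reused for two different variables.
        while True:
            name = "".join(rng.choices(string.ascii_lowercase, k=6))
            if name not in mapping.values():
                return name

    def replace(match: re.Match) -> str:
        kind = match.lastgroup
        if kind == "comment":
            return ""  # drop the inline comment
        if kind == "var":
            sigil, name = match.group()[0], match.group()[1:]
            if name not in mapping:  # ?x and $x denote the same variable
                mapping[name] = fresh_name()
            return sigil + mapping[name]
        return match.group()  # IRIs and string literals are copied unchanged

    stripped = TOKEN_RE.sub(replace, query)
    # Remove trailing spaces left by deleted comments and drop now-empty lines
    # (this also drops blank lines that were present in the original query).
    lines = [line.rstrip() for line in stripped.splitlines()]
    return "\n".join(line for line in lines if line)


# Illustrative query over a made-up "ex:" vocabulary (not the Bgee schema).
EXAMPLE_QUERY = """PREFIX ex: <http://example.org/schema#>
SELECT ?gene ?geneName WHERE {
  # genes expressed in some anatomical entity
  ?gene ex:expressedIn ?anat ;
        ex:name ?geneName .  # human-readable label
  FILTER(?geneName != "#notAComment")
}"""

if __name__ == "__main__":
    print(strip_semantic_clues(EXAMPLE_QUERY))
```

Matching IRIs and string literals before comments keeps the `#` in `<http://example.org/schema#>` and in `"#notAComment"` intact, while a single mapping renames every occurrence of a variable consistently, so the stripped query stays equivalent to the annotated one.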
Related papers
- Assessing SPARQL capabilities of Large Language Models
We focus on measuring out-of-the-box capabilities of Large Language Models to work with SPARQL.
We implement benchmarking tasks in the LLM-KG-Bench framework for automated execution and evaluation.
Our findings indicate that working with SPARQL SELECT queries is still challenging for LLMs.
(2024-09-09T08:29:39Z)
- Integrating Large Language Models with Graph-based Reasoning for Conversational Question Answering
We focus on a conversational question answering task which combines the challenges of understanding questions in context and reasoning over evidence gathered from heterogeneous sources like text, knowledge graphs, tables, and infoboxes.
Our method utilizes a graph structured representation to aggregate information about a question and its context.
(2024-06-14T13:28:03Z)
- Optimizing Language Model's Reasoning Abilities with Weak Supervision
We present PuzzleBen, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales.
A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore using less supervised data to boost LLMs' inference capabilities.
(2024-05-07T07:39:15Z)
- LaSagnA: Language-based Segmentation Assistant for Complex Queries
Large Language Models for Vision (vLLMs) generate detailed perceptual outcomes, including bounding boxes and masks.
In this study, we acknowledge that the main cause of these problems is the insufficient complexity of training queries.
We present three novel strategies to effectively handle the challenges arising from the direct integration of the proposed format.
(2024-04-12T14:40:45Z)
- Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA).
We propose a novel adaptive QA framework that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
(2024-03-21T13:52:30Z)
- Context Matters: Pushing the Boundaries of Open-Ended Answer Generation with Graph-Structured Knowledge Context
This paper introduces a novel framework that combines graph-driven context retrieval with knowledge-graph-based enhancement.
We conduct experiments on various Large Language Models (LLMs) with different parameter sizes to evaluate their ability to ground knowledge and determine factual accuracy in answers to open-ended questions.
Our methodology GraphContextGen consistently outperforms dominant text-based retrieval systems, demonstrating its robustness and adaptability to a larger number of use cases.
(2024-01-23T11:25:34Z)
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
(2023-10-31T04:37:57Z)
- An In-Context Schema Understanding Method for Knowledge Base Question Answering
Large Language Models (LLMs) have shown strong capabilities in language understanding and can be used to solve knowledge base question answering (KBQA).
Existing methods bypass the challenge of understanding knowledge base schemas by first employing LLMs to generate drafts of logic forms without schema-specific details.
We propose a simple In-Context Schema Understanding (ICSU) method that enables LLMs to directly understand schemas by leveraging in-context learning.
(2023-10-22T04:19:17Z)
- Enhancing In-Context Learning with Answer Feedback for Multi-Span Question Answering
In this paper, we propose a novel way of employing labeled data such that it informs the LLM of undesired outputs.
Experiments on three multi-span question answering datasets and a keyphrase extraction dataset show that our new prompting strategy consistently improves the in-context learning performance of LLMs.
(2023-06-07T15:20:24Z)
- Exploiting Abstract Meaning Representation for Open-Domain Question Answering
We utilize Abstract Meaning Representation (AMR) graphs to assist the model in understanding complex semantic information.
Results from Natural Questions (NQ) and TriviaQA (TQ) demonstrate that our GST method can significantly improve performance.
(2023-05-26T16:00:16Z)
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach to the multi-hop KGQA task that unifies retrieval and reasoning in both model architecture and parameter learning.
(2022-12-02T04:08:09Z)