Related papers: A large collection of bioinformatics question-query pairs over federated knowledge graphs: methodology and applications

URL: http://arxiv.org/abs/2410.06010v1
Date: Tue, 8 Oct 2024 13:08:07 GMT
Title: A large collection of bioinformatics question-query pairs over federated knowledge graphs: methodology and applications
Authors: Jerven Bolleman, Vincent Emonet, Adrian Altenhoff, Amos Bairoch, Marie-Claude Blatter, Alan Bridge, Severine Duvaud, Elisabeth Gasteiger, Dmitry Kuznetsov, Sebastien Moretti, Pierre-Andre Michel, Anne Morgat, Marco Pagni, Nicole Redaschi, Monique Zahn-Zabal, Tarcisio Mendes de Farias, Ana Claudia Sima
Abstract summary: We introduce a large collection of human-written natural language questions and their corresponding SPARQL queries over federated bioinformatics knowledge graphs. We propose a methodology to uniformly represent the examples with minimal metadata, based on existing standards.
Score: 0.0838491111002084
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Background. In the last decades, several life science resources have structured data using the same framework and made these accessible using the same query language to facilitate interoperability. Knowledge graphs have seen increased adoption in bioinformatics due to their advantages for representing data in a generic graph format. For example, yummydata.org catalogs more than 60 knowledge graphs accessible through SPARQL, a technical query language. Although SPARQL allows powerful, expressive queries, even across physically distributed knowledge graphs, formulating such queries is a challenge for most users. Therefore, to guide users in retrieving the relevant data, many of these resources provide representative examples. These examples can also be an important source of information for machine learning, if a sufficiently large number of examples are provided and published in a common, machine-readable and standardized format across different resources. Findings. We introduce a large collection of human-written natural language questions and their corresponding SPARQL queries over federated bioinformatics knowledge graphs (KGs) collected for several years across different research groups at the SIB Swiss Institute of Bioinformatics. The collection comprises more than 1000 example questions and queries, including 65 federated queries. We propose a methodology to uniformly represent the examples with minimal metadata, based on existing standards. Furthermore, we introduce an extensive set of open-source applications, including query graph visualizations and smart query editors, easily reusable by KG maintainers who adopt the proposed methodology. Conclusions. We encourage the community to adopt and extend the proposed methodology, towards richer KG metadata and improved Semantic Web services.
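As a rough illustration of what a question-query pair with minimal metadata might look like in code, here is a small Python sketch. It is not the representation proposed in the paper: the `QueryExample` class, its field names, and the example.org IRIs are all hypothetical, and the federation check is a simple heuristic that looks for SPARQL `SERVICE` clauses with a literal endpoint IRI.

```python
import re
from dataclasses import dataclass, field


@dataclass
class QueryExample:
    """One natural-language question paired with its SPARQL query.

    Hypothetical schema for illustration only; not the paper's format.
    """
    question: str
    sparql: str
    endpoint: str
    keywords: list[str] = field(default_factory=list)

    def service_endpoints(self) -> list[str]:
        """Return IRIs named in SERVICE clauses, in order of appearance.

        Heuristic: SERVICE clauses that use a variable instead of an IRI
        are not detected, and SERVICE inside a comment would be counted.
        """
        pattern = r"\bSERVICE\s+(?:SILENT\s+)?<([^>]+)>"
        return re.findall(pattern, self.sparql, flags=re.IGNORECASE)

    def is_federated(self) -> bool:
        """A query counts as federated if it calls at least one remote endpoint."""
        return bool(self.service_endpoints())


# Made-up example; all IRIs below are placeholders.
example = QueryExample(
    question="Which human proteins have a known ortholog?",
    sparql="""
SELECT ?protein ?ortholog WHERE {
  ?protein <http://example.org/taxon> <http://example.org/taxon/9606> .
  SERVICE <https://example.org/other/sparql> {
    ?protein <http://example.org/orthologOf> ?ortholog .
  }
}
""",
    endpoint="https://example.org/sparql",
    keywords=["orthology"],
)

print(example.is_federated())
print(example.service_endpoints())
```

A collection of such records could then be filtered for federated queries by checking `is_federated()` on each entry.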

Related papers

Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding [61.36285696607487]
Document understanding is critical for applications from financial analysis to scientific discovery. Current approaches, whether OCR-based pipelines feeding Large Language Models (LLMs) or native Multimodal LLMs (MLLMs), face key limitations. Retrieval-Augmented Generation (RAG) helps ground models in external data, but documents' multimodal nature, combining text, tables, charts, and layout, demands a more advanced paradigm: Multimodal RAG.
arXiv Detail & Related papers (2025-10-17T02:33:16Z)
Scalable and Explainable Enterprise Knowledge Discovery Using Graph-Centric Hybrid Retrieval [0.0]
Modern enterprises manage vast knowledge distributed across heterogeneous systems such as Jira, Git repositories, Confluence, and wikis. We present a modular hybrid retrieval framework that integrates Knowledge Base Language-Augmented Models (KBLam), DeepGraph representations, and embedding-driven semantic search. The framework builds a unified knowledge graph from parsed repositories including code, pull requests, and commit histories. Experiments on large-scale Git repositories show that the unified reasoning layer improves answer relevance by up to 80 percent compared with standalone GPT-based retrieval pipelines.
arXiv Detail & Related papers (2025-10-13T02:56:36Z)
Text-to-SPARQL Goes Beyond English: Multilingual Question Answering Over Knowledge Graphs through Human-Inspired Reasoning [51.203811759364925]
mKGQAgent breaks down the task of converting natural language questions into SPARQL queries into modular, interpretable subtasks. Evaluated on the DBpedia- and Corporate-based KGQA benchmarks within the Text2SPARQL challenge 2025, our approach took first place among the other participants.
arXiv Detail & Related papers (2025-07-22T19:23:03Z)
GRASP: Generic Reasoning And SPARQL Generation across Knowledge Graphs [4.005483185111992]
We propose a new approach for generating SPARQL queries on RDF knowledge graphs from natural language questions or keyword queries. Our approach does not require fine-tuning. Instead, it uses the language model to explore the knowledge graph by strategically executing SPARQL queries and searching for relevant IRIs and literals.
arXiv Detail & Related papers (2025-07-10T18:50:05Z)
OnSET: Ontology and Semantic Exploration Toolkit [5.1293983340834055]
We propose OnSET, the Ontology and Semantic Exploration Toolkit. OnSET allows non-expert users to easily build queries with visual user guidance provided by topic modelling and semantic search. OnSET combines efficient and open platforms to deploy the system on commodity hardware.
arXiv Detail & Related papers (2025-04-11T09:18:06Z)
Optimizing open-domain question answering with graph-based retrieval augmented generation [5.2850605665217865]
We benchmark various graph-based retrieval-augmented generation (RAG) systems across a broad spectrum of query types. Traditional RAG methods often fall short in handling nuanced, multi-document tasks. We introduce TREX, a novel, cost-effective alternative that combines graph-based synthesis and vector-based retrieval techniques.
arXiv Detail & Related papers (2025-03-04T18:47:17Z)
Generative Retrieval for Book search [106.67655212825025]
We propose an effective Generative retrieval framework for Book Search (GBS). It features two main components: data augmentation and outline-oriented book encoding. Experiments on a proprietary Baidu dataset demonstrate that GBS outperforms strong baselines.
arXiv Detail & Related papers (2025-01-19T12:57:13Z)
G-RAG: Knowledge Expansion in Material Science [0.0]
Graph RAG integrates graph databases to enhance the retrieval process. We implement an agent-based parsing technique to achieve a more detailed representation of the documents.
arXiv Detail & Related papers (2024-11-21T21:22:58Z)
Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval [49.42043077545341]
We propose a knowledge-aware query expansion framework, augmenting LLMs with structured document relations from a knowledge graph (KG). We leverage document texts as rich KG node representations and use document-based relation filtering for our Knowledge-Aware Retrieval (KAR).
arXiv Detail & Related papers (2024-10-17T17:03:23Z)
Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models. Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models. Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z)
LLM-based SPARQL Query Generation from Natural Language over Federated Knowledge Graphs [0.0]
We introduce a Retrieval-Augmented Generation (RAG) system for translating user questions into accurate SPARQL queries over bioinformatics knowledge graphs (KGs). To enhance accuracy and reduce hallucinations in query generation, our system utilises metadata from the KGs, including query examples and schema information, and incorporates a validation step to correct generated queries. The system is available online at chat.expasy.org.
arXiv Detail & Related papers (2024-10-08T14:09:12Z)
Database-Augmented Query Representation for Information Retrieval [59.57065228857247]
We present a novel retrieval framework called Database-Augmented Query representation (DAQu). DAQu augments the original query with various (query-related) metadata across multiple tables. We validate DAQu in diverse retrieval scenarios that can incorporate metadata from the relational database.
arXiv Detail & Related papers (2024-06-23T05:02:21Z)
From Local to Global: A Graph RAG Approach to Query-Focused Summarization [4.075260785658849]
GraphRAG is a graph-based approach to question answering over private text corpora. We show that GraphRAG leads to substantial improvements over a conventional RAG baseline for both the comprehensiveness and diversity of generated answers.
arXiv Detail & Related papers (2024-04-24T18:38:11Z)
SPARQL Generation: an analysis on fine-tuning OpenLLaMA for Question Answering over a Life Science Knowledge Graph [0.0]
We evaluate strategies for fine-tuning the OpenLlama LLM for question answering over life science knowledge graphs. We propose an end-to-end data augmentation approach for extending a set of existing queries over a given knowledge graph. We also investigate the role of semantic "clues" in the queries, such as meaningful variable names and inline comments.
arXiv Detail & Related papers (2024-02-07T07:24:01Z)
Query of CC: Unearthing Large Scale Domain-Specific Knowledge from Public Corpora [104.16648246740543]
We propose an efficient data collection method based on large language models. The method bootstraps seed information through a large language model and retrieves related data from public corpora. It not only collects knowledge-related data for specific domains but unearths the data with potential reasoning procedures.
arXiv Detail & Related papers (2024-01-26T03:38:23Z)
Knowledge Graph Question Answering for Materials Science (KGQA4MAT): Developing Natural Language Interface for Metal-Organic Frameworks Knowledge Graph (MOF-KG) Using LLM [35.208135795371795]
We present a benchmark dataset for Knowledge Graph Question Answering in Materials Science (KGQA4MAT). A knowledge graph for metal-organic frameworks (MOF-KG) has been constructed by integrating structured databases and knowledge extracted from the literature. We have developed a benchmark comprised of 161 complex questions involving comparison, aggregation, and complicated graph structures.
arXiv Detail & Related papers (2023-09-20T14:43:43Z)
ALIST: Associative Logic for Inference, Storage and Transfer. A Lingua Franca for Inference on the Web [0.0]
A formalism that abstracts the representation of queries from the specific query language of a knowledge graph. A representation to dynamically curate data and functions (operations) over diverse knowledge sources. A demonstration of the expressiveness of alists to represent the diversity of representational formalisms.
arXiv Detail & Related papers (2023-03-12T15:55:56Z)
UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question. We propose UniKGQA, a novel approach for the multi-hop KGQA task, by unifying retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
Graph Enhanced BERT for Query Understanding [55.90334539898102]
Query understanding plays a key role in exploring users' search intents and helping users locate their most desired information. In recent years, pre-trained language models (PLMs) have advanced various natural language processing tasks. We propose a novel graph-enhanced pre-training framework, GE-BERT, which can leverage both query content and the query graph.
arXiv Detail & Related papers (2022-04-03T16:50:30Z)
