A GenAI System for Improved FAIR Independent Biological Database Integration
- URL: http://arxiv.org/abs/2506.17934v1
- Date: Sun, 22 Jun 2025 08:04:24 GMT
- Title: A GenAI System for Improved FAIR Independent Biological Database Integration
- Authors: Syed N. Sakib, Kallol Naha, Sajratul Y. Rubaiat, Hasan M. Jamil,
- Abstract summary: We introduce an experimental natural language-based query processing system designed to empower scientists to discover, access, and query biological databases.<n> FAIRBridge harnesses the capabilities of AI to interpret query intents, map them to relevant databases, and generate executable queries.<n>The system also includes robust tools for mitigating low-quality query processing, ensuring high fidelity and responsiveness in the information delivered.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Life sciences research increasingly requires identifying, accessing, and effectively processing data from an ever-evolving array of information sources on the Linked Open Data (LOD) network. This dynamic landscape places a significant burden on researchers, as the quality of query responses depends heavily on the selection and semantic integration of data sources --processes that are often labor-intensive, error-prone, and costly. While the adoption of FAIR (Findable, Accessible, Interoperable, and Reusable) data principles has aimed to address these challenges, barriers to efficient and accurate scientific data processing persist. In this paper, we introduce FAIRBridge, an experimental natural language-based query processing system designed to empower scientists to discover, access, and query biological databases, even when they are not FAIR-compliant. FAIRBridge harnesses the capabilities of AI to interpret query intents, map them to relevant databases described in scientific literature, and generate executable queries via intelligent resource access plans. The system also includes robust tools for mitigating low-quality query processing, ensuring high fidelity and responsiveness in the information delivered. FAIRBridge's autonomous query processing framework enables users to explore alternative data sources, make informed choices at every step, and leverage community-driven crowd curation when needed. By providing a user-friendly, automated hypothesis-testing platform in natural English, FAIRBridge significantly enhances the integration and processing of scientific data, offering researchers a powerful new tool for advancing their inquiries.
Related papers
- From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents [96.65646344634524]
Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research.<n>We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn.<n>We demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking.
arXiv Detail & Related papers (2025-06-23T17:27:19Z) - ScIRGen: Synthesize Realistic and Large-Scale RAG Dataset for Scientific Research [15.983924435685553]
We develop ScIRGen, a dataset generation framework for scientific QA & retrieval.<n>We use it to create a large-scale scientific retrieval-augmented generation (RAG) dataset with realistic queries, datasets and papers.
arXiv Detail & Related papers (2025-06-09T11:47:13Z) - InfoDeepSeek: Benchmarking Agentic Information Seeking for Retrieval-Augmented Generation [63.55258191625131]
InfoDeepSeek is a new benchmark for assessing agentic information seeking in real-world, dynamic web environments.<n>We propose a systematic methodology for constructing challenging queries satisfying the criteria of determinacy, difficulty, and diversity.<n>We develop the first evaluation framework tailored to dynamic agentic information seeking, including fine-grained metrics about the accuracy, utility, and compactness of information seeking outcomes.
arXiv Detail & Related papers (2025-05-21T14:44:40Z) - Toward Agentic AI: Generative Information Retrieval Inspired Intelligent Communications and Networking [87.82985288731489]
Agentic AI has emerged as a key paradigm for intelligent communications and networking.<n>This article emphasizes the role of knowledge acquisition, processing, and retrieval in agentic AI for telecom systems.
arXiv Detail & Related papers (2025-02-24T06:02:25Z) - iTRI-QA: a Toolset for Customized Question-Answer Dataset Generation Using Language Models for Enhanced Scientific Research [1.2411445143550854]
We present a tool for the development of a customized question-answer (QA) dataset, called Interactive Trained Research Innovator (iTRI) - QA.<n>Our approach integrates curated QA datasets with a specialized research paper dataset to enhance responses' contextual relevance and accuracy using fine-tuned LM.<n>This pipeline provides a dynamic and domain-specific QA system that will be applied for future research LM deployment.
arXiv Detail & Related papers (2025-01-27T23:38:39Z) - Agentic AI-Driven Technical Troubleshooting for Enterprise Systems: A Novel Weighted Retrieval-Augmented Generation Paradigm [0.0]
This paper presents a novel agentic AI solution built on a Weighted Retrieval-Augmented Generation (RAG) Framework tailored for enterprise technical troubleshooting.<n>By dynamically weighting retrieval sources such as product manuals, internal knowledge bases, FAQ, and troubleshooting guides, the framework prioritizes the most relevant data.<n>Preliminary evaluations on large enterprise datasets demonstrate the framework's efficacy in improving troubleshooting accuracy, reducing resolution times, and adapting to varied technical challenges.
arXiv Detail & Related papers (2024-12-16T17:32:38Z) - Synthetic Data Generation with Large Language Models for Personalized Community Question Answering [47.300506002171275]
We build Sy-SE-PQA based on an existing dataset, SE-PQA, which consists of questions and answers posted on the popular StackExchange communities.
Our findings suggest that LLMs have high potential in generating data tailored to users' needs.
The synthetic data can replace human-written training data, even if the generated data may contain incorrect information.
arXiv Detail & Related papers (2024-10-29T16:19:08Z) - Enhancing Knowledge Retrieval with In-Context Learning and Semantic Search through Generative AI [3.9773527114058855]
We propose a novel methodology that combines the generative capabilities of Large Language Models with the fast and accurate retrieval capabilities of vector databases.
The developed model, Generative Text Retrieval (GTR), is adaptable to both unstructured and structured data with minor refinement.
The refined model, Generative Tabular Text Retrieval (GTR-T), demonstrated its efficiency in large database querying.
arXiv Detail & Related papers (2024-06-13T23:08:06Z) - Large Language Models for Information Retrieval: A Survey [58.30439850203101]
Information retrieval has evolved from term-based methods to its integration with advanced neural models.
Recent research has sought to leverage large language models (LLMs) to improve IR systems.
We delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers.
arXiv Detail & Related papers (2023-08-14T12:47:22Z) - Human-Centric Multimodal Machine Learning: Recent Advances and Testbed
on AI-based Recruitment [66.91538273487379]
There is a certain consensus about the need to develop AI applications with a Human-Centric approach.
Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes.
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.