Utilizing Large Language Models for Natural Interface to Pharmacology
Databases
- URL: http://arxiv.org/abs/2307.15717v1
- Date: Wed, 26 Jul 2023 17:50:11 GMT
- Title: Utilizing Large Language Models for Natural Interface to Pharmacology
Databases
- Authors: Hong Lu, Chuan Li, Yinheng Li, Jie Zhao
- Abstract summary: We introduce a Large Language Model (LLM)-based Natural Language Interface to interact with structured information stored in databases.
This framework can generalize to query a wide range of pharmaceutical data and knowledge bases.
- Score: 7.32741812808506
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The drug development process necessitates that pharmacologists undertake
various tasks, such as reviewing literature, formulating hypotheses, designing
experiments, and interpreting results. Each stage requires accessing and
querying vast amounts of information. In this abstract, we introduce a Large
Language Model (LLM)-based Natural Language Interface designed to interact with
structured information stored in databases. Our experiments demonstrate the
feasibility and effectiveness of the proposed framework. This framework can
generalize to query a wide range of pharmaceutical data and knowledge bases.
Related papers
- Biomedical Foundation Model: A Survey [84.26268124754792]
Foundation models are large-scale pre-trained models that learn from extensive unlabeled datasets.
These models can be adapted to various applications such as question answering and visual understanding.
This survey explores the potential of foundation models across diverse domains within biomedical fields.
arXiv Detail & Related papers (2025-03-03T22:42:00Z) - Knowledge Hierarchy Guided Biological-Medical Dataset Distillation for Domain LLM Training [10.701353329227722]
We propose a framework that automates the distillation of high-quality textual training data from the extensive scientific literature.
Our approach self-evaluates and generates questions that are more closely aligned with the biomedical domain.
Our approach substantially improves question-answering tasks compared to pre-trained models from the life sciences domain.
arXiv Detail & Related papers (2025-01-25T07:20:44Z) - Y-Mol: A Multiscale Biomedical Knowledge-Guided Large Language Model for Drug Development [24.5979645373074]
Y-Mol is a knowledge-guided LLM designed to accomplish tasks across lead compound discovery, pre-clinic, and clinic prediction.
It learns from a corpus of publications, knowledge graphs, and expert-designed synthetic data.
Y-Mol significantly outperforms general-purpose LLMs in discovering lead compounds, predicting molecular properties, and identifying drug interaction events.
arXiv Detail & Related papers (2024-10-15T12:39:20Z) - Diagnostic Reasoning in Natural Language: Computational Model and Application [68.47402386668846]
We investigate diagnostic abductive reasoning (DAR) in the context of language-grounded tasks (NL-DAR)
We propose a novel modeling framework for NL-DAR based on Pearl's structural causal models.
We use the resulting dataset to investigate the human decision-making process in NL-DAR.
arXiv Detail & Related papers (2024-09-09T06:55:37Z) - Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z) - Exploring the Effectiveness of Instruction Tuning in Biomedical Language
Processing [19.41164870575055]
This study investigates the potential of instruction tuning for biomedical language processing.
We present a comprehensive, instruction-based model trained on a dataset that consists of approximately $200,000$ instruction-focused samples.
arXiv Detail & Related papers (2023-12-31T20:02:10Z) - Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z) - High-throughput Biomedical Relation Extraction for Semi-Structured Web Articles Empowered by Large Language Models [1.9665865095034865]
We formulate the relation extraction task as binary classifications for large language models.
We designate the main title as the tail entity and explicitly incorporate it into the context.
Longer contents are sliced into text chunks, embedded, and retrieved with additional embedding models.
arXiv Detail & Related papers (2023-12-13T16:43:41Z) - Interpretable Medical Diagnostics with Structured Data Extraction by
Large Language Models [59.89454513692417]
Tabular data is often hidden in text, particularly in medical diagnostic reports.
We propose a novel, simple, and effective methodology for extracting structured tabular data from textual medical reports, called TEMED-LLM.
We demonstrate that our approach significantly outperforms state-of-the-art text classification models in medical diagnostics.
arXiv Detail & Related papers (2023-06-08T09:12:28Z) - PMC-LLaMA: Towards Building Open-source Language Models for Medicine [62.39105735933138]
Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding.
LLMs struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge.
We describe the procedure for building a powerful, open-source language model specifically designed for medicine applications, termed as PMC-LLaMA.
arXiv Detail & Related papers (2023-04-27T18:29:05Z) - Development and validation of a natural language processing algorithm to
pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z) - BigBIO: A Framework for Data-Centric Biomedical Natural Language
Processing [13.30221348538759]
We introduce BigBIO, a community library of 126+ biomedical NLP datasets.
BigBIO facilitates reproducible meta-dataset curation via programmatic access to datasets and their metadata.
We discuss our process for task schema, data auditing, contribution guidelines, and outline two illustrative use cases.
arXiv Detail & Related papers (2022-06-30T07:15:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.