GeneGPT: Augmenting Large Language Models with Domain Tools for Improved
Access to Biomedical Information
- URL: http://arxiv.org/abs/2304.09667v3
- Date: Tue, 16 May 2023 13:24:53 GMT
- Title: GeneGPT: Augmenting Large Language Models with Domain Tools for Improved
Access to Biomedical Information
- Authors: Qiao Jin, Yifan Yang, Qingyu Chen, Zhiyong Lu
- Abstract summary: We present GeneGPT, a novel method for teaching LLMs to use the Web APIs of the National Center for Biotechnology Information.
We prompt Codex to solve the GeneTuring tests with NCBI Web APIs by in-context learning and an augmented decoding algorithm.
GeneGPT achieves state-of-the-art performance on eight tasks in the GeneTuring benchmark with an average score of 0.83.
- Score: 18.551792817140473
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While large language models (LLMs) have been successfully applied to various
tasks, they still face challenges with hallucinations. Augmenting LLMs with
domain-specific tools such as database utilities can facilitate easier and more
precise access to specialized knowledge. In this paper, we present GeneGPT, a
novel method for teaching LLMs to use the Web APIs of the National Center for
Biotechnology Information (NCBI) for answering genomics questions.
Specifically, we prompt Codex to solve the GeneTuring tests with NCBI Web APIs
by in-context learning and an augmented decoding algorithm that can detect and
execute API calls. Experimental results show that GeneGPT achieves
state-of-the-art performance on eight tasks in the GeneTuring benchmark with an
average score of 0.83, largely surpassing retrieval-augmented LLMs such as the
new Bing (0.44), biomedical LLMs such as BioMedLM (0.08) and BioGPT (0.04), as
well as GPT-3 (0.16) and ChatGPT (0.12). Our further analyses suggest that: (1)
API demonstrations have good cross-task generalizability and are more useful
than documentations for in-context learning; (2) GeneGPT can generalize to
longer chains of API calls and answer multi-hop questions in GeneHop, a novel
dataset introduced in this work; (3) Different types of errors are enriched in
different tasks, providing valuable insights for future improvements.
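The abstract's "augmented decoding algorithm that can detect and execute API calls" can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, for illustration only, that the model writes a call as `[URL]->` and that the call result is appended right after the arrow. The names `augmented_decode`, `fake_generate`, `fake_fetch`, the `example.org` URL, and the `ALIAS1`/`SYMBOL1` placeholders are all hypothetical. The language model and the HTTP request are replaced with stubs so the loop runs offline.

```python
import re
from typing import Callable

# Hypothetical marker convention (an assumption, not taken from the text above):
# the model writes an API call as "[URL]->" and waits for the result.
CALL_PATTERN = re.compile(r"\[(https?://[^\]\s]+)\]->$")


def augmented_decode(
    prompt: str,
    generate: Callable[[str], str],
    fetch: Callable[[str], str],
    max_calls: int = 5,
) -> str:
    """Alternate between text generation and API execution.

    generate(text) returns the model's continuation, assumed to stop either
    right after a "[URL]->" marker or at the end of the answer.
    fetch(url) returns the API response as text.
    """
    text = prompt
    for _ in range(max_calls):
        text += generate(text)
        match = CALL_PATTERN.search(text)
        if match is None:
            return text  # no pending API call, so the answer is complete
        # Execute the detected call and feed the result back into the context.
        text += "[" + fetch(match.group(1)) + "]\n"
    return text  # call budget exhausted; return what has been produced


def fake_generate(text: str) -> str:
    # Stand-in for the LLM: request one lookup, then answer from its result.
    if "->" not in text:
        return "[https://example.org/api?term=ALIAS1]->"
    return "Answer: SYMBOL1"


def fake_fetch(url: str) -> str:
    # Canned response; a real version would issue an HTTP GET to the API.
    return "SYMBOL1"


result = augmented_decode(
    "Question: What is the official symbol of ALIAS1?\n",
    fake_generate,
    fake_fetch,
)
print(result)
```

Passing `generate` and `fetch` as parameters keeps the control loop independent of any particular model or web service, and `max_calls` bounds the number of API round trips if the model never stops requesting calls.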
Related papers
- GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases [5.831842925038342]
We present GeneAgent, a first-of-its-kind language agent featuring self-verification capability.
It autonomously interacts with various biological databases to improve accuracy and reduce hallucination occurrences.
Benchmarking on 1,106 gene sets from different sources, GeneAgent consistently outperforms standard GPT-4 by a significant margin.
arXiv Detail & Related papers (2024-05-25T12:35:15Z)
- BiomedRAG: A Retrieval Augmented Large Language Model for Biomedicine [19.861178160437827]
Large Language Models (LLMs) have swiftly emerged as vital resources for different applications in the biomedical and healthcare domains.
BiomedRAG attains superior performance across 5 biomedical NLP tasks.
BiomedRAG outperforms other triple extraction systems with micro-F1 scores of 81.42 and 88.83 on GIT and ChemProt corpora, respectively.
arXiv Detail & Related papers (2024-05-01T12:01:39Z)
- An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems.
Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z)
- Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
- BioInstruct: Instruction Tuning of Large Language Models for Biomedical Natural Language Processing [10.698756010878688]
We created BioInstruct, a dataset of 25,005 instructions for instruction-tuning large language models (LLMs).
The instructions were created by prompting the GPT-4 language model with three seed samples randomly drawn from 80 human-curated instructions.
We evaluated these instruction-tuned LLMs on several BioNLP tasks, which can be grouped into three major categories: question answering (QA), information extraction (IE), and text generation (GEN).
arXiv Detail & Related papers (2023-10-30T19:38:50Z)
- Text2KGBench: A Benchmark for Ontology-Driven Knowledge Graph Generation from Text [2.396908230113859]
Large language models (LLMs) and foundation models with emergent capabilities have been shown to improve the performance of many NLP tasks.
We present Text2KGBench, a benchmark to evaluate the capabilities of language models to generate Knowledge Graphs (KGs) from natural language text guided by an ontology.
arXiv Detail & Related papers (2023-08-04T14:47:15Z)
- API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs [84.45284695156771]
API-Bank is a groundbreaking benchmark for tool-augmented Large Language Models.
We develop a runnable evaluation system consisting of 73 API tools.
We construct a comprehensive training set containing 1,888 tool-use dialogues from 2,138 APIs spanning 1,000 distinct domains.
arXiv Detail & Related papers (2023-04-14T14:05:32Z)
- Explaining Patterns in Data with Language Models via Interpretable Autoprompting [143.4162028260874]
We introduce interpretable autoprompting (iPrompt), an algorithm that generates a natural-language string explaining the data.
iPrompt can yield meaningful insights by accurately finding ground-truth dataset descriptions.
Experiments with an fMRI dataset show the potential for iPrompt to aid in scientific discovery.
arXiv Detail & Related papers (2022-10-04T18:32:14Z)
- BigBIO: A Framework for Data-Centric Biomedical Natural Language Processing [13.30221348538759]
We introduce BigBIO, a community library of 126+ biomedical NLP datasets.
BigBIO facilitates reproducible meta-dataset curation via programmatic access to datasets and their metadata.
We discuss our process for task schema, data auditing, and contribution guidelines, and outline two illustrative use cases.
arXiv Detail & Related papers (2022-06-30T07:15:45Z)
- Scientific Language Models for Biomedical Knowledge Base Completion: An Empirical Study [62.376800537374024]
We study scientific LMs for KG completion, exploring whether we can tap into their latent knowledge to enhance biomedical link prediction.
We integrate the LM-based models with KG embedding models, using a router method that learns to assign each input example to either type of model and provides a substantial boost in performance.
arXiv Detail & Related papers (2021-06-17T17:55:33Z)
- EBIC.JL -- an Efficient Implementation of Evolutionary Biclustering Algorithm in Julia [59.422301529692454]
We introduce EBIC.JL - an implementation of one of the most accurate biclustering algorithms in Julia.
We show that the new version maintains comparable accuracy to its predecessor EBIC while converging faster for the majority of the problems.
arXiv Detail & Related papers (2021-05-03T22:30:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.