Language Model Powered Digital Biology with BRAD
- URL: http://arxiv.org/abs/2409.02864v3
- Date: Sun, 08 Dec 2024 15:45:30 GMT
- Title: Language Model Powered Digital Biology with BRAD
- Authors: Joshua Pickard, Ram Prakash, Marc Andrew Choi, Natalie Oliven, Cooper Stansbury, Jillian Cwycyshyn, Alex Gorodetsky, Alvaro Velasquez, Indika Rajapakse
- Abstract summary: Large Language Models (LLMs) are well-suited for unstructured integration. We present a prototype Bioinformatics Retrieval Augmented Digital assistant (BRAD).
- Score: 5.309032614374711
- Abstract: Recent advancements in Large Language Models (LLMs) are transforming biology, computer science, engineering, and everyday life. However, integrating the wide array of computational tools, databases, and scientific literature continues to pose a challenge to biological research. LLMs are well-suited for unstructured integration, efficient information retrieval, and automating standard workflows and actions from these diverse resources. To harness these capabilities in bioinformatics, we present a prototype Bioinformatics Retrieval Augmented Digital assistant (BRAD). BRAD is a chatbot and agentic system that integrates a variety of bioinformatics tools. The Python package implements an AI Agent that is powered by LLMs and connects to a local file system, online databases, and a user's software. The Agent is highly configurable, enabling tasks such as Retrieval-Augmented Generation, searches across bioinformatics databases, and the execution of software pipelines. BRAD's coordinated integration of bioinformatics tools delivers a context-aware and semi-autonomous system that extends beyond the capabilities of conventional LLM-based chatbots. A graphical user interface (GUI) provides an intuitive way to interact with the system.
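The abstract describes an Agent that coordinates several capabilities: retrieval-augmented generation, database searches, and pipeline execution. Below is a minimal sketch of that dispatch idea. It assumes a simple keyword router. The class ToyAgent, the route function, the keyword lists, and the three stub tools are all hypothetical and are not BRAD's actual API. A real agentic system would more likely ask the LLM itself to choose the tool.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stub tools standing in for the capabilities named in the
# abstract. None of these names come from BRAD itself.
def rag_tool(query: str) -> str:
    return f"[RAG] literature-grounded answer to: {query}"

def database_tool(query: str) -> str:
    return f"[DB] database search results for: {query}"

def pipeline_tool(query: str) -> str:
    return f"[PIPELINE] would execute a workflow for: {query}"

TOOLS: Dict[str, Callable[[str], str]] = {
    "rag": rag_tool,
    "database": database_tool,
    "pipeline": pipeline_tool,
}

# Assumed keyword lists; a production agent would more likely let the LLM pick.
KEYWORDS: Dict[str, List[str]] = {
    "database": ["gene", "genes", "pathway", "enrichment", "database"],
    "pipeline": ["run", "execute", "pipeline", "script", "workflow"],
}

def route(query: str) -> str:
    """Pick the tool whose keywords occur most often; default to RAG."""
    words = [w.strip("?.,!").lower() for w in query.split()]
    best, best_hits = "rag", 0
    for tool, keywords in KEYWORDS.items():
        hits = sum(w in keywords for w in words)
        if hits > best_hits:  # strict '>' keeps the earlier tool on ties
            best, best_hits = tool, hits
    return best

class ToyAgent:
    """Dispatches each query to one tool and records the exchange."""

    def __init__(self) -> None:
        self.history: List[Tuple[str, str, str]] = []

    def handle(self, query: str) -> str:
        tool = route(query)
        answer = TOOLS[tool](query)
        self.history.append((query, tool, answer))
        return answer
```

For example, ToyAgent().handle("Which pathway is this gene in?") is routed to the database stub. A query with no matching keywords falls back to retrieval. Keeping a history list is one simple way to give later steps the conversational context that the abstract calls "context-aware."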
Related papers
- BioAgents: Democratizing Bioinformatics Analysis with Multi-Agent Systems [6.668992155393883]
We propose a multi-agent system built on small language models, fine-tuned on bioinformatics data, and enhanced with retrieval-augmented generation (RAG).
Our system, BioAgents, enables local operation and personalization using proprietary data.
We observe performance comparable to human experts on conceptual genomics tasks, and suggest next steps to enhance code generation capabilities.
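Retrieval-augmented generation first retrieves the passages most relevant to a question and then places them in the prompt before the model answers. The minimal sketch below assumes plain bag-of-words cosine similarity in place of the learned embeddings and vector stores that real systems typically use. The function names, prompt wording, and example documents are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric tokens; "Hi-C" becomes ["hi", "c"].
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: List[str], k: int = 2) -> List[Tuple[float, str]]:
    # Score every document against the query and keep the top k.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def build_prompt(query: str, docs: List[str], k: int = 2) -> str:
    # Insert the retrieved passages as context ahead of the question.
    context = "\n".join(f"- {d}" for _, d in retrieve(query, docs, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"

docs = [
    "BRCA1 is a tumor suppressor gene involved in DNA repair.",
    "Hi-C measures chromatin conformation genome-wide.",
    "RNA-seq quantifies gene expression.",
]
```

With k=1, build_prompt("Which gene is involved in DNA repair?", docs, k=1) includes only the BRCA1 passage as context. The resulting string would then be sent to whichever LLM the system uses.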
arXiv, 2025-01-10T19:30:59Z
- OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis [55.390060529534644]
We propose OS-Genesis, a novel data synthesis pipeline for Graphical User Interface (GUI) agents.
Instead of relying on pre-defined tasks, OS-Genesis enables agents first to perceive environments and perform step-wise interactions.
A trajectory reward model is then employed to ensure the quality of the generated trajectories.
arXiv, 2024-12-27T16:21:58Z
- Large Language Model-Brained GUI Agents: A Survey [42.82362907348966]
Multimodal models have ushered in a new era of GUI automation.
They have demonstrated exceptional capabilities in natural language understanding, code generation, and visual processing.
These agents represent a paradigm shift, enabling users to perform intricate, multi-step tasks through simple conversational commands.
arXiv, 2024-11-27T12:13:39Z
- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv, 2024-07-15T17:54:37Z
- LAB-Bench: Measuring Capabilities of Language Models for Biology Research [1.6312096924271486]
We introduce the Language Agent Biology Benchmark (LAB-Bench).
It is a dataset of over 2,400 multiple-choice questions for evaluating AI systems on a range of practical biology research capabilities.
We measure performance of several frontier language models against our benchmark and report results compared to human expert biology researchers.
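One way to score a multiple-choice benchmark is sketched below. It assumes a model may abstain on some questions (abstentions are recorded as None). The sketch reports accuracy over all questions, precision over the questions actually answered, and coverage. This is an illustrative scheme with hypothetical names, not LAB-Bench's published evaluation code.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Scores:
    accuracy: float   # correct / all questions
    precision: float  # correct / questions actually answered
    coverage: float   # answered / all questions

def score(predictions: List[Optional[str]], answers: List[str]) -> Scores:
    """Score predicted option letters; None means the model abstained."""
    if len(predictions) != len(answers) or not answers:
        raise ValueError("predictions and answers must be equal-length and non-empty")
    answered = [(p, a) for p, a in zip(predictions, answers) if p is not None]
    correct = sum(p == a for p, a in answered)
    n = len(answers)
    return Scores(
        accuracy=correct / n,
        precision=correct / len(answered) if answered else 0.0,
        coverage=len(answered) / n,
    )
```

Take score(["A", "B", None, "D"], ["A", "C", "C", "D"]) as an example. The model answers three of four questions and gets two right, so accuracy is 0.5, precision is 2/3, and coverage is 0.75. Separating precision from coverage distinguishes a cautious model from one that guesses on everything.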
arXiv, 2024-07-14T23:52:25Z
- Generative AI Systems: A Systems-based Perspective on Generative AI [12.400966570867322]
Large Language Models (LLMs) have revolutionized AI systems by enabling communication with machines using natural language.
Recent developments in Generative AI (GenAI) have shown great promise in using LLMs as multimodal systems.
This paper aims to explore and state new research directions in Generative AI Systems.
arXiv, 2024-06-25T12:51:47Z
- EndToEndML: An Open-Source End-to-End Pipeline for Machine Learning Applications [0.2826977330147589]
We propose a web-based end-to-end pipeline that is capable of preprocessing, training, evaluating, and visualizing machine learning models.
Our library assists in recognizing, classifying, clustering, and predicting a wide range of multi-modal, multi-sensor datasets.
arXiv, 2024-03-27T02:24:38Z
- An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems.
Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv, 2024-02-21T11:27:31Z
- Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical ontology OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
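The adapter idea can be sketched as a small bottleneck network whose output is added residually to a frozen layer's hidden state, so that only the adapter's weights are trained. The sketch below assumes a common bottleneck design: down-projection, GELU, up-projection, and a residual connection. The up-projection is zero-initialized so the adapter starts as the identity. This is not the paper's exact architecture, and the class and parameter names are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def matvec(m: Matrix, v: Vector) -> Vector:
    # Plain matrix-vector product, one output entry per row of m.
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def gelu(x: float) -> float:
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

class BottleneckAdapter:
    """h -> h + up(gelu(down(h))); only down and up would be trained."""

    def __init__(self, hidden: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Small random down-projection: hidden -> bottleneck.
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(hidden)] for _ in range(bottleneck)]
        # Zero-initialized up-projection: bottleneck -> hidden, so the adapter starts as identity.
        self.up = [[0.0] * bottleneck for _ in range(hidden)]

    def __call__(self, h: Vector) -> Vector:
        z = [gelu(x) for x in matvec(self.down, h)]
        delta = matvec(self.up, z)
        return [a + b for a, b in zip(h, delta)]  # residual connection

    def num_params(self) -> int:
        # Weights only; this sketch omits bias terms.
        return sum(len(r) for r in self.down) + sum(len(r) for r in self.up)
```

With a hidden size of 768 and a bottleneck of 64, this adapter has 2 × 768 × 64 = 98,304 weights. That is a small fraction of a full transformer layer, which is why adapter-based knowledge injection keeps compute requirements low.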
arXiv, 2023-12-21T14:26:57Z
- GenNI: Human-AI Collaboration for Data-Backed Text Generation [102.08127062293111]
Table2Text systems generate textual output based on structured data utilizing machine learning.
GenNI (Generation Negotiation Interface) is an interactive visual system for high-level human-AI collaboration in producing descriptive text.
arXiv, 2021-10-19T18:07:07Z
- EBIC.JL -- an Efficient Implementation of Evolutionary Biclustering Algorithm in Julia [59.422301529692454]
We introduce EBIC.JL - an implementation of one of the most accurate biclustering algorithms in Julia.
We show that the new version maintains comparable accuracy to its predecessor EBIC while converging faster for the majority of the problems.
arXiv, 2021-05-03T22:30:38Z
This list is automatically generated from the titles and abstracts of the papers in this site.