OLAF: An Open Life Science Analysis Framework for Conversational Bioinformatics Powered by Large Language Models
- URL: http://arxiv.org/abs/2504.03976v2
- Date: Thu, 10 Apr 2025 19:32:47 GMT
- Title: OLAF: An Open Life Science Analysis Framework for Conversational Bioinformatics Powered by Large Language Models
- Authors: Dylan Riffle, Nima Shirooni, Cody He, Manush Murali, Sovit Nayak, Rishikumar Gopalan, Diego Gonzalez Lopez,
- Abstract summary: OLAF (Open Life Science Analysis Framework) is an open-source platform that enables researchers to perform bioinformatics analyses using natural language. By combining large language models (LLMs) with a modular agent-pipe-router architecture, OLAF generates and executes bioinformatics code on real scientific data.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: OLAF (Open Life Science Analysis Framework) is an open-source platform that enables researchers to perform bioinformatics analyses using natural language. By combining large language models (LLMs) with a modular agent-pipe-router architecture, OLAF generates and executes bioinformatics code on real scientific data, including formats like .h5ad. The system includes an Angular front end and a Python/Firebase backend, allowing users to run analyses such as single-cell RNA-seq workflows, gene annotation, and data visualization through a simple web interface. Unlike general-purpose AI tools, OLAF integrates code execution, data handling, and scientific libraries in a reproducible, user-friendly environment. It is designed to lower the barrier to computational biology for non-programmers and support transparent, AI-powered life science research.
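The abstract names an agent-pipe-router architecture but does not say how the three parts interact. The sketch below is one plausible reading, not OLAF's actual implementation: a router picks an agent for a natural-language request, the agent produces Python code, and a pipe executes that code and returns its result. Every name here (`Agent`, `Router`, `run_pipe`, `answer`) is hypothetical, and canned code strings stand in for LLM output.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Agent:
    """Hypothetical agent: turns a natural-language request into Python code."""
    name: str
    keywords: List[str]
    generate_code: Callable[[str], str]  # placeholder for an LLM call


class Router:
    """Hypothetical router: picks the first agent whose keywords match."""

    def __init__(self, agents: List[Agent], fallback: Agent) -> None:
        self.agents = agents
        self.fallback = fallback

    def route(self, request: str) -> Agent:
        text = request.lower()
        for agent in self.agents:
            if any(keyword in text for keyword in agent.keywords):
                return agent
        return self.fallback


def run_pipe(code: str) -> Any:
    """Hypothetical pipe: execute generated code and return its `result` variable.

    Plain exec() is NOT a sandbox; a real system would isolate execution.
    """
    namespace: dict = {}
    exec(code, namespace)
    return namespace.get("result")


def answer(router: Router, request: str) -> Any:
    """Route the request, generate code, then execute it through the pipe."""
    agent = router.route(request)
    return run_pipe(agent.generate_code(request))


# Canned "generated" code stands in for what an LLM would return.
qc_agent = Agent(
    name="qc",
    keywords=["qc", "quality"],
    generate_code=lambda _: "counts = [120, 80, 100]\nresult = sum(counts) / len(counts)",
)
chat_agent = Agent(
    name="chat",
    keywords=[],
    generate_code=lambda _: "result = 'no analysis matched'",
)
router = Router([qc_agent], fallback=chat_agent)
```

Calling `answer(router, "Run QC on my cells")` routes to the `qc` agent and executes its code. In a real deployment the generated code would presumably load user data such as an .h5ad file with scientific libraries; that part is omitted here to keep the sketch dependency-free.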
Related papers
- A Large-Scale Vision-Language Dataset Derived from Open Scientific Literature to Advance Biomedical Generalist AI [70.06771291117965]
We introduce Biomedica, an open-source dataset derived from the PubMed Central Open Access subset.
Biomedica contains over 6 million scientific articles and 24 million image-text pairs.
We provide scalable streaming and search APIs through a web server, facilitating seamless integration with AI systems.
arXiv Detail & Related papers (2025-03-26T05:56:46Z) - Language Model Powered Digital Biology with BRAD [5.309032614374711]
Large Language Models (LLMs) are well-suited for unstructured integration. We present a prototype Bioinformatics Retrieval Augmented Digital assistant (BRAD).
arXiv Detail & Related papers (2024-09-04T16:43:14Z) - SeqMate: A Novel Large Language Model Pipeline for Automating RNA Sequencing [0.0]
SeqMate is a tool that allows for one-click analytics by utilizing the power of a large language model (LLM) to automate both data preparation and analysis.
By utilizing the power of generative AI, SeqMate is also capable of analyzing such findings and producing written reports of upregulated/downregulated/user-prompted genes.
arXiv Detail & Related papers (2024-07-02T20:28:30Z) - An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems.
Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z) - Advancing bioinformatics with large language models: components, applications and perspectives [12.728981464533918]
Large language models (LLMs) are a class of artificial intelligence models based on deep learning.
We will provide a comprehensive overview of the essential components of large language models (LLMs) in bioinformatics.
Key aspects covered include tokenization methods for diverse data types, the architecture of transformer models, and the core attention mechanism.
arXiv Detail & Related papers (2024-01-08T17:26:59Z) - Large Language Models for Generative Information Extraction: A Survey [89.71273968283616]
Large Language Models (LLMs) have demonstrated remarkable capabilities in text understanding and generation.
We present an extensive overview by categorizing these works in terms of various IE subtasks and techniques.
We empirically analyze the most advanced methods and discover the emerging trend of IE tasks with LLMs.
arXiv Detail & Related papers (2023-12-29T14:25:22Z) - GenoCraft: A Comprehensive, User-Friendly Web-Based Platform for High-Throughput Omics Data Analysis and Visualization [23.53674358126236]
GenoCraft is a web-based comprehensive software solution designed to handle the entire pipeline of omics data processing.
GenoCraft offers a unified platform featuring advanced bioinformatics tools, covering all aspects of omics data analysis.
arXiv Detail & Related papers (2023-12-21T19:06:34Z) - Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z) - SelfEEG: A Python library for Self-Supervised Learning in Electroencephalography [0.0]
SelfEEG is an open-source Python library developed to assist researchers in conducting Self-Supervised Learning (SSL) experiments on electroencephalography (EEG) data.
Its primary objective is to offer a user-friendly but highly customizable environment, enabling users to efficiently design and execute self-supervised learning tasks on EEG data.
arXiv Detail & Related papers (2023-12-20T14:58:07Z) - DLSIA: Deep Learning for Scientific Image Analysis [45.81637398863868]
DLSIA is a Python-based machine learning library that empowers scientists and researchers across diverse scientific domains with a range of customizable convolutional neural network (CNN) architectures.
DLSIA features easy-to-use architectures such as autoencoders, tunable U-Nets, and parameter-lean mixed-scale dense networks (MSDNets).
arXiv Detail & Related papers (2023-08-02T21:32:41Z) - GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration [97.68234051078997]
We discuss how Pyserini can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts.
We include a Jupyter Notebook-based walk through the core interoperability features, available on GitHub.
We present GAIA Search - a search engine built following previously laid out principles, giving access to four popular large-scale text collections.
arXiv Detail & Related papers (2023-06-02T12:09:59Z) - GenNI: Human-AI Collaboration for Data-Backed Text Generation [102.08127062293111]
Table2Text systems generate textual output based on structured data utilizing machine learning.
GenNI (Generation Negotiation Interface) is an interactive visual system for high-level human-AI collaboration in producing descriptive text.
arXiv Detail & Related papers (2021-10-19T18:07:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.