Related papers: Language Model Powered Digital Biology

Related papers

BioMARS: A Multi-Agent Robotic System for Autonomous Biological Experiments [8.317138109309967]
Large language models (LLMs) and vision-language models (VLMs) have the potential to transform biological research by enabling autonomous experimentation. Here we introduce BioMARS, an intelligent platform that integrates LLMs, VLMs, and modular robotics to autonomously design, plan, and execute biological experiments. A web interface enables real-time human-AI collaboration, while a modular backend allows scalable integration with laboratory hardware.
arXiv Detail & Related papers (2025-07-02T08:47:02Z)
OLAF: An Open Life Science Analysis Framework for Conversational Bioinformatics Powered by Large Language Models [0.0]
OLAF (Open Life Science Analysis Framework) is an open-source platform that enables researchers to perform bioinformatics analyses using natural language. By combining large language models (LLMs) with a modular agent-pipe-router architecture, OLAF generates and executes bioinformatics code on real scientific data.
arXiv Detail & Related papers (2025-04-04T22:41:16Z)
A Large-Scale Vision-Language Dataset Derived from Open Scientific Literature to Advance Biomedical Generalist AI [70.06771291117965]
We introduce Biomedica, an open-source dataset derived from the PubMed Central Open Access subset. Biomedica contains over 6 million scientific articles and 24 million image-text pairs. We provide scalable streaming and search APIs through a web server, facilitating seamless integration with AI systems.
arXiv Detail & Related papers (2025-03-26T05:56:46Z)
BioAgents: Democratizing Bioinformatics Analysis with Multi-Agent Systems [6.668992155393883]
We propose a multi-agent system built on small language models, fine-tuned on bioinformatics data, and enhanced with retrieval augmented generation (RAG). Our system, BioAgents, enables local operation and personalization using proprietary data. We observe performance comparable to human experts on conceptual genomics tasks, and suggest next steps to enhance code generation capabilities.
arXiv Detail & Related papers (2025-01-10T19:30:59Z)
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis [55.390060529534644]
We propose OS-Genesis, a novel data synthesis pipeline for Graphical User Interface (GUI) agents. Instead of relying on pre-defined tasks, OS-Genesis enables agents first to perceive environments and perform step-wise interactions. We demonstrate that training GUI agents with OS-Genesis significantly improves their performance on highly challenging online benchmarks.
arXiv Detail & Related papers (2024-12-27T16:21:58Z)
Large Language Model-Brained GUI Agents: A Survey [42.82362907348966]
Multimodal models have ushered in a new era of GUI automation. They have demonstrated exceptional capabilities in natural language understanding, code generation, and visual processing. These agents represent a paradigm shift, enabling users to perform intricate, multi-step tasks through simple conversational commands.
arXiv Detail & Related papers (2024-11-27T12:13:39Z)
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering. Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications. These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
LAB-Bench: Measuring Capabilities of Language Models for Biology Research [1.6312096924271486]
We introduce the Language Agent Biology Benchmark (LAB-Bench). It is a dataset of over 2,400 multiple choice questions for evaluating AI systems on a range of practical biology research capabilities. We measure performance of several frontier language models against our benchmark and report results compared to human expert biology researchers.
arXiv Detail & Related papers (2024-07-14T23:52:25Z)
BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments [112.25067497985447]
We introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions. BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model. It achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets.
arXiv Detail & Related papers (2024-05-27T19:57:17Z)
Empowering Biomedical Discovery with AI Agents [15.125735219811268]
We envision "AI scientists" as systems capable of skeptical learning and reasoning. Biomedical AI agents combine human creativity and expertise with AI's ability to analyze large datasets. AI agents can impact areas ranging from virtual cell simulation, programmable control of phenotypes, and the design of cellular circuits to developing new therapies.
arXiv Detail & Related papers (2024-04-03T16:08:01Z)
EndToEndML: An Open-Source End-to-End Pipeline for Machine Learning Applications [0.2826977330147589]
We propose a web-based end-to-end pipeline that is capable of preprocessing, training, evaluating, and visualizing machine learning models. Our library assists in recognizing, classifying, clustering, and predicting a wide range of multi-modal, multi-sensor datasets.
arXiv Detail & Related papers (2024-03-27T02:24:38Z)
An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks. These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems. Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z)
Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models. We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT. We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [85.19963303642427]
We propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions about biomedical images. The model first learns to align biomedical vocabulary using the figure-caption pairs as is, then learns to master open-ended conversational semantics. This enables us to train a Large Language and Vision Assistant for BioMedicine in less than 15 hours (with eight A100s).
arXiv Detail & Related papers (2023-06-01T16:50:07Z)
BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining [140.61707108174247]
We propose BioGPT, a domain-specific generative Transformer language model pre-trained on large scale biomedical literature. We get 44.98%, 38.42% and 40.76% F1 score on BC5CDR, KD-DTI and DDI end-to-end relation extraction tasks respectively, and 78.2% accuracy on PubMedQA.
arXiv Detail & Related papers (2022-10-19T07:17:39Z)
When Bioprocess Engineering Meets Machine Learning: A Survey from the Perspective of Automated Bioprocess Development [3.687740185234604]
Machine learning (ML) has significantly contributed to the development of bioprocess engineering, but its application is still limited. This review provides a comprehensive overview of ML-based automation in bioprocess development.
arXiv Detail & Related papers (2022-09-02T14:30:49Z)
BIOS: An Algorithmically Generated Biomedical Knowledge Graph [4.030892610300306]
We introduce the Biomedical Informatics Ontology System (BIOS), the first large scale publicly available BioMedKG that is fully generated by machine learning algorithms. BIOS contains 4.1 million concepts, 7.4 million terms in two languages, and 7.3 million relation triplets. Results suggest that machine learning-based BioMedKG development is a totally viable solution for replacing traditional expert curation.
arXiv Detail & Related papers (2022-03-18T14:09:22Z)
Deep metric learning improves lab of origin prediction of genetically engineered plasmids [63.05016513788047]
Genetic engineering attribution (GEA) is the ability to make sequence-lab associations. We propose a method, based on metric learning, that ranks the most likely labs-of-origin. We are able to extract key signatures in plasmid sequences for particular labs, allowing for an interpretable examination of the model's outputs.
arXiv Detail & Related papers (2021-11-24T16:29:03Z)
GenNI: Human-AI Collaboration for Data-Backed Text Generation [102.08127062293111]
Table2Text systems generate textual output based on structured data utilizing machine learning. GenNI (Generation Negotiation Interface) is an interactive visual system for high-level human-AI collaboration in producing descriptive text.
arXiv Detail & Related papers (2021-10-19T18:07:07Z)
Pre-trained Language Models in Biomedical Domain: A Systematic Survey [33.572502204216256]
Pre-trained language models (PLMs) have been the de facto paradigm for most natural language processing (NLP) tasks. This paper summarizes the recent progress of pre-trained language models in the biomedical domain and their applications in biomedical downstream tasks.
arXiv Detail & Related papers (2021-10-11T05:30:30Z)
EBIC.JL -- an Efficient Implementation of Evolutionary Biclustering Algorithm in Julia [59.422301529692454]
We introduce EBIC.JL - an implementation of one of the most accurate biclustering algorithms in Julia. We show that the new version maintains comparable accuracy to its predecessor EBIC while converging faster for the majority of the problems.
arXiv Detail & Related papers (2021-05-03T22:30:38Z)
GenoML: Automated Machine Learning for Genomics [3.2739205123864945]
GenoML is a Python package automating machine learning for genomics (genetics and multi-omics). GenoML's mission is to bring machine learning for genomics and clinical data to non-experts.
arXiv Detail & Related papers (2021-03-04T18:48:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.