Recent advances in deep learning and language models for studying the microbiome
- URL: http://arxiv.org/abs/2409.10579v1
- Date: Sun, 15 Sep 2024 18:32:31 GMT
- Title: Recent advances in deep learning and language models for studying the microbiome
- Authors: Binghao Yan, Yunbi Nam, Lingyao Li, Rebecca A. Deek, Hongzhe Li, Siyuan Ma
- Abstract summary: We review applications of deep learning and language models in analyzing microbiome and metagenomics data.
We provide an extensive overview of protein/genomic language modeling and its contributions to microbiome studies.
We also discuss applications such as novel viromics language modeling, biosynthetic gene cluster prediction, and knowledge integration for metagenomics studies.
- Score: 3.2676374150532173
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in deep learning, particularly large language models (LLMs), have made a significant impact on how researchers study microbiome and metagenomics data. Microbial protein and genomic sequences, like natural languages, form a language of life, enabling the adoption of LLMs to extract useful insights from complex microbial ecologies. In this paper, we review applications of deep learning and language models in analyzing microbiome and metagenomics data. We focus on problem formulations, necessary datasets, and the integration of language modeling techniques. We provide an extensive overview of protein/genomic language modeling and its contributions to microbiome studies. We also discuss applications such as novel viromics language modeling, biosynthetic gene cluster prediction, and knowledge integration for metagenomics studies.
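As a concrete illustration of the "language of life" framing above (not taken from the reviewed paper), the following Python sketch shows one common way genomic sequences are turned into token ids before being fed to a language model: overlapping k-mers mapped through a small vocabulary. The function names, k-mer size, and toy reads are hypothetical choices made only for illustration.

```python
# Minimal, illustrative sketch (not from the paper): treating a DNA sequence
# as a "sentence" of overlapping k-mer tokens, a preprocessing step many
# genomic language models build on. All names here are hypothetical.

def kmer_tokenize(sequence: str, k: int = 6, stride: int = 1) -> list[str]:
    """Split a nucleotide sequence into overlapping k-mer tokens."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, stride)]

def build_vocab(token_lists: list[list[str]]) -> dict[str, int]:
    """Assign an integer id to every distinct k-mer, reserving ids for padding/unknown."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(sequence: str, vocab: dict[str, int], k: int = 6) -> list[int]:
    """Map a sequence to integer ids, falling back to <unk> for unseen k-mers."""
    return [vocab.get(tok, vocab["<unk>"]) for tok in kmer_tokenize(sequence, k)]

if __name__ == "__main__":
    reads = ["ATGCGTACGTTAG", "ATGAAACGTACGT"]
    tokenized = [kmer_tokenize(r, k=6) for r in reads]
    vocab = build_vocab(tokenized)
    print(encode(reads[0], vocab, k=6))  # [2, 3, 4, 5, 6, 7, 8, 9]
```

Protein sequences are often handled analogously, with per-residue characters or learned subword pieces taking the place of nucleotide k-mers.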
Related papers
- Large Language Models for Bioinformatics [58.892165394487414]
This survey focuses on the evolution, classification, and distinguishing features of bioinformatics-specific language models (BioLMs).
We explore the wide-ranging applications of BioLMs in critical areas such as disease diagnosis, drug discovery, and vaccine development.
We identify key challenges and limitations inherent in BioLMs, including data privacy and security concerns, interpretability issues, biases in training data and model outputs, and domain adaptation complexities.
arXiv Detail & Related papers (2025-01-10T01:43:05Z) - Biology Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models [51.316001071698224]
We introduce Biology-Instructions, the first large-scale instruction-tuning dataset for multi-omics biological sequences.
This dataset can bridge the gap between large language models (LLMs) and complex biological sequence-related tasks.
We also develop a strong baseline called ChatMultiOmics with a novel three-stage training pipeline.
arXiv Detail & Related papers (2024-12-26T12:12:23Z) - COMET: Benchmark for Comprehensive Biological Multi-omics Evaluation Tasks and Language Models [56.81513758682858]
COMET aims to evaluate models across single-omics, cross-omics, and multi-omics tasks.
First, we curate and develop a diverse collection of downstream tasks and datasets covering key structural and functional aspects in DNA, RNA, and proteins.
Then, we evaluate existing foundational language models for DNA, RNA, and proteins, as well as the newly proposed multi-omics method.
arXiv Detail & Related papers (2024-12-13T18:42:00Z) - Artificial Intelligence for Microbiology and Microbiome Research [3.4014872469607695]
Microbiology and microbiome research are experiencing breakthroughs through machine learning and deep learning applications.
This review provides a comprehensive overview of AI-driven approaches tailored for microbiology and microbiome studies.
arXiv Detail & Related papers (2024-11-02T01:03:43Z) - Genomic Language Models: Opportunities and Challenges [0.2912705470788796]
Genomic Language Models (gLMs) have the potential to significantly advance our understanding of genomes.
We highlight key applications of gLMs, including functional constraint prediction, sequence design, and transfer learning.
We discuss major considerations for developing and evaluating gLMs.
arXiv Detail & Related papers (2024-07-16T06:57:35Z) - AI for Biomedicine in the Era of Large Language Models [5.860347634642287]
Recent advancements in large language models have showcased prowess in natural language tasks.
Can we harness the potential of these models to drive biomedical knowledge discoveries?
We will explore the application of large language models to three crucial categories of biomedical data.
arXiv Detail & Related papers (2024-03-23T01:40:22Z) - Leveraging Biomolecule and Natural Language through Multi-Modal
Learning: A Survey [75.47055414002571]
The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology.
We provide an analysis of recent advancements achieved through cross-modeling of biomolecules and natural language.
arXiv Detail & Related papers (2024-03-03T14:59:47Z) - An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems.
Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z) - Advancing bioinformatics with large language models: components, applications and perspectives [12.728981464533918]
Large language models (LLMs) are a class of artificial intelligence models based on deep learning.
We will provide a comprehensive overview of the essential components of large language models (LLMs) in bioinformatics.
Key aspects covered include tokenization methods for diverse data types, the architecture of transformer models, and the core attention mechanism; a minimal illustrative sketch of this attention computation appears after the list of related papers below.
arXiv Detail & Related papers (2024-01-08T17:26:59Z) - Machine Learning in Nano-Scale Biomedical Engineering [77.75587007080894]
We review the existing research regarding the use of machine learning in nano-scale biomedical engineering.
The main challenges that can be formulated as ML problems are classified into three main categories.
For each of the presented methodologies, special emphasis is given to its principles, applications, and limitations.
arXiv Detail & Related papers (2020-08-05T15:45:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.