Related papers: Comparative Performance Evaluation of Large Language Models for Extracting Molecular Interactions and Pathway Knowledge

URL: http://arxiv.org/abs/2307.08813v2
Date: Wed, 18 Oct 2023 13:52:33 GMT
Title: Comparative Performance Evaluation of Large Language Models for Extracting Molecular Interactions and Pathway Knowledge
Authors: Gilchan Park, Byung-Jun Yoon, Xihaier Luo, Vanessa López-Marrero, Shinjae Yoo, Shantenu Jha
Abstract summary: Understanding protein interactions and pathway knowledge is crucial for unraveling the complexities of living systems. Existing databases provide curated biological data from literature and other sources, but their maintenance is labor-intensive. We propose to harness the capabilities of large language models to address these issues by automatically extracting such knowledge from the relevant scientific literature.
Score: 6.244840529371179
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Understanding protein interactions and pathway knowledge is crucial for unraveling the complexities of living systems and investigating the underlying mechanisms of biological functions and complex diseases. While existing databases provide curated biological data from literature and other sources, they are often incomplete and their maintenance is labor-intensive, necessitating alternative approaches. In this study, we propose to harness the capabilities of large language models to address these issues by automatically extracting such knowledge from the relevant scientific literature. Toward this goal, in this work, we investigate the effectiveness of different large language models in tasks that involve recognizing protein interactions, identifying genes associated with pathways affected by low-dose radiation, and gene regulatory relations. We thoroughly evaluate the performance of various models, highlight the significant findings, and discuss both the future opportunities and the remaining challenges associated with this approach. The code and data are available at: https://github.com/boxorange/BioIE-LLM

Related papers

Platform for Representation and Integration of multimodal Molecular Embeddings [43.54912893426355]
Existing machine learning methods for molecular embeddings are restricted to specific tasks or data modalities. Existing embeddings capture largely non-overlapping molecular signals, highlighting the value of embedding integration. We propose Platform for Representation and Integration of multimodal Molecular Embeddings (PRISME) to integrate heterogeneous embeddings into a unified multimodal representation.
arXiv Detail & Related papers (2025-07-10T01:18:50Z)
In-silico biological discovery with large perturbation models [46.388631244976885]
We present the Large Perturbation Model (LPM), a deep-learning model that integrates perturbation experiments by representing perturbation, readout, and context as disentangled dimensions. LPM outperforms existing methods across multiple biological discovery tasks, including in predicting post-perturbation transcriptomes of unseen experiments.
arXiv Detail & Related papers (2025-03-30T17:41:25Z)
GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
Biology Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models [51.316001071698224]
We introduce Biology-Instructions, the first large-scale multi-omics biological sequences-related instruction-tuning dataset. This dataset can bridge the gap between large language models (LLMs) and complex biological sequences-related tasks. We also develop a strong baseline called ChatMultiOmics with a novel three-stage training pipeline.
arXiv Detail & Related papers (2024-12-26T12:12:23Z)
COMET: Benchmark for Comprehensive Biological Multi-omics Evaluation Tasks and Language Models [56.81513758682858]
COMET aims to evaluate models across single-omics, cross-omics, and multi-omics tasks. First, we curate and develop a diverse collection of downstream tasks and datasets covering key structural and functional aspects in DNA, RNA, and proteins. Then, we evaluate existing foundational language models for DNA, RNA, and proteins, as well as the newly proposed multi-omics method.
arXiv Detail & Related papers (2024-12-13T18:42:00Z)
Explainable AI Methods for Multi-Omics Analysis: A Survey [3.885941688264509]
Multi-omics refers to the integrative analysis of data derived from multiple 'omes'. Deep learning methods are increasingly utilized to integrate multi-omics data, offering insights into molecular interactions and enhancing research into complex diseases. These models, with their numerous interconnected layers and nonlinear relationships, often function as black boxes, lacking transparency in decision-making processes. This review explores how xAI can improve the interpretability of deep learning models in multi-omics research, highlighting its potential to provide clinicians with clear insights.
arXiv Detail & Related papers (2024-10-15T05:01:17Z)
Interpreting artificial neural networks to detect genome-wide association signals for complex traits [0.0]
We trained artificial neural networks to predict complex traits using both simulated and real genotype-phenotype datasets. We detected multiple loci associated with schizophrenia.
arXiv Detail & Related papers (2024-07-26T15:20:42Z)
Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models [46.05020842978823]
Large Language Models (LLMs) have emerged as powerful tools to navigate this complex data landscape. RAGGED is a comprehensive workflow designed to support investigators with knowledge integration and hypothesis generation.
arXiv Detail & Related papers (2024-07-17T07:44:18Z)
Optimal Transport for Latent Integration with An Application to Heterogeneous Neuronal Activity Data [1.5311478638611091]
We propose a novel heterogeneous data integration framework based on optimal transport to extract shared patterns in complex biological processes. Our approach is effective even with a small number of subjects, and does not require auxiliary matching information for the alignment.
arXiv Detail & Related papers (2024-06-27T04:29:21Z)
Cognitive Evolutionary Learning to Select Feature Interactions for Recommender Systems [59.117526206317116]
We show that CELL can adaptively evolve into different models for different tasks and data. Experiments on four real-world datasets demonstrate that CELL significantly outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2024-05-29T02:35:23Z)
BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments [112.25067497985447]
We introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions. BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model. It achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets.
arXiv Detail & Related papers (2024-05-27T19:57:17Z)
Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey [75.47055414002571]
The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology. We provide an analysis of recent advancements achieved through cross modeling of biomolecules and natural language.
arXiv Detail & Related papers (2024-03-03T14:59:47Z)
An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks. These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems. Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z)
Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge [2.2814097119704058]
Large language models (LLMs) are transforming the way information is retrieved with vast amounts of knowledge being summarized and presented. LLMs are prone to highlight the most frequently seen pieces of information from the training set and to neglect the rare ones. We introduce a novel information-retrieval method that leverages a knowledge graph to downsample these clusters and mitigate the information overload problem.
arXiv Detail & Related papers (2024-02-19T18:31:11Z)
Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges. We first present the model that underlies most of current causal approaches to single-cell biology. We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z)
Interpretable multimodal fusion networks reveal mechanisms of brain cognition [26.954460880062506]
We develop an interpretable multimodal fusion model, gCAM-CCL, which can perform automated diagnosis and result interpretation simultaneously. We validate the gCAM-CCL model on a brain imaging-genetic study, and show that gCAM-CCL performed well for both classification and mechanism analysis.
arXiv Detail & Related papers (2020-06-16T18:52:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.