Related papers: Understanding Biology in the Age of Artificial Intelligence

Understanding Biology in the Age of Artificial Intelligence

URL: http://arxiv.org/abs/2403.04106v1
Date: Wed, 6 Mar 2024 23:20:34 GMT
Title: Understanding Biology in the Age of Artificial Intelligence
Authors: Elsa Lawrence, Adham El-Shazly, Srijit Seal, Chaitanya K Joshi, Pietro Li\`o, Shantanu Singh, Andreas Bender, Pietro Sormanni, Matthew Greenig
Abstract summary: Modern life sciences research is increasingly relying on artificial intelligence approaches to model biological systems. Although machine learning (ML) models are useful for identifying patterns in large, complex data sets, its widespread application in biological sciences represents a significant deviation from traditional methods of scientific inquiry. Here, we identify general principles that can guide the design and application of ML systems to model biological phenomena and advance scientific knowledge.
Score: 4.299566787216408
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern life sciences research is increasingly relying on artificial intelligence approaches to model biological systems, primarily centered around the use of machine learning (ML) models. Although ML is undeniably useful for identifying patterns in large, complex data sets, its widespread application in biological sciences represents a significant deviation from traditional methods of scientific inquiry. As such, the interplay between these models and scientific understanding in biology is a topic with important implications for the future of scientific research, yet it is a subject that has received little attention. Here, we draw from an epistemological toolkit to contextualize recent applications of ML in biological sciences under modern philosophical theories of understanding, identifying general principles that can guide the design and application of ML systems to model biological phenomena and advance scientific knowledge. We propose that conceptions of scientific understanding as information compression, qualitative intelligibility, and dependency relation modelling provide a useful framework for interpreting ML-mediated understanding of biological systems. Through a detailed analysis of two key application areas of ML in modern biological research - protein structure prediction and single cell RNA-sequencing - we explore how these features have thus far enabled ML systems to advance scientific understanding of their target phenomena, how they may guide the development of future ML models, and the key obstacles that remain in preventing ML from achieving its potential as a tool for biological discovery. Consideration of the epistemological features of ML applications in biology will improve the prospects of these methods to solve important problems and advance scientific understanding of living systems.

Related papers

BioMaze: Benchmarking and Enhancing Large Language Models for Biological Pathway Reasoning [49.487327661584686]
We introduce BioMaze, a dataset with 5.1K complex pathway problems from real research. Our evaluation of methods such as CoT and graph-augmented reasoning, shows that LLMs struggle with pathway reasoning. To address this, we propose PathSeeker, an LLM agent that enhances reasoning through interactive subgraph-based navigation.
arXiv Detail & Related papers (2025-02-23T17:38:10Z)
Biology Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models [51.316001071698224]
We introduce Biology-Instructions, the first large-scale multi-omics biological sequences-related instruction-tuning dataset. This dataset can bridge the gap between large language models (LLMs) and complex biological sequences-related tasks. We also develop a strong baseline called ChatMultiOmics with a novel three-stage training pipeline.
arXiv Detail & Related papers (2024-12-26T12:12:23Z)
Scientific Machine Learning Seismology [0.0]
Scientific machine learning (SciML) is an interdisciplinary research field that integrates machine learning, particularly deep learning, with physics theory to understand and predict complex natural phenomena. PINNs and neural operators (NOs) are two popular methods for SciML. The use of PINNs is expanding into areas such as simultaneous solutions of differential equations, inference in underdetermined systems, and regularization based on physics.
arXiv Detail & Related papers (2024-09-27T02:27:42Z)
LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery [141.39722070734737]
We propose to enhance the knowledge-driven, abstract reasoning abilities of Large Language Models with the computational strength of simulations. We introduce Scientific Generative Agent (SGA), a bilevel optimization framework. We conduct experiments to demonstrate our framework's efficacy in law discovery and molecular design.
arXiv Detail & Related papers (2024-05-16T03:04:10Z)
Opportunities for machine learning in scientific discovery [16.526872562935463]
We review how the scientific community can increasingly leverage machine-learning techniques to achieve scientific discoveries. Although challenges remain, principled use of ML is opening up new avenues for fundamental scientific discoveries.
arXiv Detail & Related papers (2024-05-07T09:58:02Z)
Knowledge-guided Machine Learning: Current Trends and Future Prospects [14.783972088722193]
It also provides an introduction to the current state of research in the emerging field of scientific knowledge-guided machine learning (KGML) We discuss different facets of KGML research in terms of the type of scientific knowledge used, the form of knowledge-ML integration explored, and the method for incorporating scientific knowledge in ML.
arXiv Detail & Related papers (2024-03-24T02:54:46Z)
Progress and Opportunities of Foundation Models in Bioinformatics [77.74411726471439]
Foundations models (FMs) have ushered in a new era in computational biology, especially in the realm of deep learning. Central to our focus is the application of FMs to specific biological problems, aiming to guide the research community in choosing appropriate FMs for their research needs. Review analyses challenges and limitations faced by FMs in biology, such as data noise, model explainability, and potential biases.
arXiv Detail & Related papers (2024-02-06T02:29:17Z)
Diverse Explanations From Data-Driven and Domain-Driven Perspectives in the Physical Sciences [4.442043151145212]
This Perspective explores the sources and implications of diverse explanations in machine learning applications for physical sciences. We examine how different models, explanation methods, levels of feature attribution, and stakeholder needs can result in varying interpretations of ML outputs. Our analysis underscores the importance of considering multiple perspectives when interpreting ML models in scientific contexts.
arXiv Detail & Related papers (2024-02-01T05:28:28Z)
Scientific Large Language Models: A Survey on Biological & Chemical Domains [47.97810890521825]
Large Language Models (LLMs) have emerged as a transformative power in enhancing natural language comprehension. The application of LLMs extends beyond conventional linguistic boundaries, encompassing specialized linguistic systems developed within various scientific disciplines. As a burgeoning area in the community of AI for Science, scientific LLMs warrant comprehensive exploration.
arXiv Detail & Related papers (2024-01-26T05:33:34Z)
ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab [67.24684071577211]
The challenge of replicating research results has posed a significant impediment to the field of molecular biology. We first curate a comprehensive multimodal dataset, named ProBio, as an initial step towards this objective. Next, we devise two challenging benchmarks, transparent solution tracking and multimodal action recognition, to emphasize the unique characteristics and difficulties associated with activity understanding in BioLab settings.
arXiv Detail & Related papers (2023-11-01T14:44:01Z)
Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges. We first present the model that underlies most of current causal approaches to single-cell biology. We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z)
Machine Learning in Nano-Scale Biomedical Engineering [77.75587007080894]
We review the existing research regarding the use of machine learning in nano-scale biomedical engineering. The main challenges that can be formulated as ML problems are classified into the three main categories. For each of the presented methodologies, special emphasis is given to its principles, applications, and limitations.
arXiv Detail & Related papers (2020-08-05T15:45:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.