ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab
- URL: http://arxiv.org/abs/2311.00556v1
- Date: Wed, 1 Nov 2023 14:44:01 GMT
- Title: ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab
- Authors: Jieming Cui, Ziren Gong, Baoxiong Jia, Siyuan Huang, Zilong Zheng,
Jianzhu Ma, Yixin Zhu
- Abstract summary: The challenge of replicating research results has posed a significant impediment to the field of molecular biology.
We first curate a comprehensive multimodal dataset, named ProBio, as an initial step towards this objective.
Next, we devise two challenging benchmarks, transparent solution tracking and multimodal action recognition, to emphasize the unique characteristics and difficulties associated with activity understanding in BioLab settings.
- Score: 67.24684071577211
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The challenge of replicating research results has posed a significant
impediment to the field of molecular biology. The advent of modern intelligent
systems has led to notable progress in various domains. Consequently, we
embarked on an investigation of intelligent monitoring systems as a means of
tackling the issue of the reproducibility crisis. Specifically, we first curate
a comprehensive multimodal dataset, named ProBio, as an initial step towards
this objective. This dataset comprises fine-grained hierarchical annotations
intended for the purpose of studying activity understanding in BioLab. Next, we
devise two challenging benchmarks, transparent solution tracking and multimodal
action recognition, to emphasize the unique characteristics and difficulties
associated with activity understanding in BioLab settings. Finally, we provide
a thorough experimental evaluation of contemporary video understanding models
and highlight their limitations in this specialized domain to identify
potential avenues for future research. We hope ProBio with associated
benchmarks may garner increased focus on modern AI techniques in the realm of
molecular biology.
Related papers
- Revolutionizing Biomarker Discovery: Leveraging Generative AI for Bio-Knowledge-Embedded Continuous Space Exploration [20.419747013569268]
We propose a new biomarker identification framework with two important modules: training data preparation and embedding-optimization-generation.
The first module uses a multi-agent system to automatically collect pairs of biomarker subsets and their corresponding prediction accuracy as training data.
The second module employs an encoder-evaluator-decoder learning paradigm to compress the knowledge of the collected data into a continuous space.
arXiv Detail & Related papers (2024-09-23T23:36:30Z)
- BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments [112.25067497985447]
We introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions.
BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model.
It achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets.
arXiv Detail & Related papers (2024-05-27T19:57:17Z)
- An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems.
Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z)
- Progress and Opportunities of Foundation Models in Bioinformatics [77.74411726471439]
Foundation models (FMs) have ushered in a new era in computational biology, especially in the realm of deep learning.
Central to our focus is the application of FMs to specific biological problems, aiming to guide the research community in choosing appropriate FMs for their research needs.
The review analyzes challenges and limitations faced by FMs in biology, such as data noise, model explainability, and potential biases.
arXiv Detail & Related papers (2024-02-06T02:29:17Z)
- Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges.
We first present the model that underlies most current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z)
- InstructBio: A Large-scale Semi-supervised Learning Paradigm for Biochemical Problems [38.57333125315448]
InstructMol is a semi-supervised learning algorithm to take better advantage of unlabeled examples.
InstructBio substantially improves the generalization ability of molecular models.
arXiv Detail & Related papers (2023-04-08T04:19:22Z)
- Machine learning in bioprocess development: From promise to practice [58.720142291102135]
Data-driven methods like machine learning (ML) approaches have a high potential to rationally explore large design spaces.
The aim of this review is to demonstrate how ML methods have been applied so far in bioprocess development.
arXiv Detail & Related papers (2022-10-04T13:48:59Z)
- When Bioprocess Engineering Meets Machine Learning: A Survey from the Perspective of Automated Bioprocess Development [3.687740185234604]
Machine learning (ML) has significantly contributed to the development of bioprocess engineering, but its application is still limited.
This review provides a comprehensive overview of ML-based automation in bioprocess development.
arXiv Detail & Related papers (2022-09-02T14:30:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.