Toward a Team of AI-made Scientists for Scientific Discovery from Gene
Expression Data
- URL: http://arxiv.org/abs/2402.12391v2
- Date: Wed, 21 Feb 2024 03:42:32 GMT
- Title: Toward a Team of AI-made Scientists for Scientific Discovery from Gene
Expression Data
- Authors: Haoyang Liu, Yijiang Li, Jinglin Jian, Yuxuan Cheng, Jianrong Lu,
Shuyi Guo, Jinglei Zhu, Mianchen Zhang, Miantong Zhang, Haohan Wang
- Abstract summary: We introduce a novel framework, a Team of AI-made Scientists (TAIS), designed to streamline the scientific discovery pipeline.
TAIS comprises simulated roles, including a project manager, data engineer, and domain expert, each represented by a Large Language Model (LLM)
These roles collaborate to replicate the tasks typically performed by data scientists, with a specific focus on identifying disease-predictive genes.
- Score: 9.767546641019862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine learning has emerged as a powerful tool for scientific discovery,
enabling researchers to extract meaningful insights from complex datasets. For
instance, it has facilitated the identification of disease-predictive genes
from gene expression data, significantly advancing healthcare. However, the
traditional process for analyzing such datasets demands substantial human
effort and expertise for the data selection, processing, and analysis. To
address this challenge, we introduce a novel framework, a Team of AI-made
Scientists (TAIS), designed to streamline the scientific discovery pipeline.
TAIS comprises simulated roles, including a project manager, data engineer, and
domain expert, each represented by a Large Language Model (LLM). These roles
collaborate to replicate the tasks typically performed by data scientists, with
a specific focus on identifying disease-predictive genes. Furthermore, we have
curated a benchmark dataset to assess TAIS's effectiveness in gene
identification, demonstrating our system's potential to significantly enhance
the efficiency and scope of scientific exploration. Our findings represent a
solid step towards automating scientific discovery through large language
models.
Related papers
- AIGS: Generating Science from AI-Powered Automated Falsification [17.50867181053229]
We propose Baby-AIGS as a baby-step demonstration of a full-process AIGS system, which is a multi-agent system with agents in roles representing key research process.
Experiments on three tasks preliminarily show that Baby-AIGS could produce meaningful scientific discoveries, though not on par with experienced human researchers.
arXiv Detail & Related papers (2024-11-17T13:40:35Z) - Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models [35.084222907099644]
We develop FREEFORM, Free-flow Reasoning and Ensembling for Enhanced Feature Output and Robust Modeling.
FreeFORM is available as open-source framework at GitHub: https://github.com/PennShenLab/FREEFORM.
arXiv Detail & Related papers (2024-10-02T17:53:08Z) - GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians [13.837406082703756]
We introduce GenoTEX, a benchmark dataset for the automatic exploration of gene expression data.
GenoTEX provides annotated code and results for solving a wide range of gene identification problems.
We present GenoAgents, a team of LLM-based agents designed with context-aware planning, iterative correction, and domain expert consultation.
arXiv Detail & Related papers (2024-06-21T17:55:24Z) - SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature [80.49349719239584]
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks.
SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields.
arXiv Detail & Related papers (2024-06-10T21:22:08Z) - BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments [112.25067497985447]
We introduce BioDiscoveryAgent, an agent that designs new experiments, reasons about their outcomes, and efficiently navigates the hypothesis space to reach desired solutions.
BioDiscoveryAgent can uniquely design new experiments without the need to train a machine learning model.
It achieves an average of 21% improvement in predicting relevant genetic perturbations across six datasets.
arXiv Detail & Related papers (2024-05-27T19:57:17Z) - CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments [51.41735920759667]
Large Language Models (LLMs) have shown promise in various tasks, but they often lack specific knowledge and struggle to accurately solve biological design problems.
In this work, we introduce CRISPR-GPT, an LLM agent augmented with domain knowledge and external tools to automate and enhance the design process of CRISPR-based gene-editing experiments.
arXiv Detail & Related papers (2024-04-27T22:59:17Z) - GENEVIC: GENetic data Exploration and Visualization via Intelligent interactive Console [6.786793669890866]
GENEVIC is an AI-driven chat framework that bridges the gap between genetic data generation and biomedical knowledge discovery.
It automates the analysis, retrieval, and visualization of customized domain-specific genetic information.
It integrates functionalities to generate protein interaction networks, enrich gene sets, and search scientific literature from PubMed, Google Scholar, and arXiv.
arXiv Detail & Related papers (2024-04-04T20:53:30Z) - Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges.
We first present the model that underlies most of current causal approaches to single-cell biology.
We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z) - Genetic InfoMax: Exploring Mutual Information Maximization in
High-Dimensional Imaging Genetics Studies [50.11449968854487]
Genome-wide association studies (GWAS) are used to identify relationships between genetic variations and specific traits.
Representation learning for imaging genetics is largely under-explored due to the unique challenges posed by GWAS.
We introduce a trans-modal learning framework Genetic InfoMax (GIM) to address the specific challenges of GWAS.
arXiv Detail & Related papers (2023-09-26T03:59:21Z) - A New Deep Learning and XAI-Based Algorithm for Features Selection in
Genomics [5.787117733071415]
The paper proposes a novel algorithm to perform Feature Selection on genomic-scale data.
Results of the application on a Chronic Lymphocytic Leukemia dataset evidence the effectiveness of the algorithm.
arXiv Detail & Related papers (2023-03-29T16:44:13Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.