Related papers: ChatCell: Facilitating Single-Cell Analysis with Natural Language

ChatCell: Facilitating Single-Cell Analysis with Natural Language

URL: http://arxiv.org/abs/2402.08303v4
Date: Tue, 20 Feb 2024 02:26:39 GMT
Title: ChatCell: Facilitating Single-Cell Analysis with Natural Language
Authors: Yin Fang, Kangwei Liu, Ningyu Zhang, Xinle Deng, Penghui Yang, Zhuo Chen, Xiangru Tang, Mark Gerstein, Xiaohui Fan, Huajun Chen
Abstract summary: ChatCell is a tool for facilitating single-cell analysis with natural language. ChatCell has acquired profound expertise in single-cell biology. Our project homepage is available at https://zjunlp.io/project/ChatCell.
Score: 40.4429032376233
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As Large Language Models (LLMs) rapidly evolve, their influence in science is becoming increasingly prominent. The emerging capabilities of LLMs in task generalization and free-form dialogue can significantly advance fields like chemistry and biology. However, the field of single-cell biology, which forms the foundational building blocks of living organisms, still faces several challenges. High knowledge barriers and limited scalability in current methods restrict the full exploitation of LLMs in mastering single-cell data, impeding direct accessibility and rapid iteration. To this end, we introduce ChatCell, which signifies a paradigm shift by facilitating single-cell analysis with natural language. Leveraging vocabulary adaptation and unified sequence generation, ChatCell has acquired profound expertise in single-cell biology and the capability to accommodate a diverse range of analysis tasks. Extensive experiments further demonstrate ChatCell's robust performance and potential to deepen single-cell insights, paving the way for more accessible and intuitive exploration in this pivotal field. Our project homepage is available at https://zjunlp.github.io/project/ChatCell.

Related papers

CellVerse: Do Large Language Models Really Understand Cell Biology? [74.34984441715517]
We introduce CellVerse, a unified language-centric question-answering benchmark that integrates four types of single-cell multi-omics data.<n>We systematically evaluate the performance across 14 open-source and closed-source LLMs ranging from 160M to 671B on CellVerse.
arXiv Detail & Related papers (2025-05-09T06:47:23Z)
A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following [32.67347401145835]
Large language models excel at interpreting complex natural language instructions, enabling them to perform a wide range of tasks. We present InstructCell, a multi-modal AI copilot that leverages natural language as a medium for more direct and flexible single-cell analysis. InstructCell empowers researchers to accomplish critical tasks-such as cell type annotation, conditional pseudo-cell generation, and drug sensitivity prediction-using straightforward natural language commands.
arXiv Detail & Related papers (2025-01-14T15:12:19Z)
Biology Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models [51.316001071698224]
We introduce Biology-Instructions, the first large-scale multi-omics biological sequences-related instruction-tuning dataset. This dataset can bridge the gap between large language models (LLMs) and complex biological sequences-related tasks. We also develop a strong baseline called ChatMultiOmics with a novel three-stage training pipeline.
arXiv Detail & Related papers (2024-12-26T12:12:23Z)
scReader: Prompting Large Language Models to Interpret scRNA-seq Data [12.767105992391555]
We propose an innovative hybrid approach that integrates the general knowledge capabilities of large language models with domain-specific representation models for single-cell omics data interpretation. By inputting single-cell gene-level expression data with prompts, we effectively model cellular representations based on the differential expression levels of genes across various species and cell types.
arXiv Detail & Related papers (2024-12-24T04:28:42Z)
Single-Cell Omics Arena: A Benchmark Study for Large Language Models on Cell Type Annotation Using Single-Cell Data [13.56585855722118]
Large language models (LLMs) have demonstrated their ability to efficiently process and synthesize vast corpora of text to automatically extract biological knowledge. Our study explores the potential of LLMs to accurately classify and annotate cell types in single-cell RNA sequencing (scRNA-seq) data. The results demonstrate that LLMs can provide robust interpretations of single-cell data without requiring additional fine-tuning.
arXiv Detail & Related papers (2024-12-03T23:58:35Z)
How to Build the Virtual Cell with Artificial Intelligence: Priorities and Opportunities [46.671834972945874]
We propose a vision of leveraging advances in AI to construct virtual cells. We discuss desired capabilities of such AI Virtual Cells, including generating universal representations of biological entities. We envision a future where AI Virtual Cells help identify new drug targets, predict cellular responses to perturbations, as well as scale hypothesis exploration.
arXiv Detail & Related papers (2024-09-18T02:41:50Z)
LangCell: Language-Cell Pre-training for Cell Identity Understanding [3.6518971609937068]
We introduce LangCell, a unified representation of single-cell data and natural language during the pre-training phase. Results show that LangCell is the only single-cell PLM that can work effectively in zero-shot cell identity understanding scenarios.
arXiv Detail & Related papers (2024-05-09T10:04:05Z)
An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks. These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems. Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z)
Prediction of Cellular Identities from Trajectory and Cell Fate Information [0.40964539027092917]
We propose an innovative approach to cell identification during early $textitC. elegansgenesis using machine learning. We employ random forest, embryo, and LSTM models, and tested cell classification accuracy on 3D time-lapse datasets spanning the first 4 hours of embryogenesis. Our research demonstrates the success of predicting cell identities in time-lapse imaging sequences directly from simple spatial-temporal features.
arXiv Detail & Related papers (2024-01-11T03:28:13Z)
RigLSTM: Recurrent Independent Grid LSTM for Generalizable Sequence Learning [75.61681328968714]
We propose recurrent independent Grid LSTM (RigLSTM) to exploit the underlying modular structure of the target task. Our model adopts cell selection, input feature selection, hidden state selection, and soft state updating to achieve a better generalization ability.
arXiv Detail & Related papers (2023-11-03T07:40:06Z)
Causal machine learning for single-cell genomics [94.28105176231739]
We discuss the application of machine learning techniques to single-cell genomics and their challenges. We first present the model that underlies most of current causal approaches to single-cell biology. We then identify open problems in the application of causal approaches to single-cell data.
arXiv Detail & Related papers (2023-10-23T13:35:24Z)
Revolutionizing Single Cell Analysis: The Power of Large Language Models for Cell Type Annotation [0.0]
Large language models such as ChatGPT and New Bing provide accurate annotations of cell types. By using ChatGPT to annotate single cell data, we can relate rare cell type to their function. This can have important applications in understanding cancer progression, mammalian development, and stem cell differentiation.
arXiv Detail & Related papers (2023-04-05T18:45:54Z)
Deep Learning in Single-Cell Analysis [34.08722045363822]
Single-cell technologies are revolutionizing the entire field of biology. Deep learning often demonstrates superior performance compared to traditional machine learning methods. This survey will serve as a reference for biologists and computer scientists, encouraging collaborations.
arXiv Detail & Related papers (2022-10-22T08:26:41Z)
Towards an Automatic Analysis of CHO-K1 Suspension Growth in Microfluidic Single-cell Cultivation [63.94623495501023]
We propose a novel Machine Learning architecture, which allows us to infuse a neural deep network with human-powered abstraction on the level of data. Specifically, we train a generative model simultaneously on natural and synthetic data, so that it learns a shared representation, from which a target variable, such as the cell count, can be reliably estimated.
arXiv Detail & Related papers (2020-10-20T08:36:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.