Open World Knowledge Aided Single-Cell Foundation Model with Robust Cross-Modal Cell-Language Pre-training
- URL: http://arxiv.org/abs/2601.05648v1
- Date: Fri, 09 Jan 2026 09:10:14 GMT
- Title: Open World Knowledge Aided Single-Cell Foundation Model with Robust Cross-Modal Cell-Language Pre-training
- Authors: Haoran Wang, Xuanyi Zhang, Shuangsang Fang, Longke Ran, Ziqing Deng, Yong Zhang, Yuxiang Li, Shaoshuai Li,
- Abstract summary: We propose an Open-world Language Knowledge-Aided Robust Single-Cell Foundation Model (OKR-CELL).
It is built on a cross-modal Cell-Language pre-training framework that comprises two key innovations.
OKR-CELL achieves state-of-the-art results across six evaluation tasks.
- Score: 7.812507078660317
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in single-cell multi-omics, particularly RNA-seq, have provided profound insights into cellular heterogeneity and gene regulation. While single-cell foundation models built on the pre-trained language model (PLM) paradigm have shown promise, they remain constrained by insufficient integration of in-depth individual cell profiles and by the influence of noise within multi-modal data. To address both issues, we propose an Open-world Language Knowledge-Aided Robust Single-Cell Foundation Model (OKR-CELL). It is built on a cross-modal Cell-Language pre-training framework that comprises two key innovations: (1) an LLM-based workflow with retrieval-augmented generation (RAG) that enriches cell textual descriptions with open-world knowledge; (2) a Cross-modal Robust Alignment (CRA) objective that incorporates sample reliability assessment, curriculum learning, and coupled momentum contrastive learning to strengthen the model's resistance to noisy data. After pretraining on 32M cell-text pairs, OKR-CELL achieves state-of-the-art results across six evaluation tasks. Beyond standard benchmarks such as cell clustering, cell-type annotation, batch-effect correction, and few-shot annotation, the model also demonstrates superior performance in broader multi-modal applications, including zero-shot cell-type annotation and bidirectional cell-text retrieval.
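The abstract does not include code, so the sketch below is only a minimal, hypothetical illustration of the CRA objective's ingredients: an InfoNCE-style cell-text contrastive loss computed against a momentum (EMA) text encoder, with per-sample reliability weights that a curriculum could anneal from clean toward noisier pairs. The function names, the weighting scheme, and all hyperparameters are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def reliability_weighted_infonce(cell_emb, text_emb_momentum, reliability, temperature=0.07):
    """InfoNCE over a batch of cell-text pairs, down-weighting unreliable pairs.

    cell_emb:          (B, D) embeddings from the trainable cell encoder.
    text_emb_momentum: (B, D) embeddings from a momentum (EMA) text encoder.
    reliability:       (B,) per-pair weights in [0, 1]; a curriculum can raise
                       them from easy/clean pairs toward harder ones over training.
    """
    cell = F.normalize(cell_emb, dim=-1)
    text = F.normalize(text_emb_momentum, dim=-1)
    logits = cell @ text.t() / temperature                    # (B, B) similarities
    targets = torch.arange(cell.size(0), device=cell.device)  # diagonal = positives
    loss_c2t = F.cross_entropy(logits, targets, reduction="none")
    loss_t2c = F.cross_entropy(logits.t(), targets, reduction="none")
    per_pair = 0.5 * (loss_c2t + loss_t2c)                    # symmetric loss per pair
    return (reliability * per_pair).sum() / reliability.sum().clamp(min=1e-8)

@torch.no_grad()
def momentum_update(online_encoder, momentum_encoder, m=0.999):
    """EMA update coupling the momentum encoder to the online encoder."""
    for p_o, p_m in zip(online_encoder.parameters(), momentum_encoder.parameters()):
        p_m.mul_(m).add_(p_o, alpha=1.0 - m)
```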
Related papers
- Cell2Text: Multimodal LLM for Generating Single-Cell Descriptions from RNA-Seq Data [17.440176654185095]
We introduce Cell2Text, a framework that translates scRNA-seq profiles into structured natural language descriptions.
By integrating gene-level embeddings with pretrained large language models, Cell2Text generates coherent summaries that capture cellular identity, tissue origin, disease associations, and pathway activity.
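As a rough sketch of the general bridging pattern such frameworks use (not Cell2Text's actual code), a hypothetical projector pools per-gene embeddings by expression and emits soft prefix tokens in an LLM's embedding space; the frozen pretrained decoder that would consume the prefix and generate the description is not shown, and all names and dimensions are invented for illustration.

```python
import torch
import torch.nn as nn

class CellToPrefix(nn.Module):
    """Map per-gene embeddings to k soft prefix tokens in an LLM's embedding space."""
    def __init__(self, gene_dim=512, llm_dim=768, k=8):
        super().__init__()
        self.k, self.llm_dim = k, llm_dim
        self.proj = nn.Sequential(
            nn.Linear(gene_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, k * llm_dim)
        )

    def forward(self, gene_emb, expression):
        # Expression-weighted pooling over genes: (G, D) x (G,) -> one cell vector.
        weights = torch.softmax(expression, dim=-1)
        cell_vec = (weights.unsqueeze(-1) * gene_emb).sum(dim=0)
        return self.proj(cell_vec).view(self.k, self.llm_dim)  # k prefix "tokens"

# The prefix would be concatenated with text embeddings and fed to a frozen
# pretrained decoder (not shown) that generates the cell description.
prefix = CellToPrefix()(torch.randn(2000, 512), torch.rand(2000))
print(prefix.shape)  # torch.Size([8, 768])
```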
arXiv Detail & Related papers (2025-09-29T14:20:50Z)
- BMFM-RNA: An Open Framework for Building and Evaluating Transcriptomic Foundation Models [1.1906923449182683]
We present BMFM-RNA, an open-source, modular software package that unifies diverse TFM pretraining and fine-tuning objectives.
We introduce a novel training objective, whole cell expression decoder (WCED), which captures global expression patterns using an autoencoder-like CLS bottleneck representation.
We show that WCED-based models achieve performance that matches or exceeds state-of-the-art approaches like scGPT.
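As a rough sketch of an autoencoder-like CLS-bottleneck objective in the spirit of WCED (hypothetical class names and sizes; not BMFM-RNA's actual code), a transformer encodes per-gene tokens and the CLS embedding alone must reconstruct the whole-cell expression vector:

```python
import torch
import torch.nn as nn

class WCEDSketch(nn.Module):
    """Autoencoder-like objective: the CLS embedding alone reconstructs the
    cell's full expression vector, forcing a global representation."""
    def __init__(self, n_genes=2000, d_model=256, n_layers=2):
        super().__init__()
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        self.gene_proj = nn.Linear(1, d_model)       # embed each gene's expression scalar
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.decoder = nn.Sequential(                # CLS bottleneck -> whole-cell output
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, n_genes)
        )

    def forward(self, x):                            # x: (B, n_genes) expression
        tokens = self.gene_proj(x.unsqueeze(-1))     # (B, n_genes, d_model)
        tokens = torch.cat([self.cls.expand(x.size(0), -1, -1), tokens], dim=1)
        cls_out = self.encoder(tokens)[:, 0]         # (B, d_model) bottleneck
        return self.decoder(cls_out)                 # (B, n_genes) reconstruction

model = WCEDSketch()
x = torch.rand(4, 2000)
loss = nn.functional.mse_loss(model(x), x)           # reconstruction objective
```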
arXiv Detail & Related papers (2025-06-17T15:40:08Z)
- TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models [123.17643568298116]
We present TAViS, a novel framework that couples the knowledge of multimodal foundation models for cross-modal alignment.
Effectively combining these models poses two key challenges: the difficulty of transferring knowledge between SAM2 and ImageBind due to their different feature spaces, and the insufficiency of using only a segmentation loss for supervision.
Our approach achieves superior performance on single-source, multi-source, and semantic datasets, and excels in zero-shot settings.
arXiv Detail & Related papers (2025-06-13T03:19:47Z)
- Cell-o1: Training LLMs to Solve Single-Cell Reasoning Puzzles with Reinforcement Learning [44.91329557101423]
We introduce the CellPuzzles task, where the objective is to assign unique cell types to a batch of cells.
This benchmark spans diverse tissues, diseases, and donor conditions, and requires reasoning across the batch-level cellular context to ensure label uniqueness.
We propose Cell-o1, a 7B LLM trained via supervised fine-tuning on distilled reasoning traces, followed by reinforcement learning with batch-level rewards.
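A minimal sketch of what a batch-level reward with a uniqueness constraint could look like (the exact reward used by Cell-o1 is not given here; this scoring rule is an assumption for illustration):

```python
def batch_reward(predicted: list[str], gold: list[str]) -> float:
    """Batch-level reward sketch: predictions must be correct AND unique.

    Returns 0.0 whenever any cell type is reused within the batch, otherwise
    the fraction of cells assigned their correct type.
    """
    if len(set(predicted)) != len(predicted):   # uniqueness constraint violated
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)

# A batch of 3 cells, each needing a distinct type:
print(batch_reward(["T cell", "B cell", "NK cell"],
                   ["T cell", "B cell", "NK cell"]))   # 1.0
print(batch_reward(["T cell", "T cell", "NK cell"],
                   ["T cell", "B cell", "NK cell"]))   # 0.0 (duplicate type)
```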
arXiv Detail & Related papers (2025-06-03T14:16:53Z)
- CellVerse: Do Large Language Models Really Understand Cell Biology? [74.34984441715517]
We introduce CellVerse, a unified language-centric question-answering benchmark that integrates four types of single-cell multi-omics data.
We systematically evaluate the performance of 14 open-source and closed-source LLMs, ranging from 160M to 671B parameters, on CellVerse.
arXiv Detail & Related papers (2025-05-09T06:47:23Z)
- Language-Enhanced Representation Learning for Single-Cell Transcriptomics [27.33236345953242]
We propose scMMGPT, a novel framework designed for language-enhanced representation learning in single-cell transcriptomics.
scMMGPT employs robust cell representation extraction, preserving quantitative gene expression data, and introduces an innovative two-stage pre-training strategy.
arXiv Detail & Related papers (2025-03-12T14:26:16Z)
- scReader: Prompting Large Language Models to Interpret scRNA-seq Data [12.767105992391555]
We propose an innovative hybrid approach that integrates the general knowledge capabilities of large language models with domain-specific representation models for single-cell omics data interpretation.
By inputting single-cell gene-level expression data with prompts, we effectively model cellular representations based on the differential expression levels of genes across various species and cell types.
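A minimal sketch of the prompting pattern described above, assuming a simple template over expression-ranked marker genes (the template, gene names, and `build_cell_prompt` helper are illustrative, not scReader's actual prompts):

```python
def build_cell_prompt(expression: dict[str, float], species: str, top_k: int = 10) -> str:
    """Turn gene-level expression into a natural-language prompt for an LLM.

    Rank genes by expression and list the top ones so the model can reason
    about the likely cell identity from marker genes.
    """
    top = sorted(expression.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    gene_list = ", ".join(f"{g} ({v:.1f})" for g, v in top)
    return (
        f"The following {species} cell most highly expresses these genes "
        f"(normalized values): {gene_list}. "
        "Based on these markers, what is the most likely cell type?"
    )

print(build_cell_prompt({"CD3E": 8.1, "CD8A": 6.4, "GZMB": 5.9, "MS4A1": 0.2},
                        species="human", top_k=3))
```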
arXiv Detail & Related papers (2024-12-24T04:28:42Z)
- Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen [76.02070962797794]
This work introduces CellFlow for Generation (CFGen), a flow-based conditional generative model that preserves the inherent discreteness of single-cell data.
CFGen generates whole-genome multi-modal single-cell data reliably, improving the recovery of crucial biological data characteristics.
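CFGen's specific handling of count discreteness is not reproduced here; the sketch below only shows the generic conditional flow-matching training step that flow-based generative models of this kind build on. All names, dimensions, and the linear interpolation path are assumptions for illustration.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Predicts the flow's velocity field given (x_t, t, condition)."""
    def __init__(self, dim=64, cond_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1 + cond_dim, 256), nn.SiLU(), nn.Linear(256, dim)
        )

    def forward(self, x_t, t, cond):
        return self.net(torch.cat([x_t, t, cond], dim=-1))

def flow_matching_step(model, x1, cond):
    """One conditional flow-matching step: regress the straight-line velocity
    from noise x0 toward data x1 at a random time t (generic sketch)."""
    x0 = torch.randn_like(x1)                      # noise sample
    t = torch.rand(x1.size(0), 1)                  # uniform time in [0, 1]
    x_t = (1 - t) * x0 + t * x1                    # linear interpolation path
    target_v = x1 - x0                             # constant target velocity
    pred_v = model(x_t, t, cond)
    return nn.functional.mse_loss(pred_v, target_v)

model = VelocityNet()
loss = flow_matching_step(model, torch.randn(32, 64), torch.randn(32, 16))
```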
arXiv Detail & Related papers (2024-07-16T14:05:03Z)
- Mixed Models with Multiple Instance Learning [51.440557223100164]
We introduce MixMIL, a framework integrating Generalized Linear Mixed Models (GLMM) and Multiple Instance Learning (MIL).
Our empirical results reveal that MixMIL outperforms existing MIL models in single-cell datasets.
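As a sketch of the MIL half only (MixMIL's GLMM random-effects component is omitted, and all names and sizes are hypothetical), attention-based MIL pooling aggregates a sample's cells into a single bag-level prediction:

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Attention-based MIL pooling: score each instance (cell), aggregate to a
    bag (sample) embedding, then predict a bag label. The GLMM random-effects
    part of MixMIL is not modeled here; this is only the MIL half, sketched."""
    def __init__(self, in_dim=50, hidden=64):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.head = nn.Linear(in_dim, 1)

    def forward(self, bag):                       # bag: (n_cells, in_dim)
        a = torch.softmax(self.attn(bag), dim=0)  # (n_cells, 1) instance weights
        pooled = (a * bag).sum(dim=0)             # (in_dim,) bag embedding
        return torch.sigmoid(self.head(pooled))   # bag-level probability

model = AttentionMIL()
prob = model(torch.randn(300, 50))                # one donor's 300 cells
```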
arXiv Detail & Related papers (2023-11-04T16:42:42Z)
- Benchmarking Diverse-Modal Entity Linking with Generative Models [78.93737257356784]
We construct a benchmark for diverse-modal entity linking (DMEL) from existing EL datasets.
To approach the DMEL task, we propose a generative diverse-modal model (GDMM) following a multimodal encoder-decoder paradigm.
GDMM builds a stronger DMEL baseline, outperforming state-of-the-art task-specific EL models by 8.51 F1 points on average.
arXiv Detail & Related papers (2023-05-27T02:38:46Z)
- Learning Contextual Representations for Semantic Parsing with Generation-Augmented Pre-Training [86.91380874390778]
We present Generation-Augmented Pre-training (GAP), which jointly learns representations of natural language utterances and table schemas by leveraging generation models to produce pre-training data.
Experimental results show that neural semantic parsers leveraging the GAP model obtain new state-of-the-art results on both the SPIDER and CRITERIA-TO-SQL benchmarks.
arXiv Detail & Related papers (2020-12-18T15:53:50Z)