Related papers: scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain

scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain

URL: http://arxiv.org/abs/2310.02713v1
Date: Wed, 4 Oct 2023 10:30:08 GMT
Title: scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain
Authors: Gyutaek Oh, Baekgyu Choi, Inkyung Jung, and Jong Chul Ye
Abstract summary: We introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain. scHyena is equipped with a linear adaptor layer, the positional encoding via gene-embedding, and a bidirectional Hyena operator. This enables us to process full-length scRNA-seq data without losing any information from the raw data.
Score: 46.39828178736219
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Single-cell RNA sequencing (scRNA-seq) has made significant strides in unraveling the intricate cellular diversity within complex tissues. This is particularly critical in the brain, presenting a greater diversity of cell types than other tissue types, to gain a deeper understanding of brain function within various cellular contexts. However, analyzing scRNA-seq data remains a challenge due to inherent measurement noise stemming from dropout events and the limited utilization of extensive gene expression information. In this work, we introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain. Specifically, inspired by the recent Hyena operator, we design a novel Transformer architecture called singe-cell Hyena (scHyena) that is equipped with a linear adaptor layer, the positional encoding via gene-embedding, and a {bidirectional} Hyena operator. This enables us to process full-length scRNA-seq data without losing any information from the raw data. In particular, our model learns generalizable features of cells and genes through pre-training scHyena using the full length of scRNA-seq data. We demonstrate the superior performance of scHyena compared to other benchmark methods in downstream tasks, including cell type classification and scRNA-seq imputation.

Related papers

RNAMunin: A Deep Machine Learning Model for Non-coding RNA Discovery [0.0]
Non-coding RNAs (ncRNAs) are critical for regulating bacterial and archaeal physiology, stress response and metabolism.<n>This paper presents RNAMunin, a machine learning (ML) model that is capable of finding ncRNAs using genomic sequence alone.<n> RNAMunin is trained on Rfam sequences extracted from approximately 60 Gbp of long read metagenomes from 16 San Francisco Estuary samples.
arXiv Detail & Related papers (2025-07-16T06:33:50Z)
CircFormerMoE: An End-to-End Deep Learning Framework for Circular RNA Splice Site Detection and Pairing in Plant Genomes [0.0]
Circular RNAs (circRNAs) are important components of the non-coding RNA regulatory network.<n>We propose a deep learning framework named CircFormerMoE based on transformers and mixture-of experts for predicting circRNAs directly from plant genomic DNA.
arXiv Detail & Related papers (2025-07-11T12:43:17Z)
scMamba: A Pre-Trained Model for Single-Nucleus RNA Sequencing Analysis in Neurodegenerative Disorders [43.24785083027205]
scMamba is a pre-trained model designed to improve the quality and utility of snRNA-seq analysis. Inspired by the recent Mamba model, scMamba introduces a novel architecture that incorporates a linear adapter layer, gene embeddings, and bidirectional Mamba blocks. We demonstrate that scMamba outperforms benchmark methods in various downstream tasks, including cell type annotation, doublet detection, imputation, and the identification of differentially expressed genes.
arXiv Detail & Related papers (2025-02-12T11:48:22Z)
GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters. Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks. It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
RNACG: A Universal RNA Sequence Conditional Generation model based on Flow-Matching [0.0]
We develop a universal RNA sequence generation model based on flow matching, namely RNACG. RNACG can accommodate various conditional inputs and is portable, enabling users to customize the encoding network for conditional inputs. RNACG exhibits extensive applicability in sequence generation and property prediction tasks.
arXiv Detail & Related papers (2024-07-29T09:46:46Z)
BEACON: Benchmark for Comprehensive RNA Tasks and Language Models [60.02663015002029]
We introduce the first comprehensive RNA benchmark BEACON (textbfBEnchmtextbfArk for textbfCOmprehensive RtextbfNA Task and Language Models). First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications. Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models. Third, we investigate the vital RNA language model components
arXiv Detail & Related papers (2024-06-14T19:39:19Z)
scCDCG: Efficient Deep Structural Clustering for single-cell RNA-seq via Deep Cut-informed Graph Embedding [12.996418312603284]
scCDCG (single-cell RNA-seq Clustering via Deep Cut-informed Graph) is a novel framework designed for efficient and accurate clustering of scRNA-seq data. scCDCG comprises three main components: (i) A graph embedding module utilizing deep cut-informed techniques, which effectively captures intercellular high-order structural information. (ii) A self-supervised learning module guided by optimal transport, tailored to accommodate the unique complexities of scRNA-seq data.
arXiv Detail & Related papers (2024-04-09T09:46:17Z)
scBiGNN: Bilevel Graph Representation Learning for Cell Type Classification from Single-cell RNA Sequencing Data [62.87454293046843]
Graph neural networks (GNNs) have been widely used for automatic cell type classification. scBiGNN comprises two GNN modules to identify cell types. scBiGNN outperforms a variety of existing methods for cell type classification from scRNA-seq data.
arXiv Detail & Related papers (2023-12-16T03:54:26Z)
xTrimoGene: An Efficient and Scalable Representation Learner for Single-Cell RNA-Seq Data [45.043516102428676]
We propose a novel asymmetric encoder-decoder transformer for scRNA-seq data, called xTrimoGene$alpha$ (or xTrimoGene for short) xTrimoGene reduces FLOPs by one to two orders of magnitude compared to classical transformers while maintaining high accuracy. Our experiments also show that the performance of xTrimoGene improves as we scale up the model sizes.
arXiv Detail & Related papers (2023-11-26T01:23:01Z)
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution [76.97231739317259]
We present HyenaDNA, a genomic foundation model pretrained on the human reference genome with context lengths of up to 1 million tokens at the single nucleotide-level. On fine-tuned benchmarks from the Nucleotide Transformer, HyenaDNA reaches state-of-the-art (SotA) on 12 of 18 datasets using a model with orders of magnitude less parameters and pretraining data.
arXiv Detail & Related papers (2023-06-27T20:46:34Z)
Optirank: classification for RNA-Seq data with optimal ranking reference genes [0.0]
We propose a logistic regression model, optirank, which learns simultaneously the parameters of the model and the genes to use as a reference set in the ranking. We also consider real classification tasks, which present different kinds of distribution shifts between train and test data.
arXiv Detail & Related papers (2023-01-11T10:49:06Z)
Accurate RNA 3D structure prediction using a language model-based deep learning approach [50.193512039121984]
RhoFold+ is an RNA language model-based deep learning method that accurately predicts 3D structures of single-chain RNAs from sequences. RhoFold+ offers a fully automated end-to-end pipeline for RNA 3D structure prediction.
arXiv Detail & Related papers (2022-07-04T17:15:35Z)
Approximate kNN Classification for Biomedical Data [1.1852406625172218]
Single-cell RNA-seq (scRNA-seq) is an emerging DNA sequencing technology with promising capabilities but significant computational challenges. We propose the utilization of approximate nearest neighbor search algorithms for the task of kNN classification in scRNA-seq data.
arXiv Detail & Related papers (2020-12-03T18:30:43Z)
Cell Type Identification from Single-Cell Transcriptomic Data via Semi-supervised Learning [2.4271601178529063]
Cell type identification from single-cell transcriptomic data is a common goal of single-cell RNA sequencing (scRNAseq) data analysis. We propose a semi-supervised learning model to use unlabeled scRNAseq cells and limited amount of labeled scRNAseq cells to implement cell identification. It is observed that the proposed model is able to achieve encouraging performance by learning on very limited amount of labeled scRNAseq cells.
arXiv Detail & Related papers (2020-05-06T19:15:43Z)
A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques. We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.