Integrated Transcriptomic-proteomic Biomarker Identification for Radiation Response Prediction in Non-small Cell Lung Cancer Cell Lines
- URL: http://arxiv.org/abs/2511.22735v1
- Date: Thu, 27 Nov 2025 20:01:51 GMT
- Title: Integrated Transcriptomic-proteomic Biomarker Identification for Radiation Response Prediction in Non-small Cell Lung Cancer Cell Lines
- Authors: Yajun Yu, Guoping Xu, Steve Jiang, Robert Timmerman, John Minna, Yuanyuan Zhang, Hao Peng
- Abstract summary: This study presents the first proteotranscriptomic framework for SF2 prediction in non-small cell lung cancer (NSCLC): an integrated transcriptome-proteome framework for identifying concurrent biomarkers predictive of radiation response. Independent pipelines identified 20 prioritized gene signatures from transcriptomic, proteomic, and combined datasets.
- Score: 7.496897814762568
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study aims to develop an integrated transcriptome-proteome framework for identifying concurrent biomarkers predictive of radiation response, as measured by survival fraction at 2 Gy (SF2), in non-small cell lung cancer (NSCLC) cell lines. RNA sequencing (RNA-seq) and data-independent acquisition mass spectrometry (DIA-MS) proteomic data were collected from 73 and 46 NSCLC cell lines, respectively. Following preprocessing, 1,605 shared genes were retained for analysis. Feature selection was performed using least absolute shrinkage and selection operator (Lasso) regression with a frequency-based ranking criterion under five-fold cross-validation repeated ten times. Support vector regression (SVR) models were constructed using transcriptome-only, proteome-only, and combined transcriptome-proteome feature sets. Model performance was assessed by the coefficient of determination (R²) and root mean square error (RMSE). Correlation analyses evaluated concordance between RNA and protein expression and the relationships of selected biomarkers with SF2. RNA-protein expression exhibited significant positive correlations (median Pearson's r = 0.363). Independent pipelines identified 20 prioritized gene signatures from transcriptomic, proteomic, and combined datasets. Models trained on single-omic features achieved limited cross-omic generalizability, while the combined model demonstrated balanced predictive accuracy in both datasets (R²=0.461, RMSE=0.120 for transcriptome; R²=0.604, RMSE=0.111 for proteome). This study presents the first proteotranscriptomic framework for SF2 prediction in NSCLC, highlighting the complementary value of integrating transcriptomic and proteomic data. The identified concurrent biomarkers capture both transcriptional regulation and functional protein activity, offering mechanistic insights and translational potential.
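The frequency-based ranking and the two evaluation metrics named in the abstract can be sketched as follows. This is an illustrative, dependency-free sketch and not the authors' pipeline: the abstract specifies Lasso for selection and SVR for prediction, whereas here a simple correlation-threshold selector stands in for Lasso so the example runs on the standard library alone, and no SVR is fitted. All function names (`repeated_kfold`, `selection_frequency`, `correlation_selector`) and the threshold value are hypothetical.

```python
import math
import random
from collections import Counter


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error."""
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def repeated_kfold(n_samples, k=5, repeats=10, seed=0):
    """Yield training-index lists for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]
        for held_out in folds:
            held = set(held_out)
            yield [i for i in idx if i not in held]


def selection_frequency(X, y, select, k=5, repeats=10, seed=0):
    """Count how often each feature index is selected across training folds."""
    counts = Counter()
    for train in repeated_kfold(len(y), k, repeats, seed):
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        counts.update(select(X_tr, y_tr))
    return counts


def correlation_selector(threshold=0.5):
    """Stand-in selector (NOT Lasso): keep features with |r| >= threshold."""
    def select(X_tr, y_tr):
        n_features = len(X_tr[0])
        return [j for j in range(n_features)
                if abs(pearson_r([row[j] for row in X_tr], y_tr)) >= threshold]
    return select


if __name__ == "__main__":
    # Synthetic data: feature 0 tracks the target, features 1 and 2 are noise.
    rng = random.Random(1)
    y = [rng.random() for _ in range(40)]
    X = [[t + rng.gauss(0, 0.05), rng.random(), rng.random()] for t in y]
    freq = selection_frequency(X, y, correlation_selector(0.5))
    for feature, count in freq.most_common():
        print(f"feature {feature}: selected in {count}/50 folds")
    preds = [row[0] for row in X]
    print("R2:", r2_score(y, preds), "RMSE:", rmse(y, preds))
```

Ranking features by how many of the 50 training folds select them, rather than by a single fit, is meant to favor features whose selection is stable under resampling. Swapping in a real Lasso fit would only require a different `select` callable that returns the indices of nonzero coefficients.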
Related papers
- STRAND: Sequence-Conditioned Transport for Single-Cell Perturbations [31.08466183513241]
STRAND is a generative model that predicts single-cell responses by conditioning on regulatory DNA sequence. Representing perturbations by sequence, rather than by a fixed set of gene identifiers, supports zero-shot inference at loci not seen during training. We evaluate STRAND on CRISPR perturbation datasets in K562, Jurkat, and RPE1 cells.
arXiv Detail & Related papers (2026-02-10T00:57:38Z) - Modeling Dabrafenib Response Using Multi-Omics Modality Fusion and Protein Network Embeddings Based on Graph Convolutional Networks [0.0]
Cancer cell response to targeted therapy arises from complex molecular interactions, making single omics insufficient for accurate prediction. This study develops a model to predict Dabrafenib sensitivity by integrating multiple omics layers (genomics, transcriptomics, epigenomics, metabolomics) with protein network embeddings generated using Graph Convolutional Networks (GCN). Results show that attention-guided multi-omics fusion combined with GCN improves drug response prediction and reveals complementary molecular determinants of Dabrafenib sensitivity.
arXiv Detail & Related papers (2025-12-13T02:00:56Z) - S$^2$Drug: Bridging Protein Sequence and 3D Structure in Contrastive Representation Learning for Virtual Screening [72.89086338778098]
We propose a two-stage framework for protein-ligand contrastive representation learning. In the first stage, we perform protein sequence pretraining on ChemBL using an ESM2-based backbone. In the second stage, we fine-tune on PDBBind by fusing sequence and structure information through a residue-level gating module. This auxiliary task guides the model to accurately localize binding residues within the protein sequence and capture their 3D spatial arrangement.
arXiv Detail & Related papers (2025-11-10T11:57:47Z) - Transforming Multi-Omics Integration with GANs: Applications in Alzheimer's and Cancer [0.0]
We introduce Omics-GAN, a Generative Adversarial Network (GAN)-based framework designed to generate high-quality synthetic multi-omics profiles. We demonstrated Omics-GAN on three omics types (mRNA, methylation and DNA methylation) using the ROSMAP cohort for Alzheimer's disease. A support vector machine (SVM) with repeated 5-fold cross-validation improved prediction accuracy compared to original omics profiles. Boxplot analyses confirmed that synthetic data preserved statistical distributions while reducing noise and outliers.
arXiv Detail & Related papers (2025-10-22T05:55:49Z) - scPPDM: A Diffusion Model for Single-Cell Drug-Response Prediction [44.96130504547205]
This paper introduces scPPDM, the first diffusion-based framework for single-cell drug-response prediction from scRNA-seq data. scPPDM couples two condition channels, pre-perturbation state and drug with dose, in a unified latent space via non-concatenative GD-Attn.
arXiv Detail & Related papers (2025-10-08T16:17:39Z) - LGE-Guided Cross-Modality Contrastive Learning for Gadolinium-Free Cardiomyopathy Screening in Cine CMR [51.11296719862485]
We propose a Contrastive Learning and Cross-Modal alignment framework for gadolinium-free cardiomyopathy screening using cine CMR sequences. By aligning the latent spaces of cine CMR and Late Gadolinium Enhancement (LGE) sequences, our model encodes fibrosis-specific pathology into cine CMR embeddings.
arXiv Detail & Related papers (2025-08-23T07:21:23Z) - Interpretable Graph Kolmogorov-Arnold Networks for Multi-Cancer Classification and Biomarker Identification using Multi-Omics Data [36.92842246372894]
Multi-Omics Graph Kolmogorov-Arnold Network (MOGKAN) is a deep learning framework that utilizes messenger-RNA, micro-RNA sequences, and DNA methylation samples. By integrating multi-omics data with graph-based deep learning, our proposed approach demonstrates robust predictive performance and interpretability.
arXiv Detail & Related papers (2025-03-29T02:14:05Z) - scMamba: A Pre-Trained Model for Single-Nucleus RNA Sequencing Analysis in Neurodegenerative Disorders [43.24785083027205]
scMamba is a pre-trained model designed to improve the quality and utility of snRNA-seq analysis. Inspired by the recent Mamba model, scMamba introduces a novel architecture that incorporates a linear adapter layer, gene embeddings, and bidirectional Mamba blocks. We demonstrate that scMamba outperforms benchmark methods in various downstream tasks, including cell type annotation, doublet detection, imputation, and the identification of differentially expressed genes.
arXiv Detail & Related papers (2025-02-12T11:48:22Z) - Graph Structure Learning for Tumor Microenvironment with Cell Type Annotation from non-spatial scRNA-seq data [6.432270457083369]
We present a novel graph neural network (GNN) model that enhances cell type prediction and cell interaction analysis. The proposed scGSL model demonstrated robust performance, achieving an average accuracy of 84.83%, precision of 86.23%, recall of 81.51%, and an F1 score of 80.92% across all datasets.
arXiv Detail & Related papers (2025-02-04T18:28:25Z) - Diffusion Model with Representation Alignment for Protein Inverse Folding [53.139837825588614]
Protein inverse folding is a fundamental problem in bioinformatics, aiming to recover the amino acid sequences from a given protein backbone structure. We propose a novel method that leverages diffusion models with representation alignment (DMRA). In experiments, we conduct extensive evaluations on the CATH4.2 dataset to demonstrate that DMRA outperforms leading methods.
arXiv Detail & Related papers (2024-12-12T15:47:59Z) - Integrating Protein Sequence and Expression Level to Analysis Molecular Characterization of Breast Cancer Subtypes [0.0]
This study aims to integrate protein sequence data with expression levels to improve the molecular characterization of breast cancer subtypes. Using ProtGPT2, a language model specifically designed for protein sequences, we generated embeddings that capture the functional and structural properties of proteins. These embeddings were integrated with protein expression levels to form enriched biological representations, which were analyzed using machine learning methods.
arXiv Detail & Related papers (2024-10-02T17:05:48Z) - hist2RNA: An efficient deep learning architecture to predict gene
expression from breast cancer histopathology images [11.822321981275232]
Deep learning algorithms can effectively extract morphological patterns in digital histopathology images to predict molecular phenotypes quickly and cost-effectively.
We propose a new, computationally efficient approach called hist2RNA inspired by bulk RNA-sequencing techniques to predict the expression of 138 genes.
arXiv Detail & Related papers (2023-04-10T10:54:32Z) - Learning to diagnose cirrhosis from radiological and histological labels
with joint self and weakly-supervised pretraining strategies [62.840338941861134]
We propose to leverage transfer learning from large datasets annotated by radiologists, to predict the histological score available on a small annex dataset.
We compare different pretraining methods, namely weakly-supervised and self-supervised ones, to improve the prediction of cirrhosis.
This method outperforms the baseline classification of the METAVIR score, reaching an AUC of 0.84 and a balanced accuracy of 0.75.
arXiv Detail & Related papers (2023-02-16T17:06:23Z) - State-specific protein-ligand complex structure prediction with a
multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z) - Predicting Molecular Phenotypes with Single Cell RNA Sequencing Data: an
Assessment of Unsupervised Machine Learning Models [0.0]
This study evaluates unsupervised machine learning for classifying treatment-resistant phenotypes in heterogeneous tumors.
scRNAseq quantifies mRNA in cells and characterizes cell phenotypes.
Clusters generated from this pipeline can be used to understand cancer cell behavior and malignant growth.
arXiv Detail & Related papers (2021-08-11T05:30:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.