GBDTSVM: Combined Support Vector Machine and Gradient Boosting Decision Tree Framework for efficient snoRNA-disease association prediction
- URL: http://arxiv.org/abs/2505.06534v1
- Date: Sat, 10 May 2025 06:46:29 GMT
- Title: GBDTSVM: Combined Support Vector Machine and Gradient Boosting Decision Tree Framework for efficient snoRNA-disease association prediction
- Authors: Ummay Maria Muna, Fahim Hafiz, Shanta Biswas, Riasat Azim,
- Abstract summary: This paper proposes a model called 'GBDTSVM', representing a novel and efficient machine learning approach for predicting snoRNA-disease associations.<n>'GBDTSVM' effectively extracts integrated snoRNA-disease feature representations utilizing GBDT and SVM.<n> Experimental evaluation of the GBDTSVM model demonstrated superior performance compared to state-of-the-art methods in the field.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Small nucleolar RNAs (snoRNAs) are increasingly recognized for their critical role in the pathogenesis and characterization of various human diseases. Consequently, the precise identification of snoRNA-disease associations (SDAs) is essential for the progression of diseases and the advancement of treatment strategies. However, conventional biological experimental approaches are costly, time-consuming, and resource-intensive; therefore, machine learning-based computational methods offer a promising solution to mitigate these limitations. This paper proposes a model called 'GBDTSVM', representing a novel and efficient machine learning approach for predicting snoRNA-disease associations by leveraging a Gradient Boosting Decision Tree (GBDT) and Support Vector Machine (SVM). 'GBDTSVM' effectively extracts integrated snoRNA-disease feature representations utilizing GBDT and SVM is subsequently utilized to classify and identify potential associations. Furthermore, the method enhances the accuracy of these predictions by incorporating Gaussian kernel profile similarity for both snoRNAs and diseases. Experimental evaluation of the GBDTSVM model demonstrated superior performance compared to state-of-the-art methods in the field, achieving an area under the receiver operating characteristic (AUROC) of 0.96 and an area under the precision-recall curve (AUPRC) of 0.95 on MDRF dataset. Moreover, our model shows superior performance on two more datasets named LSGT and PsnoD. Additionally, a case study on the predicted snoRNA-disease associations verified the top 10 predicted snoRNAs across nine prevalent diseases, further validating the efficacy of the GBDTSVM approach. These results underscore the model's potential as a robust tool for advancing snoRNA-related disease research. Source codes and datasets our proposed framework can be obtained from: https://github.com/mariamuna04/gbdtsvm
Related papers
- scMamba: A Pre-Trained Model for Single-Nucleus RNA Sequencing Analysis in Neurodegenerative Disorders [43.24785083027205]
scMamba is a pre-trained model designed to improve the quality and utility of snRNA-seq analysis.<n>Inspired by the recent Mamba model, scMamba introduces a novel architecture that incorporates a linear adapter layer, gene embeddings, and bidirectional Mamba blocks.<n>We demonstrate that scMamba outperforms benchmark methods in various downstream tasks, including cell type annotation, doublet detection, imputation, and the identification of differentially expressed genes.
arXiv Detail & Related papers (2025-02-12T11:48:22Z) - scBIT: Integrating Single-cell Transcriptomic Data into fMRI-based Prediction for Alzheimer's Disease Diagnosis [24.268703526039367]
scBIT is a novel method for enhancing Alzheimer's disease (AD) prediction by combining fMRI with single-nucleus RNA (snRNA)<n>It employs a sampling strategy to segment snRNA data into cell-type-specific gene networks and utilizes a self-explainable graph neural network to extract critical subgraphs.<n> Extensive experiments validate scBIT's effectiveness in revealing intricate brain region-gene associations.
arXiv Detail & Related papers (2025-02-04T18:37:46Z) - Improved Anomaly Detection through Conditional Latent Space VAE Ensembles [49.1574468325115]
Conditional Latent space Variational Autoencoder (CL-VAE) improved pre-processing for anomaly detection on data with known inlier classes and unknown outlier classes.
Model shows increased accuracy in anomaly detection, achieving an AUC of 97.4% on the MNIST dataset.
In addition, the CL-VAE shows increased benefits from ensembling, a more interpretable latent space, and an increased ability to learn patterns in complex data with limited model sizes.
arXiv Detail & Related papers (2024-10-16T07:48:53Z) - BEACON: Benchmark for Comprehensive RNA Tasks and Language Models [60.02663015002029]
We introduce the first comprehensive RNA benchmark BEACON (textbfBEnchmtextbfArk for textbfCOmprehensive RtextbfNA Task and Language Models).<n>First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications.<n>Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models.<n>Third, we investigate the vital RNA language model components
arXiv Detail & Related papers (2024-06-14T19:39:19Z) - LncRNA-disease association prediction method based on heterogeneous information completion and convolutional neural network [9.17998537192211]
The accuracy of lncRNA-disease associations (LDAs) is very important for the warning and treatment of diseases.
In this paper, a deep learning model based on a heterogeneous network and convolutional neural network (CNN) is proposed for lncRNA-disease association prediction, named HCNNLDA.
The experimental results show that the proposed model has better performance than that of several latest prediction models.
arXiv Detail & Related papers (2024-06-02T06:11:27Z) - Heterogeneous network and graph attention auto-encoder for LncRNA-disease association prediction [11.73181429989457]
In this work, multiple sources of biomedical data are fully utilized to construct characteristics of lncRNAs and diseases.
A novel deep learning model based on graph attention automatic encoder is proposed, called HGATELDA.
The HGATELDA model achieves an impressive AUC value of 0.9692 when evaluated using a 5-fold cross-validation.
arXiv Detail & Related papers (2024-05-03T02:15:05Z) - Machine Learning Modeling Of SiRNA Structure-Potency Relationship With
Applications Against Sars-Cov-2 Spike Gene [0.0]
Drug discovery process is lengthy and costly, taking nearly a decade to bring a new drug to the market.
Biotechnology, computational methods, and machine learning algorithms have the potential to revolutionize drug discovery, speeding up the process and improving patient outcomes.
The COVID-19 pandemic has further accelerated and deepened the recognition of the potential of these techniques, especially in the areas of drug repurposing and efficacy predictions.
arXiv Detail & Related papers (2024-01-18T23:00:34Z) - Brain Imaging-to-Graph Generation using Adversarial Hierarchical Diffusion Models for MCI Causality Analysis [44.45598796591008]
Brain imaging-to-graph generation (BIGG) framework is proposed to map functional magnetic resonance imaging (fMRI) into effective connectivity for mild cognitive impairment analysis.
The hierarchical transformers in the generator are designed to estimate the noise at multiple scales.
Evaluations of the ADNI dataset demonstrate the feasibility and efficacy of the proposed model.
arXiv Detail & Related papers (2023-05-18T06:54:56Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - CovidDeep: SARS-CoV-2/COVID-19 Test Based on Wearable Medical Sensors
and Efficient Neural Networks [51.589769497681175]
The novel coronavirus (SARS-CoV-2) has led to a pandemic.
The current testing regime based on Reverse Transcription-Polymerase Chain Reaction for SARS-CoV-2 has been unable to keep up with testing demands.
We propose a framework called CovidDeep that combines efficient DNNs with commercially available WMSs for pervasive testing of the virus.
arXiv Detail & Related papers (2020-07-20T21:47:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.