Related papers: A novel RNA pseudouridine site prediction model using Utility Kernel and data-driven parameters

A novel RNA pseudouridine site prediction model using Utility Kernel and data-driven parameters

URL: http://arxiv.org/abs/2311.16132v1
Date: Thu, 2 Nov 2023 08:32:10 GMT
Title: A novel RNA pseudouridine site prediction model using Utility Kernel and data-driven parameters
Authors: Sourabh Patil, Archana Mathur, Raviprasad Aduri, Snehanshu Saha
Abstract summary: Pseudouridine is the most frequent modification in RNA. Existing models to predict the pseudouridine sites in a given RNA sequence mainly depend on user-defined features. We propose a Support Vector Machine (SVM) Kernel based on utility theory from Economics.
Score: 0.7373617024876725
License: http://creativecommons.org/licenses/by/4.0/
Abstract: RNA protein Interactions (RPIs) play an important role in biological systems. Recently, we have enumerated the RPIs at the residue level and have elucidated the minimum structural unit (MSU) in these interactions to be a stretch of five residues (Nucleotides/amino acids). Pseudouridine is the most frequent modification in RNA. The conversion of uridine to pseudouridine involves interactions between pseudouridine synthase and RNA. The existing models to predict the pseudouridine sites in a given RNA sequence mainly depend on user-defined features such as mono and dinucleotide composition/propensities of RNA sequences. Predicting pseudouridine sites is a non-linear classification problem with limited data points. Deep Learning models are efficient discriminators when the data set size is reasonably large and fail when there is a paucity of data ($<1000$ samples). To mitigate this problem, we propose a Support Vector Machine (SVM) Kernel based on utility theory from Economics, and using data-driven parameters (i.e. MSU) as features. For this purpose, we have used position-specific tri/quad/pentanucleotide composition/propensity (PSPC/PSPP) besides nucleotide and dineculeotide composition as features. SVMs are known to work well in small data regimes and kernels in SVM are designed to classify non-linear data. The proposed model outperforms the existing state-of-the-art models significantly (10%-15% on average).

Related papers

RNAMunin: A Deep Machine Learning Model for Non-coding RNA Discovery [0.0]
Non-coding RNAs (ncRNAs) are critical for regulating bacterial and archaeal physiology, stress response and metabolism.<n>This paper presents RNAMunin, a machine learning (ML) model that is capable of finding ncRNAs using genomic sequence alone.<n> RNAMunin is trained on Rfam sequences extracted from approximately 60 Gbp of long read metagenomes from 16 San Francisco Estuary samples.
arXiv Detail & Related papers (2025-07-16T06:33:50Z)
CircFormerMoE: An End-to-End Deep Learning Framework for Circular RNA Splice Site Detection and Pairing in Plant Genomes [0.0]
Circular RNAs (circRNAs) are important components of the non-coding RNA regulatory network.<n>We propose a deep learning framework named CircFormerMoE based on transformers and mixture-of experts for predicting circRNAs directly from plant genomic DNA.
arXiv Detail & Related papers (2025-07-11T12:43:17Z)
Character-level Tokenizations as Powerful Inductive Biases for RNA Foundational Models [0.0]
understanding and predicting RNA behavior is a challenge due to the complexity of RNA structures and interactions. Current RNA models have yet to match the performance observed in the protein domain. ChaRNABERT is able to reach state-of-the-art performance on several tasks in established benchmarks.
arXiv Detail & Related papers (2024-11-05T21:56:16Z)
BEACON: Benchmark for Comprehensive RNA Tasks and Language Models [60.02663015002029]
We introduce the first comprehensive RNA benchmark BEACON (textbfBEnchmtextbfArk for textbfCOmprehensive RtextbfNA Task and Language Models). First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications. Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models. Third, we investigate the vital RNA language model components
arXiv Detail & Related papers (2024-06-14T19:39:19Z)
Regressor-free Molecule Generation to Support Drug Response Prediction [83.25894107956735]
Conditional generation based on the target IC50 score can obtain a more effective sampling space. Regressor-free guidance combines a diffusion model's score estimation with a regression controller model's gradient based on number labels.
arXiv Detail & Related papers (2024-05-23T13:22:17Z)
scHyena: Foundation Model for Full-Length Single-Cell RNA-Seq Analysis in Brain [46.39828178736219]
We introduce scHyena, a foundation model designed to address these challenges and enhance the accuracy of scRNA-seq analysis in the brain. scHyena is equipped with a linear adaptor layer, the positional encoding via gene-embedding, and a bidirectional Hyena operator. This enables us to process full-length scRNA-seq data without losing any information from the raw data.
arXiv Detail & Related papers (2023-10-04T10:30:08Z)
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution [76.97231739317259]
We present HyenaDNA, a genomic foundation model pretrained on the human reference genome with context lengths of up to 1 million tokens at the single nucleotide-level. On fine-tuned benchmarks from the Nucleotide Transformer, HyenaDNA reaches state-of-the-art (SotA) on 12 of 18 datasets using a model with orders of magnitude less parameters and pretraining data.
arXiv Detail & Related papers (2023-06-27T20:46:34Z)
Optirank: classification for RNA-Seq data with optimal ranking reference genes [0.0]
We propose a logistic regression model, optirank, which learns simultaneously the parameters of the model and the genes to use as a reference set in the ranking. We also consider real classification tasks, which present different kinds of distribution shifts between train and test data.
arXiv Detail & Related papers (2023-01-11T10:49:06Z)
Accurate RNA 3D structure prediction using a language model-based deep learning approach [50.193512039121984]
RhoFold+ is an RNA language model-based deep learning method that accurately predicts 3D structures of single-chain RNAs from sequences. RhoFold+ offers a fully automated end-to-end pipeline for RNA 3D structure prediction.
arXiv Detail & Related papers (2022-07-04T17:15:35Z)
MuCoMiD: A Multitask Convolutional Learning Framework for miRNA-Disease Association Prediction [0.4061135251278187]
We propose a novel multi-tasking convolution-based approach, which we refer to as MuCoMiD. MuCoMiD allows automatic feature extraction while incorporating knowledge from 4 heterogeneous biological information sources. We construct large-scale experiments on standard benchmark datasets as well as our proposed larger independent test sets and case studies. MuCoMiD shows an improvement of at least 5% in 5-fold CV evaluation on HMDDv2.0 and HMDDv3.0 datasets and at least 49% on larger independent test sets with unseen diseases and unseen diseases over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-08T10:01:46Z)
Rank-R FNN: A Tensor-Based Learning Model for High-Order Data Classification [69.26747803963907]
Rank-R Feedforward Neural Network (FNN) is a tensor-based nonlinear learning model that imposes Canonical/Polyadic decomposition on its parameters. First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension. We establish the universal approximation and learnability properties of Rank-R FNN, and we validate its performance on real-world hyperspectral datasets.
arXiv Detail & Related papers (2021-04-11T16:37:32Z)
Approximate kNN Classification for Biomedical Data [1.1852406625172218]
Single-cell RNA-seq (scRNA-seq) is an emerging DNA sequencing technology with promising capabilities but significant computational challenges. We propose the utilization of approximate nearest neighbor search algorithms for the task of kNN classification in scRNA-seq data.
arXiv Detail & Related papers (2020-12-03T18:30:43Z)
A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques. We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.