BASiNETEntropy: an alignment-free method for classification of
biological sequences through complex networks and entropy maximization
- URL: http://arxiv.org/abs/2203.15635v1
- Date: Thu, 24 Mar 2022 14:19:43 GMT
- Title: BASiNETEntropy: an alignment-free method for classification of
biological sequences through complex networks and entropy maximization
- Authors: Murilo Montanini Breve, Matheus Henrique Pimenta-Zanon and Fabrício
Martins Lopes
- Abstract summary: This work presents a new method for the classification of biological sequences through complex networks and entropy.
The maximum entropy principle is proposed to identify the most informative edges about the RNA class, generating a filtered complex network.
The proposed method was evaluated in the classification of different RNA classes from 13 species.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The discovery of nucleic acids and the structure of DNA have brought
considerable advances in the understanding of life. The development of
next-generation sequencing technologies has led to a large-scale generation of
data, for which computational methods have become essential for analysis and
knowledge discovery. In particular, RNAs have received much attention because
of the diversity of their functionalities in the organism and the discoveries
of different classes with different functions in many biological processes.
Therefore, the correct identification of RNA sequences is increasingly
important to provide relevant information to understand the functioning of
organisms. This work addresses this context by presenting a new method for the
classification of biological sequences through complex networks and entropy
maximization. The maximum entropy principle is proposed to identify the most
informative edges about the RNA class, generating a filtered complex network.
The proposed method was evaluated in the classification of different RNA
classes from 13 species. The proposed method was compared to PLEK, CPC2 and
BASiNET methods, outperforming all compared methods. BASiNETEntropy classified
all RNA sequences with high accuracy and low standard deviation in results,
showing assertiveness and robustness. The proposed method is implemented as an
open-source R package and is freely available at
https://cran.r-project.org/web/packages/BASiNETEntropy.
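As a rough illustration of the idea in the abstract (mapping sequences to a network and keeping only the edges that are informative about the class), the sketch below builds a word-adjacency network from each sequence and scores every edge by the entropy of the class label given that edge. It is not the authors' implementation and does not use the BASiNETEntropy package: the function names, the 3-nucleotide word size, the 0.5 threshold, and the label-entropy score are all assumptions made for illustration, standing in for the paper's maximum entropy criterion.

```python
from collections import Counter
from math import log2


def edge_counts(seq, word=3, step=1):
    """Count edges between consecutive overlapping words of a sequence.

    Assumption: nodes are words of length `word`, and an edge links each
    word to the word that follows it in the sequence.
    """
    words = [seq[i:i + word] for i in range(0, len(seq) - word + 1, step)]
    return Counter(zip(words, words[1:]))


def binary_entropy(p):
    """Shannon entropy (in bits) of a two-outcome distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def edge_class_entropy(class_a, class_b):
    """Score each edge by the entropy of the class label given the edge.

    Illustrative heuristic: 0 bits means the edge occurs in one class only,
    1 bit means it is equally frequent in both classes.
    """
    total_a, total_b = Counter(), Counter()
    for seq in class_a:
        total_a.update(edge_counts(seq))
    for seq in class_b:
        total_b.update(edge_counts(seq))
    scores = {}
    for edge in set(total_a) | set(total_b):
        a, b = total_a[edge], total_b[edge]
        scores[edge] = binary_entropy(a / (a + b))
    return scores


def informative_edges(scores, max_entropy=0.5):
    """Keep the edges whose label entropy is at most `max_entropy`."""
    return sorted(edge for edge, h in scores.items() if h <= max_entropy)


if __name__ == "__main__":
    # Toy sequences, invented for the example.
    scores = edge_class_entropy(["ACGUACGU"], ["ACGUUUUU"])
    for edge in informative_edges(scores):
        print(edge, scores[edge])
```

Edges shared by both toy sequences, such as ("ACG", "CGU"), receive a high entropy and are filtered out, while edges seen in only one class are kept. In practice the counts of the surviving edges (or topological measurements of the filtered network) could be passed as features to any standard classifier.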
Related papers
- Character-level Tokenizations as Powerful Inductive Biases for RNA Foundational Models [0.0]
Understanding and predicting RNA behavior is a challenge due to the complexity of RNA structures and interactions.
Current RNA models have yet to match the performance observed in the protein domain.
ChaRNABERT is able to reach state-of-the-art performance on several tasks in established benchmarks.
arXiv Detail & Related papers (2024-11-05T21:56:16Z)
- RNACG: A Universal RNA Sequence Conditional Generation model based on Flow-Matching [0.0]
We develop a universal RNA sequence generation model based on flow matching, namely RNACG.
RNACG can accommodate various conditional inputs and is portable, enabling users to customize the encoding network for conditional inputs.
RNACG exhibits extensive applicability in sequence generation and property prediction tasks.
arXiv Detail & Related papers (2024-07-29T09:46:46Z)
- BEACON: Benchmark for Comprehensive RNA Tasks and Language Models [60.02663015002029]
We introduce the first comprehensive RNA benchmark BEACON (BEnchmArk for COmprehensive RNA Task and Language Models).
First, BEACON comprises 13 distinct tasks derived from extensive previous work covering structural analysis, functional studies, and engineering applications.
Second, we examine a range of models, including traditional approaches like CNNs, as well as advanced RNA foundation models based on language models, offering valuable insights into the task-specific performances of these models.
Third, we investigate the vital RNA language model components
arXiv Detail & Related papers (2024-06-14T19:39:19Z)
- RNA Secondary Structure Prediction Using Transformer-Based Deep Learning Models [13.781096813376145]
The Human Genome Project has led to an exponential increase in data related to the sequence, structure, and function of biomolecules.
This paper discusses the fundamental concepts of RNA, RNA secondary structure, and its prediction.
The application of machine learning technologies in predicting the structure of biological macromolecules is explored.
arXiv Detail & Related papers (2024-04-14T08:36:14Z)
- PhyloGFN: Phylogenetic inference with generative flow networks [57.104166650526416]
We introduce the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference.
Because GFlowNets are well-suited for sampling complex structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies.
We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets.
arXiv Detail & Related papers (2023-10-12T23:46:08Z)
- Knowledge from Large-Scale Protein Contact Prediction Models Can Be Transferred to the Data-Scarce RNA Contact Prediction Task [40.051834115537474]
We find that a protein-coevolution Transformer-based deep neural network can be transferred to the RNA contact prediction task.
Experiments confirm that RNA contact prediction through transfer learning is greatly improved.
Our findings indicate that the learned structural patterns of proteins can be transferred to RNAs, opening up potential new avenues for research.
arXiv Detail & Related papers (2023-02-13T06:00:56Z)
- RDesign: Hierarchical Data-efficient Representation Learning for Tertiary Structure-based RNA Design [65.41144149958208]
This study aims to systematically construct a data-driven RNA design pipeline.
We crafted a benchmark dataset and designed a comprehensive structural modeling approach to represent the complex RNA tertiary structure.
We incorporated extracted secondary structures with base pairs as prior knowledge to facilitate the RNA design process.
arXiv Detail & Related papers (2023-01-25T17:19:49Z)
- Diversifying Design of Nucleic Acid Aptamers Using Unsupervised Machine Learning [54.247560894146105]
Inverse design of short single-stranded RNA and DNA sequences (aptamers) is the task of finding sequences that satisfy a set of desired criteria.
We propose to use an unsupervised machine learning model known as the Potts model to discover new, useful sequences with controllable sequence diversity.
arXiv Detail & Related papers (2022-08-10T13:30:58Z)
- Improving RNA Secondary Structure Design using Deep Reinforcement Learning [69.63971634605797]
We propose a new benchmark of applying reinforcement learning to RNA sequence design, in which the objective function is defined to be the free energy in the sequence's secondary structure.
We show results of the ablation analysis that we do for these algorithms, as well as graphs indicating the algorithm's performance across batches.
arXiv Detail & Related papers (2021-11-05T02:54:06Z)
- Approximate kNN Classification for Biomedical Data [1.1852406625172218]
Single-cell RNA-seq (scRNA-seq) is an emerging DNA sequencing technology with promising capabilities but significant computational challenges.
We propose the utilization of approximate nearest neighbor search algorithms for the task of kNN classification in scRNA-seq data.
arXiv Detail & Related papers (2020-12-03T18:30:43Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)