Comparing Machine Learning Algorithms with or without Feature Extraction for DNA Classification
- URL: http://arxiv.org/abs/2011.00485v1
- Date: Sun, 1 Nov 2020 12:04:54 GMT
- Title: Comparing Machine Learning Algorithms with or without Feature Extraction for DNA Classification
- Authors: Xiangxie Zhang, Ben Beinke, Berlian Al Kindhi and Marco Wiering
- Abstract summary: Three state-of-the-art algorithms, namely Convolutional Neural Networks, Deep Neural Networks, and N-gram Probabilistic Models, are used for the task of DNA classification.
We introduce a novel feature extraction method based on the Levenshtein distance and randomly generated DNA sub-sequences.
Four different data sets, each concerning viral diseases such as Covid-19, AIDS, Influenza, and Hepatitis C, are used for evaluating the different approaches.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The classification of DNA sequences is a key research area in bioinformatics
as it enables researchers to conduct genomic analysis and detect possible
diseases. In this paper, three state-of-the-art algorithms, namely
Convolutional Neural Networks, Deep Neural Networks, and N-gram Probabilistic
Models, are used for the task of DNA classification. Furthermore, we introduce
a novel feature extraction method based on the Levenshtein distance and
randomly generated DNA sub-sequences to compute information-rich features from
the DNA sequences. We also use an existing feature extraction method based on
3-grams to represent amino acids and combine both feature extraction methods
with a multitude of machine learning algorithms. Four different data sets, each
concerning viral diseases such as Covid-19, AIDS, Influenza, and Hepatitis C,
are used for evaluating the different approaches. The results of the
experiments show that all methods obtain high accuracies on the different DNA
datasets. Furthermore, the domain-specific 3-gram feature extraction method
generally leads to the best results in the experiments, while the newly
proposed technique outperforms all other methods on the smallest Covid-19
dataset.
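The two feature extractors named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each Levenshtein feature is assumed to be the plain edit distance between the whole DNA sequence and one of a fixed set of randomly generated reference sub-sequences, and the 3-gram method is assumed to translate non-overlapping codons in the first reading frame with the standard genetic code and return normalized amino-acid frequencies. The helper names (`random_references`, `levenshtein_features`, `amino_acid_features`), the reference count, and the reference length are hypothetical.

```python
import random

# Standard genetic code, codons ordered T, C, A, G at each position ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
AA_ORDER = sorted(set(AMINO))  # 20 amino acids plus '*' -> 21 features


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (unit-cost insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def random_references(n_refs: int, length: int, seed: int = 0) -> list[str]:
    """Hypothetical helper: a fixed set of random DNA sub-sequences, reused for every sample."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(n_refs)]


def levenshtein_features(seq: str, refs: list[str]) -> list[int]:
    """Assumed feature map: edit distance from the sequence to each reference."""
    return [levenshtein(seq.upper(), ref) for ref in refs]


def amino_acid_features(seq: str) -> list[float]:
    """Assumed 3-gram features: codon-translated amino-acid frequencies (frame 0)."""
    seq = seq.upper()
    counts = dict.fromkeys(AA_ORDER, 0)
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3])  # None for codons containing N or other ambiguity codes
        if aa is not None:
            counts[aa] += 1
    total = sum(counts.values()) or 1
    return [counts[aa] / total for aa in AA_ORDER]


# Toy usage on a made-up sequence (ATG TTT TAA GGC -> M F * G).
refs = random_references(n_refs=5, length=8, seed=42)
sample = "ATGTTTTAAGGC"
x_lev = levenshtein_features(sample, refs)
x_aa = amino_acid_features(sample)
```

Both functions return fixed-length vectors that a downstream classifier could consume. Under this reading, the reference set must be generated once (fixed seed) and reused for every sequence so that each feature position means the same thing across samples.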
Related papers
- Weighted Diversified Sampling for Efficient Data-Driven Single-Cell Gene-Gene Interaction Discovery
We present an innovative approach utilizing data-driven computational tools, leveraging an advanced Transformer model, to unearth gene-gene interactions.
A novel weighted diversified sampling algorithm computes the diversity score of each data sample in just two passes of the dataset.
Date: 2024-10-21T03:35:23Z
- DNA Sequence Classification with Compressors
Our study introduces a novel adaptation of Jiang et al.'s compressor-based, parameter-free classification method, specifically tailored for DNA sequence analysis.
Not only does this method align with the current state-of-the-art in terms of accuracy, but it also offers a more resource-efficient alternative to traditional machine learning methods.
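For context, the generic compressor-based classifier of Jiang et al. scores the normalized compression distance NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the gzip-compressed length, and predicts by k-nearest-neighbour voting. The sketch below shows only that generic scheme; the DNA-specific adaptation described in the paper is not reproduced, and the helper names and toy data are illustrative.

```python
import gzip
import random
from collections import Counter


def compressed_len(s: str) -> int:
    """C(s): length in bytes of the gzip-compressed sequence."""
    return len(gzip.compress(s.encode("ascii")))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two sequences."""
    cx, cy, cxy = compressed_len(x), compressed_len(y), compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_classify(query: str, train: list[tuple[str, str]], k: int = 3) -> str:
    """Majority label among the k training sequences with the smallest NCD to the query."""
    nearest = sorted(train, key=lambda item: ncd(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy usage with made-up data: periodic sequences vs. random sequences.
rng = random.Random(0)
train = [("AACCGGTT" * n, "periodic") for n in (15, 20, 30)]
train += [("".join(rng.choice("ACGT") for _ in range(200)), "random") for _ in range(3)]
prediction = knn_classify("AACCGGTT" * 25, train, k=3)
```

The method is parameter-free in the sense that nothing is trained: the compressor acts as the similarity measure, at the cost of one compression per query-training pair.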
Date: 2024-01-25T09:17:19Z
- Single-Cell Deep Clustering Method Assisted by Exogenous Gene Information: A Novel Approach to Identifying Cell Types
We develop an attention-enhanced graph autoencoder, which is designed to efficiently capture the topological features between cells.
During the clustering process, we integrate both sets of information (cell topology and exogenous gene information) and reconstruct the features of both cells and genes to generate a discriminative representation.
This research offers enhanced insights into the characteristics and distribution of cells, thereby laying the groundwork for early diagnosis and treatment of diseases.
Date: 2023-11-28T09:14:55Z
- MuSe-GNN: Learning Unified Gene Representation From Multimodal Biological Graph Data
We introduce a novel model called Multimodal Similarity Learning Graph Neural Network.
It combines Multimodal Machine Learning and Deep Graph Neural Networks to learn gene representations from single-cell sequencing and spatial transcriptomic data.
Our model efficiently produces unified gene representations for the analysis of gene functions, tissue functions, diseases, and species evolution.
Date: 2023-09-29T13:33:53Z
- DDeMON: Ontology-based function prediction by Deep Learning from Dynamic Multiplex Networks
The goal of this work is to explore how the fusion of systems' level information with temporal dynamics of gene expression can be used to predict novel gene functions.
We propose DDeMON, an approach for scalable, systems-level inference of function annotation using time-dependent multiscale biological information.
Date: 2023-02-08T06:53:02Z
- RL-MD: A Novel Reinforcement Learning Approach for DNA Motif Discovery
We present RL-MD, a novel reinforcement-learning-based approach to the DNA motif discovery task.
RL-MD takes unlabelled data as input, employs a relative information-based method to evaluate each proposed motif, and utilizes these continuous evaluation results as the reward.
Experiments show that RL-MD can identify high-quality motifs in real-world data.
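The summary does not define its "relative information-based" motif score. A common quantity of that kind is the relative entropy (information content) of aligned motif sites against a background base distribution, sketched below as an assumption; whether RL-MD uses exactly this score as its reward is not stated here. The uniform background and the pseudocount value are illustrative choices.

```python
import math

BACKGROUND = {b: 0.25 for b in "ACGT"}  # assumed uniform background distribution


def motif_relative_entropy(sites, background=BACKGROUND, pseudocount=0.5):
    """Sum over motif columns of KL(column base frequencies || background), in bits."""
    if not sites:
        raise ValueError("need at least one site")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to the same length")
    total = 0.0
    for i in range(length):
        column = [s[i].upper() for s in sites]
        denom = len(column) + 4 * pseudocount
        for base in "ACGT":
            p = (column.count(base) + pseudocount) / denom
            if p > 0:  # the 0 * log(0) term is taken as 0
                total += p * math.log2(p / background[base])
    return total
```

Perfectly conserved columns score up to 2 bits each against a uniform background, while columns matching the background score 0, so a higher value indicates a more conserved candidate motif.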
Date: 2022-09-30T02:07:37Z
- BASiNETEntropy: an alignment-free method for classification of biological sequences through complex networks and entropy maximization
This work presents a new method for the classification of biological sequences through complex networks and entropy.
The maximum entropy principle is proposed to identify the most informative edges about the RNA class, generating a filtered complex network.
The proposed method was evaluated in the classification of different RNA classes from 13 species.
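Network-based classifiers of this kind first map each sequence to a complex network. A minimal sketch, assuming nodes are k-mers and a directed edge counts how often one k-mer is followed by the next; the `step` parameter and the summary measures are illustrative choices, and the entropy-based edge filtering described above is not reproduced.

```python
from collections import Counter


def kmer_network(seq: str, k: int = 3, step: int = 1) -> Counter:
    """Directed weighted graph as an edge Counter: (u, v) -> how often k-mer v follows k-mer u."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(0, len(seq) - k + 1, step)]
    return Counter(zip(kmers, kmers[1:]))


def network_summary(edges: Counter) -> dict:
    """A few simple topological measures that could serve as classifier features."""
    nodes = {u for u, _ in edges} | {v for _, v in edges}
    n = len(nodes)
    return {
        "nodes": n,
        "edges": len(edges),                               # distinct directed edges
        "mean_degree": 2 * len(edges) / n if n else 0.0,   # mean in-degree + out-degree
        "max_edge_weight": max(edges.values(), default=0),
    }
```

Because every sequence becomes a fixed set of graph measures, sequences of very different lengths can be compared without alignment.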
Date: 2022-03-24T14:19:43Z
- Deep neural networks approach to microbial colony detection -- a comparative analysis
This study investigates the performance of three deep learning approaches for object detection on the AGAR dataset.
The achieved results may serve as a benchmark for future experiments.
Date: 2021-08-23T12:06:00Z
- Random Features for the Neural Tangent Kernel
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
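As a point of comparison, the naive random-feature map for the NTK of a two-layer ReLU network is simply the network's gradient at random weights. Its inner product converges to a closed form built from the arc-cosine kernels of order 0 and 1. The sketch below implements that baseline, whose dimension m(d + 1) grows with the input dimension d; it is not the paper's more compact construction. It assumes the parameterization f(x) = m^(-1/2) * sum_r a_r * relu(w_r . x) with w_r ~ N(0, I), a_r in {+1, -1}, and both layers trained.

```python
import math
import random


def ntk_features(x, W):
    """Naive NTK random features of f(x) = m^(-1/2) * sum_r a_r * relu(w_r . x).

    The first m entries are df/da_r; the next m*d entries are df/dw_r with the
    output sign a_r = +-1 dropped, since it cancels (a_r^2 = 1) in inner products.
    """
    m = len(W)
    scale = 1.0 / math.sqrt(m)
    pre = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    feats = [scale * max(z, 0.0) for z in pre]   # relu(w_r . x)
    for z in pre:
        gate = 1.0 if z > 0 else 0.0             # relu'(w_r . x)
        feats.extend(scale * gate * xi for xi in x)
    return feats


def ntk_two_layer(x, y):
    """Closed-form expectation of <ntk_features(x), ntk_features(y)> for w ~ N(0, I)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    cos = max(-1.0, min(1.0, dot / (nx * ny)))
    theta = math.acos(cos)
    k1 = nx * ny * (math.sin(theta) + (math.pi - theta) * cos) / (2 * math.pi)  # E[relu * relu]
    k0 = (math.pi - theta) / (2 * math.pi)                                      # E[step * step]
    return k1 + dot * k0


# Monte Carlo check with m random hidden units.
rng = random.Random(0)
d, m = 3, 20000
W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
x, y = [1.0, 0.0, 0.0], [0.6, 0.8, 0.0]
fx, fy = ntk_features(x, W), ntk_features(y, W)
approx = sum(a * b for a, b in zip(fx, fy))
exact = ntk_two_layer(x, y)
```

The gradient-block half of the map alone has m*d entries, which is the kind of dimension cost that more compact feature constructions aim to reduce.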
Date: 2021-04-03T09:08:12Z
- TSGCNet: Discriminative Geometric Feature Learning with Two-Stream GraphConvolutional Network for 3D Dental Model Segmentation
We propose a two-stream graph convolutional network (TSGCNet) to learn multi-view information from different geometric attributes.
We evaluate our proposed TSGCNet on a real-patient dataset of dental models acquired by 3D intraoral scanners.
Date: 2020-12-26T08:02:56Z
- Deep Representational Similarity Learning for analyzing neural signatures in task-based fMRI dataset
This paper develops Deep Representational Similarity Learning (DRSL), a deep extension of Representational Similarity Analysis (RSA).
DRSL is appropriate for analyzing similarities between various cognitive tasks in fMRI datasets with a large number of subjects.
Date: 2020-09-28T18:30:14Z
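DRSL builds on classic representational similarity analysis. The sketch below shows only the classic RSA step that it extends, under common assumptions: each condition's activity pattern is a vector, the representational dissimilarity matrix (RDM) uses 1 - Pearson correlation, and two RDMs are compared by the Spearman correlation of their upper triangles. The deep part of DRSL is not shown, and the toy patterns are made up.

```python
import math


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    num = sum(p * q for p, q in zip(da, db))
    den = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    return num / den


def rdm(patterns):
    """Representational dissimilarity matrix: 1 - Pearson r between condition patterns."""
    n = len(patterns)
    return [[1.0 - pearson(patterns[i], patterns[j]) for j in range(n)] for i in range(n)]


def upper_triangle(matrix):
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def ranks(values):
    """1-based ranks, with tied values assigned their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def compare_rdms(rdm_a, rdm_b):
    """Spearman correlation between the off-diagonal entries of two RDMs."""
    return pearson(ranks(upper_triangle(rdm_a)), ranks(upper_triangle(rdm_b)))


# Toy example: 3 conditions x 4 voxels; B is a linear rescaling of every pattern in A.
patterns_a = [[1, 2, 3, 4], [2, 1, 4, 3], [4, 3, 2, 1]]
patterns_b = [[2 * v + 1 for v in p] for p in patterns_a]
similarity = compare_rdms(rdm(patterns_a), rdm(patterns_b))
```

Because Pearson correlation ignores per-pattern scale and offset, the rescaled patterns yield the same RDM and a Spearman similarity of 1.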