RL-MD: A Novel Reinforcement Learning Approach for DNA Motif Discovery
- URL: http://arxiv.org/abs/2209.15181v1
- Date: Fri, 30 Sep 2022 02:07:37 GMT
- Title: RL-MD: A Novel Reinforcement Learning Approach for DNA Motif Discovery
- Authors: Wen Wang, Jianzong Wang, Shijing Si, Zhangcheng Huang, Jing Xiao
- Abstract summary: We present RL-MD, a novel reinforcement-learning-based approach to the DNA motif discovery task.
RL-MD takes unlabelled data as input, employs a relative information-based method to evaluate each proposed motif, and utilizes these continuous evaluation results as the reward.
Experiments show that RL-MD can identify high-quality motifs in real-world data.
- Score: 25.47916517236255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The extraction of sequence patterns from a collection of functionally linked
unlabeled DNA sequences is known as DNA motif discovery, and it is a key task
in computational biology. Several deep learning-based techniques have recently
been introduced to address this issue. However, these algorithms cannot be
used in real-world situations because of the need for labeled data. Here, we
present RL-MD, a novel reinforcement-learning-based approach to the DNA motif
discovery task. RL-MD takes unlabelled data as input, employs a relative
information-based method to evaluate each proposed motif, and utilizes these
continuous evaluation results as the reward. The experiments show that RL-MD
can identify high-quality motifs in real-world data.
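
The abstract does not spell out the exact "relative information-based" score, so the following is only a minimal sketch under assumptions: it scores a candidate motif (a set of equal-length aligned sites) by its relative entropy, in bits, against a background base distribution, which is one common way to turn motif quality into a continuous value that could serve as a reward. The function names, the pseudocount, and the uniform background are hypothetical choices for illustration, not details taken from the paper.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def position_frequencies(sites, pseudocount=0.5):
    """Per-position base frequencies for equal-length aligned sites.

    The pseudocount (an assumed value) keeps every frequency non-zero,
    so the logarithm in the score below is always defined.
    """
    if not sites:
        raise ValueError("at least one site is required")
    width = len(sites[0])
    for site in sites:
        if len(site) != width or not set(site) <= set(ALPHABET):
            raise ValueError(f"invalid site: {site!r}")
    total = len(sites) + pseudocount * len(ALPHABET)
    freqs = []
    for i in range(width):
        counts = Counter(site[i] for site in sites)
        freqs.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return freqs


def relative_entropy(sites, background=None, pseudocount=0.5):
    """Sum over positions of KL(p_i || background), in bits.

    Assumption: a uniform background is used when none is given.
    """
    if background is None:
        background = {b: 1.0 / len(ALPHABET) for b in ALPHABET}
    freqs = position_frequencies(sites, pseudocount)
    return sum(
        p[b] * math.log2(p[b] / background[b])
        for p in freqs
        for b in ALPHABET
    )


if __name__ == "__main__":
    conserved = ["ACGT"] * 20                  # every site identical
    shuffled = ["ACGT", "CGTA", "GTAC", "TACG"]  # each column is uniform
    print("conserved:", relative_entropy(conserved))
    print("shuffled: ", relative_entropy(shuffled))
```

A strongly conserved set of sites scores higher than a set whose columns look like the background, which is the property a reward of this kind relies on.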
Related papers
- KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors [2.0179908661487986]
We present KinDEL, one of the first large, publicly available DEL datasets on two kinases.
We benchmark different machine learning techniques to develop predictive models for hit identification.
We provide biophysical assay data, both on- and off-DNA, to validate our models on a smaller subset of molecules.
arXiv Detail & Related papers (2024-10-11T16:03:58Z)
- Unlocking Potential Binders: Multimodal Pretraining DEL-Fusion for Denoising DNA-Encoded Libraries [51.72836644350993]
Multimodal Pretraining DEL-Fusion model (MPDF)
We develop pretraining tasks applying contrastive objectives between different compound representations and their text descriptions.
We propose a novel DEL-fusion framework that amalgamates compound information at the atomic, submolecular, and molecular levels.
arXiv Detail & Related papers (2024-09-07T17:32:21Z)
- Dy-mer: An Explainable DNA Sequence Representation Scheme using Sparse Recovery [6.733319363951907]
Dy-mer is an explainable and robust representation scheme based on sparse recovery.
It achieves state-of-the-art performance in DNA promoter classification, yielding a remarkable 13% increase in accuracy.
arXiv Detail & Related papers (2024-07-06T15:08:31Z)
- DNA Sequence Classification with Compressors [0.0]
Our study introduces a novel adaptation of Jiang et al.'s compressor-based, parameter-free classification method, specifically tailored for DNA sequence analysis.
Not only does this method align with the current state-of-the-art in terms of accuracy, but it also offers a more resource-efficient alternative to traditional machine learning methods.
arXiv Detail & Related papers (2024-01-25T09:17:19Z)
- From Artificially Real to Real: Leveraging Pseudo Data from Large Language Models for Low-Resource Molecule Discovery [35.5507452011217]
Cross-modal techniques for molecule discovery frequently encounter the issue of data scarcity, hampering their performance and application.
We introduce a retrieval-based prompting strategy to construct high-quality pseudo data, then explore the optimal method to effectively leverage this pseudo data.
Experiments show that using pseudo data for domain adaptation outperforms all existing methods, while also requiring a smaller model scale, reduced data size and lower training cost.
arXiv Detail & Related papers (2023-09-11T02:35:36Z)
- A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
- RL-DARTS: Differentiable Architecture Search for Reinforcement Learning [62.95469460505922]
We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL).
By replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute resources, and is also compatible with off-policy and on-policy RL algorithms, needing only minor changes in preexisting code.
We show that the supernet gradually learns better cells, leading to alternative architectures which can be highly competitive against manually designed policies, but also verify previous design choices for RL policies.
arXiv Detail & Related papers (2021-06-04T03:08:43Z)
- Comparing Machine Learning Algorithms with or without Feature Extraction for DNA Classification [0.7742297876120561]
Three state-of-the-art algorithms, namely Convolutional Neural Networks, Deep Neural Networks, and N-gram Probabilistic Models, are used for the task of DNA classification.
We introduce a novel feature extraction method based on the Levenshtein distance and randomly generated DNA sub-sequences.
Four different data sets, each concerning viral diseases such as Covid-19, AIDS, Influenza, and Hepatitis C, are used for evaluating the different approaches.
arXiv Detail & Related papers (2020-11-01T12:04:54Z)
- Deep Representational Similarity Learning for analyzing neural signatures in task-based fMRI dataset [81.02949933048332]
This paper develops Deep Representational Similarity Learning (DRSL), a deep extension of Representational Similarity Analysis (RSA).
DRSL is appropriate for analyzing similarities between various cognitive tasks in fMRI datasets with a large number of subjects.
arXiv Detail & Related papers (2020-09-28T18:30:14Z)
- A Systematic Approach to Featurization for Cancer Drug Sensitivity Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found the RNA-seq to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.