DNA-GCN: Graph convolutional networks for predicting DNA-protein binding
- URL: http://arxiv.org/abs/2106.01836v1
- Date: Wed, 2 Jun 2021 07:36:11 GMT
- Title: DNA-GCN: Graph convolutional networks for predicting DNA-protein binding
- Authors: Yuhang Guo, Xiao Luo, Liang Chen and Minghua Deng
- Abstract summary: We build a sequence k-mer graph and learn a DNA Graph Convolutional Network (DNA-GCN) for the whole dataset.
DNA-GCN is initialized with a one-hot representation for all nodes, and it then jointly learns the embeddings for both k-mers and sequences.
We evaluate our model on 50 datasets from ENCODE.
- Score: 4.1600531290054
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predicting DNA-protein binding is an important and classic problem in
bioinformatics. Convolutional neural networks have outperformed conventional
methods in modeling the sequence specificity of DNA-protein binding. However,
no prior study has used graph convolutional networks for motif
inference. In this work, we propose to use graph convolutional networks for
motif inference. We build a sequence k-mer graph for the whole dataset based on
k-mer co-occurrence and k-mer-sequence relationships, and then learn a DNA Graph
Convolutional Network (DNA-GCN) on this graph. Our DNA-GCN is
initialized with a one-hot representation for all nodes, and it then jointly
learns the embeddings for both k-mers and sequences, as supervised by the known
labels of sequences. We evaluate our model on 50 datasets from ENCODE. DNA-GCN
shows competitive performance compared with the baseline model. In addition, we
analyze our model and design several architectural variants to better fit
different datasets.
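The abstract names the two edge types (k-mer co-occurrence and k-mer-sequence membership) and the one-hot node initialization, but not the edge weights. The sketch below assumes the Text GCN recipe: TF-IDF weights on sequence/k-mer edges, positive PMI over sliding windows on k-mer/k-mer edges, self-loops, symmetric normalization, and a standard two-layer GCN. The function names (`build_graph`, `normalize`, `gcn_forward`), the `k` and `window` values, the toy sequences, and the random weights are all hypothetical illustrations, not the authors' code.

```python
import math
import random
from collections import Counter
from itertools import combinations


def kmers(seq, k):
    """Split a DNA sequence into overlapping k-mers."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_graph(seqs, k=3, window=5):
    """Build a sequence/k-mer graph. Nodes 0..len(seqs)-1 are sequences,
    the remaining nodes are distinct k-mers (in sorted order).
    ASSUMPTION: Text GCN-style weights (TF-IDF and positive PMI)."""
    tokenized = [kmers(s, k) for s in seqs]
    vocab = sorted({km for toks in tokenized for km in toks})
    n_seq = len(seqs)
    index = {km: n_seq + i for i, km in enumerate(vocab)}
    n = n_seq + len(vocab)
    adj = [[0.0] * n for _ in range(n)]

    # Sequence/k-mer edges weighted by TF-IDF.
    df = Counter(km for toks in tokenized for km in set(toks))
    for d, toks in enumerate(tokenized):
        if not toks:
            continue
        for km, c in Counter(toks).items():
            w = (c / len(toks)) * math.log(n_seq / df[km])
            adj[d][index[km]] = adj[index[km]][d] = w

    # K-mer/k-mer edges weighted by positive PMI over sliding windows.
    windows = []
    for toks in tokenized:
        if len(toks) <= window:
            windows.append(toks)
        else:
            windows.extend(toks[i:i + window] for i in range(len(toks) - window + 1))
    n_win = len(windows)
    single = Counter(km for w in windows for km in set(w))
    pair = Counter(p for w in windows for p in combinations(sorted(set(w)), 2))
    for (a, b), c in pair.items():
        pmi = math.log(c * n_win / (single[a] * single[b]))
        if pmi > 0:
            adj[index[a]][index[b]] = adj[index[b]][index[a]] = pmi

    for i in range(n):
        adj[i][i] = 1.0  # self-loops
    return vocab, adj


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (degrees >= 1 thanks to self-loops)."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def gcn_forward(a_hat, w0, w1):
    """Two-layer GCN: softmax(A_hat ReLU(A_hat X W0) W1) with one-hot X = I,
    so A_hat X W0 reduces to A_hat W0."""
    h = [[max(0.0, v) for v in row] for row in matmul(a_hat, w0)]
    logits = matmul(a_hat, matmul(h, w1))
    return [softmax(row) for row in logits]


if __name__ == "__main__":
    random.seed(0)
    seqs = ["ACGTACGTGA", "TTGACGTCAA", "GGGCCCATAT"]  # toy data
    vocab, adj = build_graph(seqs, k=3, window=4)
    a_hat = normalize(adj)
    n, hidden, classes = len(adj), 8, 2
    w0 = [[random.gauss(0, 0.1) for _ in range(hidden)] for _ in range(n)]
    w1 = [[random.gauss(0, 0.1) for _ in range(classes)] for _ in range(hidden)]
    probs = gcn_forward(a_hat, w0, w1)
    print(probs[:len(seqs)])  # class probabilities for the sequence nodes only
```

Because every node starts as a one-hot vector, the feature matrix is the identity and never needs to be materialized. The weights here are untrained; training (omitted) would minimize cross-entropy on the labeled sequence nodes only, while the k-mer nodes receive embeddings through message passing.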
Related papers
- Scalable Graph Compressed Convolutions [68.85227170390864]
We propose a differentiable method that applies permutations to calibrate input graphs for Euclidean convolution.
Based on the graph calibration, we propose the Compressed Convolution Network (CoCN) for hierarchical graph representation learning.
(arXiv, 2024-07-26T03:14:13Z)
- GraSSRep: Graph-Based Self-Supervised Learning for Repeat Detection in Metagenomic Assembly [24.55141372357102]
Repetitive DNA (repeats) poses significant challenges for accurate and efficient genome assembly and sequence alignment.
GraSSRep is a self-supervised learning framework to classify DNA sequences into repetitive and non-repetitive categories.
GraSSRep combines sequencing features with pre-defined and learned graph features to achieve state-of-the-art performance in repeat detection.
(arXiv, 2024-02-14T18:26:58Z)
- HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution [76.97231739317259]
We present HyenaDNA, a genomic foundation model pretrained on the human reference genome with context lengths of up to 1 million tokens at the single-nucleotide level.
On fine-tuned benchmarks from the Nucleotide Transformer, HyenaDNA reaches state-of-the-art (SotA) performance on 12 of 18 datasets using a model with orders of magnitude fewer parameters and less pretraining data.
(arXiv, 2023-06-27T20:46:34Z)
- Seq-HGNN: Learning Sequential Node Representation on Heterogeneous Graph [57.2953563124339]
We propose a novel heterogeneous graph neural network with sequential node representation, namely Seq-HGNN.
We conduct extensive experiments on four widely used datasets from the Heterogeneous Graph Benchmark (HGB) and the Open Graph Benchmark (OGB).
(arXiv, 2023-05-18T07:27:18Z)
- HAC-Net: A Hybrid Attention-Based Convolutional Neural Network for Highly Accurate Protein-Ligand Binding Affinity Prediction [0.0]
We present a novel deep learning architecture consisting of a 3-dimensional convolutional neural network and two graph convolutional networks.
HAC-Net obtains state-of-the-art results on the PDBbind v.2016 core set.
We envision that this model can be extended to a broad range of supervised learning problems related to structure-based biomolecular property prediction.
(arXiv, 2022-12-23T16:14:53Z)
- Learning to Untangle Genome Assembly with Graph Convolutional Networks [17.227634756670835]
We introduce a new learning framework to train a graph convolutional network to resolve assembly graphs by finding a correct path through them.
Experimental results show that a model trained on simulated graphs generated solely from a single chromosome is able to resolve all other chromosomes remarkably well.
(arXiv, 2022-06-01T04:14:25Z)
- Neural Graph Matching for Pre-training Graph Neural Networks [72.32801428070749]
Graph neural networks (GNNs) have shown a powerful capacity for modeling structured data.
We present a novel Graph Matching based GNN Pre-Training framework, called GMPT.
The proposed method can be applied to fully self-supervised pre-training and coarse-grained supervised pre-training.
(arXiv, 2022-03-03T09:53:53Z)
- Simplicial Convolutional Neural Networks [36.078200422283835]
Recently, signal processing and neural networks have been extended to process and learn from data on graphs.
We propose a simplicial convolutional neural network (SCNN) architecture to learn from data defined on simplices.
(arXiv, 2021-10-06T08:52:55Z)
- CatGCN: Graph Convolutional Networks with Categorical Node Features [99.555850712725]
CatGCN is tailored for graph learning when the node features are categorical.
We train CatGCN in an end-to-end fashion and demonstrate it on semi-supervised node classification.
(arXiv, 2020-09-11T09:25:17Z)
- Convolutional Kernel Networks for Graph-Structured Data [37.13712126432493]
We introduce a family of multilayer graph kernels and establish new links between graph convolutional neural networks and kernel methods.
Our approach generalizes convolutional kernel networks to graph-structured data, by representing graphs as a sequence of kernel feature maps.
Our model can also be trained end-to-end on large-scale data, leading to new types of graph convolutional neural networks.
(arXiv, 2020-03-11T09:44:03Z)
- Infinitely Wide Graph Convolutional Networks: Semi-supervised Learning via Gaussian Processes [144.6048446370369]
Graph convolutional neural networks (GCNs) have recently demonstrated promising results on graph-based semi-supervised classification.
We propose a Gaussian process (GP) regression model via GCNs (GPGC) for graph-based semi-supervised learning.
We conduct extensive experiments to evaluate GPGC and demonstrate that it outperforms other state-of-the-art methods.
(arXiv, 2020-02-26T10:02:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.