Graph Coloring via Neural Networks for Haplotype Assembly and Viral
Quasispecies Reconstruction
- URL: http://arxiv.org/abs/2210.12158v1
- Date: Fri, 21 Oct 2022 12:53:09 GMT
- Title: Graph Coloring via Neural Networks for Haplotype Assembly and Viral
Quasispecies Reconstruction
- Authors: Hansheng Xue, Vaibhav Rajan, Yu Lin
- Abstract summary: We develop a new method, called NeurHap, that combines graph representation learning with optimization.
Our experiments demonstrate substantially better performance of NeurHap in real and synthetic datasets compared to competing approaches.
- Score: 8.828330486848753
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding genetic variation, e.g., through mutations, in organisms is
crucial to unravel their effects on the environment and human health. A
fundamental characterization can be obtained by solving the haplotype assembly
problem, which yields the variation across multiple copies of chromosomes.
Variations among fast evolving viruses that lead to different strains (called
quasispecies) are also deciphered with similar approaches. In both these cases,
high-throughput sequencing technologies that provide oversampled mixtures of
large noisy fragments (reads) of genomes, are used to infer constituent
components (haplotypes or quasispecies). The problem is harder for polyploid
species where there are more than two copies of chromosomes. State-of-the-art
neural approaches to solve this NP-hard problem do not adequately model
relations among the reads that are important for deconvolving the input signal.
We address this problem by developing a new method, called NeurHap, that
combines graph representation learning with combinatorial optimization. Our
experiments demonstrate substantially better performance of NeurHap in real and
synthetic datasets compared to competing approaches.
Related papers
- CSGDN: Contrastive Signed Graph Diffusion Network for Predicting Crop Gene-phenotype Associations [6.5678927417916455]
We propose a Contrastive Signed Graph Diffusion Network, CSGDN, to learn robust node representations with fewer training samples to achieve higher link prediction accuracy.
We conduct experiments to validate the performance of CSGDN on three crop datasets: Gossypium hirsutum, Brassica napus, and Triticum turgidum.
arXiv Detail & Related papers (2024-10-10T01:01:10Z) - Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification [119.13058298388101]
We develop a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances.
BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules.
BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules.
arXiv Detail & Related papers (2024-06-05T06:42:27Z) - Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt Learning [78.38442423223832]
We develop a novel codebook pre-training task, namely masked microenvironment modeling.
We demonstrate superior performance and training efficiency over state-of-the-art pre-training-based methods in mutation effect prediction.
arXiv Detail & Related papers (2024-05-16T03:53:21Z) - Generation is better than Modification: Combating High Class Homophily Variance in Graph Anomaly Detection [51.11833609431406]
Homophily distribution differences between different classes are significantly greater than those in homophilic and heterophilic graphs.
We introduce a new metric called Class Homophily Variance, which quantitatively describes this phenomenon.
To mitigate its impact, we propose a novel GNN model named Homophily Edge Generation Graph Neural Network (HedGe)
arXiv Detail & Related papers (2024-03-15T14:26:53Z) - Heterogeneous Graph Neural Networks using Self-supervised Reciprocally
Contrastive Learning [102.9138736545956]
Heterogeneous graph neural network (HGNN) is a very popular technique for the modeling and analysis of heterogeneous graphs.
We develop for the first time a novel and robust heterogeneous graph contrastive learning approach, namely HGCL, which introduces two views on respective guidance of node attributes and graph topologies.
In this new approach, we adopt distinct but most suitable attribute and topology fusion mechanisms in the two views, which are conducive to mining relevant information in attributes and topologies separately.
arXiv Detail & Related papers (2022-04-30T12:57:02Z) - SemanticCAP: Chromatin Accessibility Prediction Enhanced by Features
Learning from a Language Model [3.0643865202019698]
We propose a new solution named SemanticCAP to identify accessible regions of the genome.
It introduces a gene language model which models the context of gene sequences, thus being able to provide an effective representation of gene sequences.
Compared with other systems under public benchmarks, our model proved to have better performance.
arXiv Detail & Related papers (2022-04-05T11:47:58Z) - VEGN: Variant Effect Prediction with Graph Neural Networks [19.59965282985234]
We propose VEGN, which models variant effect prediction using a graph neural network (GNN) that operates on a heterogeneous graph with genes and variants.
The graph is created by assigning variants to genes and connecting genes with an gene-gene interaction network.
VeGN improves the performance of existing state-of-the-art models.
arXiv Detail & Related papers (2021-06-25T13:51:46Z) - Mycorrhiza: Genotype Assignment usingPhylogenetic Networks [2.286041284499166]
We introduce Mycorrhiza, a machine learning approach for the genotype assignment problem.
Our algorithm makes use of phylogenetic networks to engineer features that encode the evolutionary relationships among samples.
Mycorrhiza yields particularly significant gains on datasets with a large average fixation index (FST) or deviation from the Hardy-Weinberg equilibrium.
arXiv Detail & Related papers (2020-10-14T02:36:27Z) - A Cross-Level Information Transmission Network for Predicting Phenotype
from New Genotype: Application to Cancer Precision Medicine [37.442717660492384]
We propose a novel Cross-LEvel Information Transmission network (CLEIT) framework.
Inspired by domain adaptation, CLEIT first learns the latent representation of high-level domain then uses it as ground-truth embedding.
We demonstrate the effectiveness and performance boost of CLEIT in predicting anti-cancer drug sensitivity from somatic mutations.
arXiv Detail & Related papers (2020-10-09T22:01:00Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z) - Data Augmentation for Histopathological Images Based on
Gaussian-Laplacian Pyramid Blending [59.91656519028334]
Data imbalance is a major problem that affects several machine learning (ML) algorithms.
In this paper, we propose a novel approach capable of not only augmenting HI dataset but also distributing the inter-patient variability.
Experimental results on the BreakHis dataset have shown promising gains vis-a-vis the majority of DA techniques presented in the literature.
arXiv Detail & Related papers (2020-01-31T22:02:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.