InForecaster: Forecasting Influenza Hemagglutinin Mutations Through the
Lens of Anomaly Detection
- URL: http://arxiv.org/abs/2210.13709v1
- Date: Tue, 25 Oct 2022 02:08:09 GMT
- Title: InForecaster: Forecasting Influenza Hemagglutinin Mutations Through the
Lens of Anomaly Detection
- Authors: Ali Garjani, Atoosa Malemir Chegini, Mohammadreza Salehi, Alireza
Tabibzadeh, Parastoo Yousefi, Mohammad Hossein Razizadeh, Moein Esghaei,
Maryam Esghaei, and Mohammad Hossein Rohban
- Abstract summary: Anomaly detection (AD) is a well-established field in Machine Learning (ML).
We propose to tackle the mutation-prediction imbalance challenge through anomaly detection (AD).
We conduct a large number of experiments on four publicly available datasets.
- Score: 3.5213888068272197
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The influenza virus hemagglutinin is an important part of the virus's
attachment to host cells. The hemagglutinin proteins are among the genetic
regions of the virus with a high potential for mutations. Because predicting
mutations is important for producing effective and low-cost vaccines, solutions
that attempt to approach this problem have recently gained significant
attention. In such solutions, a historical record of mutations has been used to
train predictive models. However, the imbalance between mutations and the
preserved proteins is a major challenge for the development of such models that
needs to be addressed. Here, we propose to tackle this challenge through
anomaly detection (AD). AD is a well-established field in Machine Learning (ML)
that tries to distinguish unseen anomalies from normal patterns using only
normal training samples. By treating mutations as the anomalous behavior, we
can leverage the rich set of solutions that has recently emerged in this field.
Such methods also fit the problem setup of extreme imbalance between the number
of unmutated vs. mutated training samples. Motivated by this formulation, our
method tries to find a compact representation for unmutated samples while
forcing anomalies to be separated from the normal ones. This helps the model
learn a representation shared among normal training samples as much as
possible, which improves the discernibility and detectability of mutated
samples from the unmutated ones at test time. We conduct a large number of
experiments on four publicly available datasets, consisting of three different
hemagglutinin protein datasets and one SARS-CoV-2 dataset, and show the
effectiveness of our method under several standard criteria.
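The "compact representation for unmutated samples" idea can be sketched with a minimal one-class objective in the style of Deep SVDD. This is an illustrative stand-in, not the paper's actual model: the embeddings, center, and function names below are assumptions, and the score is simply the squared distance to the center of the normal (unmutated) training embeddings.

```python
# Hedged sketch (not InForecaster's implementation): a Deep-SVDD-style
# compactness objective. Normal (unmutated) embeddings are pulled toward a
# shared center c; at test time the anomaly score of a sample is its squared
# distance to c, so mutated samples far from the center score higher.
import numpy as np

def center_of(normal_embeddings):
    # c: mean of the normal training embeddings
    return normal_embeddings.mean(axis=0)

def compactness_loss(normal_embeddings, c):
    # mean squared distance of normal samples to the center;
    # minimizing this makes the normal representation compact
    return float(np.mean(np.sum((normal_embeddings - c) ** 2, axis=1)))

def anomaly_score(embedding, c):
    # larger distance to the center => more likely mutated (anomalous)
    return float(np.sum((embedding - c) ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal = rng.normal(0.0, 0.1, size=(100, 8))   # tight "unmutated" cluster
    mutated = rng.normal(1.0, 0.1, size=(8,))      # point far from the cluster
    c = center_of(normal)
    print(anomaly_score(mutated, c) > anomaly_score(normal[0], c))
```

In a real setup the embeddings would come from a learned encoder trained to minimize the compactness loss, rather than raw feature vectors.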
Related papers
- Predicting loss-of-function impact of genetic mutations: a machine
learning approach [0.0]
This paper aims to train machine learning models on the attributes of a genetic mutation to predict LoFtool scores.
These attributes included, but were not limited to, the position of a mutation on a chromosome, changes in amino acids, and changes in codons caused by the mutation.
Models were evaluated using five-fold cross-validated averages of r-squared, mean squared error, root mean squared error, mean absolute error, and explained variance.
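The evaluation protocol above (five-fold cross-validated averages of r-squared, MSE, RMSE, MAE, and explained variance) can be sketched as follows. The linear least-squares model and synthetic data are illustrative placeholders, not the paper's models or its mutation-attribute features.

```python
# Hedged sketch of five-fold cross-validated regression metrics.
# The OLS model and synthetic data are stand-ins for the paper's setup.
import numpy as np

def metrics(y_true, y_pred):
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    return {
        "r2": 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2),
        "mse": mse,
        "rmse": np.sqrt(mse),
        "mae": np.mean(np.abs(err)),
        "explained_variance": 1.0 - np.var(err) / np.var(y_true),
    }

def five_fold_cv(X, y, k=5):
    folds = np.array_split(np.arange(len(y)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # ordinary least squares as a placeholder model
        w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        scores.append(metrics(y[test], X[test] @ w))
    # average each metric across the k folds
    return {m: float(np.mean([s[m] for s in scores])) for m in scores[0]}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(0.0, 0.1, 200)
    print(five_fold_cv(X, y))
```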
arXiv Detail & Related papers (2024-01-26T19:27:38Z) - Invariant Anomaly Detection under Distribution Shifts: A Causal
Perspective [6.845698872290768]
Anomaly detection (AD) is the machine learning task of identifying highly discrepant abnormal samples.
Under distribution shift, the assumption that training and test samples are drawn from the same distribution breaks down.
We attempt to increase the resilience of anomaly detection models to different kinds of distribution shifts.
arXiv Detail & Related papers (2023-12-21T23:20:47Z) - Efficiently Predicting Protein Stability Changes Upon Single-point
Mutation with Large Language Models [51.57843608615827]
The ability to precisely predict protein thermostability is pivotal for various subfields and applications in biochemistry.
We introduce an ESM-assisted efficient approach that integrates protein sequence and structural features to predict thermostability changes in proteins upon single-point mutations.
arXiv Detail & Related papers (2023-12-07T03:25:49Z) - Accurate and Definite Mutational Effect Prediction with Lightweight
Equivariant Graph Neural Networks [2.381587712372268]
This research introduces a lightweight graph representation learning scheme that efficiently analyzes the microenvironment of wild-type proteins.
Our solution offers a wide range of benefits that make it an ideal choice for the community.
arXiv Detail & Related papers (2023-04-13T09:51:49Z) - Diversity-Measurable Anomaly Detection [106.07413438216416]
We propose Diversity-Measurable Anomaly Detection (DMAD) framework to enhance reconstruction diversity.
PDM essentially decouples deformation from embedding and makes the final anomaly score more reliable.
arXiv Detail & Related papers (2023-03-09T05:52:42Z) - Saliency Grafting: Innocuous Attribution-Guided Mixup with Calibrated
Label Mixing [104.630875328668]
Mixup scheme suggests mixing a pair of samples to create an augmented training sample.
We present a novel, yet simple Mixup-variant that captures the best of both worlds.
arXiv Detail & Related papers (2021-12-16T11:27:48Z) - PhyloTransformer: A Discriminative Model for Mutation Prediction Based
on a Multi-head Self-attention Mechanism [10.468453827172477]
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has caused an ongoing pandemic infecting 219 million people as of 10/19/21, with a 3.6% mortality rate.
Here we developed PhyloTransformer, a Transformer-based discriminative model that engages a multi-head self-attention mechanism to model genetic mutations that may lead to viral reproductive advantage.
arXiv Detail & Related papers (2021-11-03T01:30:57Z) - MutFormer: A context-dependent transformer-based model to predict
pathogenic missense mutations [5.153619184788929]
Missense mutations account for approximately half of the known variants responsible for human inherited diseases.
Recent advances in deep learning show that transformer models are particularly powerful at modeling sequences.
We introduce MutFormer, a transformer-based model for prediction of pathogenic missense mutations.
arXiv Detail & Related papers (2021-10-27T20:17:35Z) - GANs with Variational Entropy Regularizers: Applications in Mitigating
the Mode-Collapse Issue [95.23775347605923]
Building on the success of deep learning, Generative Adversarial Networks (GANs) provide a modern approach to learn a probability distribution from observed samples.
GANs often suffer from the mode collapse issue where the generator fails to capture all existing modes of the input distribution.
We take an information-theoretic approach and maximize a variational lower bound on the entropy of the generated samples to increase their diversity.
arXiv Detail & Related papers (2020-09-24T19:34:37Z) - Tracking disease outbreaks from sparse data with Bayesian inference [55.82986443159948]
The COVID-19 pandemic provides new motivation for estimating the empirical rate of transmission during an outbreak.
Standard methods struggle to accommodate the partial observability and sparse data common at finer scales.
We propose a Bayesian framework which accommodates partial observability in a principled manner.
arXiv Detail & Related papers (2020-09-12T20:37:33Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta-learning method for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.