Adaptive Invariance for Molecule Property Prediction
- URL: http://arxiv.org/abs/2005.03004v1
- Date: Tue, 5 May 2020 19:47:20 GMT
- Title: Adaptive Invariance for Molecule Property Prediction
- Authors: Wengong Jin, Regina Barzilay, Tommi Jaakkola
- Abstract summary: We introduce a novel approach to learn predictors that can generalize or extrapolate beyond the heterogeneous data.
Our method builds on and extends recently proposed invariant risk minimization.
Our predictor outperforms state-of-the-art transfer learning methods by a significant margin.
- Score: 38.637412590671865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effective property prediction methods can help accelerate the search for
COVID-19 antivirals either through accurate in-silico screens or by effectively
guiding on-going at-scale experimental efforts. However, existing prediction
tools have limited ability to accommodate scarce or fragmented training data
currently available. In this paper, we introduce a novel approach to learn
predictors that can generalize or extrapolate beyond the heterogeneous data.
Our method builds on and extends recently proposed invariant risk minimization,
adaptively forcing the predictor to avoid nuisance variation. We achieve this
by continually exercising and manipulating latent representations of molecules
to highlight undesirable variation to the predictor. To test the method we use
a combination of three data sources: SARS-CoV-2 antiviral screening data,
molecular fragments that bind to SARS-CoV-2 main protease and large screening
data for SARS-CoV-1. Our predictor outperforms state-of-the-art transfer
learning methods by a significant margin. We also report the top 20 predictions
of our model on the Broad drug repurposing hub.
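For readers unfamiliar with invariant risk minimization (IRM), which the abstract names as the method's starting point, the following is a minimal sketch of the standard IRMv1-style penalty, not of the adaptive extension this paper proposes. It assumes a squared-error loss and that each training environment is given as a pair of prediction and target lists; the function names `env_risk`, `irm_penalty`, and `irm_objective` and the weight `lam` are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple

# One environment: (model predictions, ground-truth targets). Assumed layout.
Env = Tuple[Sequence[float], Sequence[float]]


def env_risk(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error of the predictor within a single environment."""
    return sum((p - y) ** 2 for p, y in zip(preds, targets)) / len(preds)


def irm_penalty(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Squared gradient of the risk w.r.t. a dummy scale w, evaluated at w = 1.

    With risk(w) = mean((w * p - y)^2), the derivative at w = 1 is
    mean(2 * (p - y) * p). A value near zero means rescaling the predictor
    cannot lower the risk in this environment.
    """
    grad = sum(2.0 * (p - y) * p for p, y in zip(preds, targets)) / len(preds)
    return grad ** 2


def irm_objective(envs: Sequence[Env], lam: float = 1.0) -> float:
    """Sum over environments of risk plus lam times the invariance penalty."""
    return sum(env_risk(p, y) + lam * irm_penalty(p, y) for p, y in envs)
```

In this sketch the penalty is computed analytically for the squared loss; with a neural predictor one would normally obtain the same gradient through automatic differentiation.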
Related papers
- Permutation invariant multi-output Gaussian Processes for drug combination prediction in cancer [2.1145050293719745]
Dose-response prediction in cancer is an active application field in machine learning.
The goal is to develop accurate predictive models that can be used to guide experimental design or inform treatment decisions.
arXiv Detail & Related papers (2024-06-28T18:28:38Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Objective-Agnostic Enhancement of Molecule Properties via Multi-Stage VAE [1.3597551064547502]
Variational autoencoder (VAE) is a popular method for drug discovery and various architectures and pipelines have been proposed to improve its performance.
VAE approaches are known to suffer from poor manifold recovery when the data lie on a low-dimensional manifold embedded in a higher dimensional ambient space.
In this paper, we explore applying a multi-stage VAE approach, that can improve manifold recovery on a synthetic dataset, to the field of drug discovery.
arXiv Detail & Related papers (2023-08-24T20:22:22Z)
- ALMERIA: Boosting pairwise molecular contrasts with scalable methods [0.0]
ALMERIA is a tool for estimating compound similarities and activity prediction based on pairwise molecular contrasts.
It has been implemented using scalable software and methods to exploit large volumes of data.
Experiments show state-of-the-art performance for molecular activity prediction.
arXiv Detail & Related papers (2023-04-28T16:27:06Z)
- Improving the Adversarial Robustness of NLP Models by Information Bottleneck [112.44039792098579]
Non-robust features can be easily manipulated by adversaries to fool NLP models.
In this study, we explore the feasibility of capturing task-specific robust features, while eliminating the non-robust ones by using the information bottleneck theory.
We show that the models trained with our information bottleneck-based method are able to achieve a significant improvement in robust accuracy.
arXiv Detail & Related papers (2022-06-11T12:12:20Z)
- Taming Overconfident Prediction on Unlabeled Data from Hindsight [50.9088560433925]
Minimizing prediction uncertainty on unlabeled data is a key factor to achieve good performance in semi-supervised learning.
This paper proposes a dual mechanism, named ADaptive Sharpening (ADS), which first applies a soft-threshold to adaptively mask out determinate and negligible predictions.
Used as a plug-in, ADS significantly improves state-of-the-art SSL methods.
arXiv Detail & Related papers (2021-12-15T15:17:02Z)
- Predicting the Binding of SARS-CoV-2 Peptides to the Major Histocompatibility Complex with Recurrent Neural Networks [0.40040974874482094]
We adapt and extend USMPep, a proposed, conceptually simple prediction algorithm based on recurrent neural networks.
We evaluate the performance on a recently released SARS-CoV-2 dataset with binding stability measurements.
arXiv Detail & Related papers (2021-04-16T17:16:35Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- STELAR: Spatio-temporal Tensor Factorization with Latent Epidemiological Regularization [76.57716281104938]
We develop a tensor method to predict the evolution of epidemic trends for many regions simultaneously.
STELAR enables long-term prediction by incorporating latent temporal regularization through a system of discrete-time difference equations.
We conduct experiments using both county- and state-level COVID-19 data and show that our model can identify interesting latent patterns of the epidemic.
arXiv Detail & Related papers (2020-12-08T21:21:47Z)
- Deep Learning Models for Early Detection and Prediction of the spread of Novel Coronavirus (COVID-19) [4.213555705835109]
SARS-CoV-2 continues to spread globally and has become a pandemic.
There is an urgent need to develop machine learning techniques to predict the spread of COVID-19.
arXiv Detail & Related papers (2020-07-29T10:14:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.