Drug Discovery under Covariate Shift with Domain-Informed Prior
Distributions over Functions
- URL: http://arxiv.org/abs/2307.15073v1
- Date: Fri, 14 Jul 2023 05:01:10 GMT
- Title: Drug Discovery under Covariate Shift with Domain-Informed Prior
Distributions over Functions
- Authors: Leo Klarner, Tim G. J. Rudner, Michael Reutlinger, Torsten Schindler,
Garrett M. Morris, Charlotte Deane, Yee Whye Teh
- Abstract summary: Real-world drug discovery tasks are often characterized by a scarcity of labeled data and significant covariate shift.
We present a principled way to encode explicit prior knowledge of the data-generating process into a prior distribution.
We demonstrate that using Q-SAVI to integrate contextualized prior knowledge of drug-like chemical space into the modeling process affords substantial gains in predictive accuracy and calibration.
- Score: 30.305418761024143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accelerating the discovery of novel and more effective therapeutics is an
important pharmaceutical problem in which deep learning is playing an
increasingly significant role. However, real-world drug discovery tasks are
often characterized by a scarcity of labeled data and significant covariate
shift, a setting that poses a challenge to
standard deep learning methods. In this paper, we present Q-SAVI, a
probabilistic model able to address these challenges by encoding explicit prior
knowledge of the data-generating process into a prior distribution over
functions, presenting researchers with a transparent and probabilistically
principled way to encode data-driven modeling preferences. Building on a novel,
gold-standard bioactivity dataset that facilitates a meaningful comparison of
models in an extrapolative regime, we explore different approaches to induce
data shift and construct a challenging evaluation setup. We then demonstrate
that using Q-SAVI to integrate contextualized prior knowledge of drug-like
chemical space into the modeling process affords substantial gains in
predictive accuracy and calibration, outperforming a broad range of
state-of-the-art self-supervised pre-training and domain adaptation techniques.
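The abstract reports gains in calibration but does not say here which metric is used. As a rough illustration only, one widely used measure is the expected calibration error (ECE), which bins predictions by confidence and averages the gap between accuracy and confidence in each bin. The sketch below assumes binary classification with predicted probabilities for the positive class; the function name and the choice of 10 equal-width bins are assumptions for illustration and are not taken from the paper.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE sketch (hypothetical helper, not the paper's code).

    probs  : predicted probabilities of the positive class, each in [0, 1]
    labels : true labels, each 0 or 1
    Returns the bin-size-weighted mean of |accuracy - confidence|.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("need at least one prediction")

    # Each bin collects (confidence, correct) pairs.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        # Confidence is the probability assigned to the predicted class.
        conf = p if pred == 1 else 1.0 - p
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, 1.0 if pred == y else 0.0))

    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(a for _, a in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece
```

A perfectly calibrated model scores 0; a model that is fully confident but right only half the time scores 0.5. Under covariate shift, this gap typically widens on the shifted test split, which is why calibration is reported alongside accuracy.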
Related papers
- Towards Precision Healthcare: Robust Fusion of Time Series and Image Data [8.579651833717763]
We introduce a new method that uses two separate encoders, one for each type of data, allowing the model to understand complex patterns in both visual and time-based information.
We also deal with imbalanced datasets and use an uncertainty loss function, yielding improved results.
Our experiments show that our method is effective in improving multimodal deep learning for clinical applications.
arXiv Detail & Related papers (2024-05-24T11:18:13Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z)
- Meta Transfer of Self-Supervised Knowledge: Foundation Model in Action for Post-Traumatic Epilepsy Prediction [0.6291443816903801]
We introduce a novel training strategy for our foundation model.
We demonstrate that the proposed strategy significantly improves task performance on small-scale clinical datasets.
Results further demonstrated the enhanced generalizability of our foundation model.
arXiv Detail & Related papers (2023-12-21T07:42:49Z)
- Domain-invariant Clinical Representation Learning by Bridging Data Distribution Shift across EMR Datasets [16.317118701435742]
An effective prognostic model is expected to assist doctors in making the right diagnosis and designing a personalized treatment plan.
In the early stage of a disease, limited data collection and clinical experience, together with privacy and ethical concerns, may restrict the data available for reference.
This article introduces a domain-invariant representation learning method to build a transition model from source dataset to target dataset.
arXiv Detail & Related papers (2023-10-11T18:32:21Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge the sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Amplifying Pathological Detection in EEG Signaling Pathways through Cross-Dataset Transfer Learning [10.212217551908525]
We study the effectiveness of data and model scaling and cross-dataset knowledge transfer in a real-world pathology classification task.
We identify the challenges of possible negative transfer and emphasize the significance of some key components.
Our findings indicate that a small and generic model (e.g. ShallowNet) performs well on a single dataset; however, a larger model (e.g. TCN) performs better when transferring and learning from a larger and more diverse dataset.
arXiv Detail & Related papers (2023-09-19T20:09:15Z)
- ArSDM: Colonoscopy Images Synthesis with Adaptive Refinement Semantic Diffusion Models [69.9178140563928]
Colonoscopy analysis is essential for assisting clinical diagnosis and treatment.
The scarcity of annotated data limits the effectiveness and generalization of existing methods.
We propose an Adaptive Refinement Semantic Diffusion Model (ArSDM) to generate colonoscopy images that benefit the downstream tasks.
arXiv Detail & Related papers (2023-09-03T07:55:46Z)
- Differentiable Agent-based Epidemiology [71.81552021144589]
We introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation.
GradABM can quickly simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks, and ingest heterogeneous data sources.
arXiv Detail & Related papers (2022-07-20T07:32:02Z)
- Towards modelling hazard factors in unstructured data spaces using gradient-based latent interpolation [2.3867305921818573]
The application of deep learning in survival analysis (SA) gives the opportunity to utilize unstructured and high-dimensional data types uncommon in traditional survival methods.
This makes it possible to advance methods in fields such as digital health, predictive maintenance and churn analysis.
We propose 1) a multi-task variational autoencoder (VAE) with a survival objective, yielding survival-oriented embeddings, and 2) a novel method, HazardWalk, that makes it possible to model hazard factors in the original data space.
arXiv Detail & Related papers (2021-10-21T17:46:03Z)
- Accurate and Robust Feature Importance Estimation under Distribution Shifts [49.58991359544005]
PRoFILE is a novel feature importance estimation method.
We show significant improvements over state-of-the-art approaches, both in terms of fidelity and robustness.
arXiv Detail & Related papers (2020-09-30T05:29:01Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.