Enhancing Uncertainty Quantification in Drug Discovery with Censored Regression Labels
- URL: http://arxiv.org/abs/2409.04313v1
- Date: Fri, 6 Sep 2024 14:38:47 GMT
- Title: Enhancing Uncertainty Quantification in Drug Discovery with Censored Regression Labels
- Authors: Emma Svensson, Hannah Rosa Friesacher, Susanne Winiwarter, Lewis Mervin, Adam Arany, Ola Engkvist,
- Abstract summary: We adapt ensemble-based, Bayesian, and Gaussian models with tools to learn from censored labels.
Our results demonstrate that despite the partial information available in censored labels, they are essential to accurately and reliably model the real pharmaceutical setting.
- Score: 1.9354018523009415
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the early stages of drug discovery, decisions regarding which experiments to pursue can be influenced by computational models. These decisions are critical due to the time-consuming and expensive nature of the experiments. Therefore, it is becoming essential to accurately quantify the uncertainty in machine learning predictions, such that resources can be used optimally and trust in the models improves. While computational methods for drug discovery often suffer from limited data and sparse experimental observations, additional information can exist in the form of censored labels that provide thresholds rather than precise values of observations. However, the standard approaches that quantify uncertainty in machine learning cannot fully utilize censored labels. In this work, we adapt ensemble-based, Bayesian, and Gaussian models with tools to learn from censored labels by using the Tobit model from survival analysis. Our results demonstrate that despite the partial information available in censored labels, they are essential to accurately and reliably model the real pharmaceutical setting.
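The Tobit model referenced in the abstract handles censored labels by swapping the Gaussian density for a tail probability whenever only a threshold is observed. A minimal sketch of the censored-Gaussian negative log-likelihood follows; the function name and the encoding of censoring flags are illustrative, not the paper's implementation:

```python
import numpy as np
from scipy.stats import norm

def tobit_nll(y, mu, sigma, censor):
    """Negative log-likelihood of censored Gaussian observations (Tobit model).

    censor: 0  = exact label,
            +1 = right-censored (true value >= y, only the threshold y is known),
            -1 = left-censored  (true value <= y).
    """
    y, mu, censor = np.asarray(y, float), np.asarray(mu, float), np.asarray(censor)
    z = (y - mu) / sigma
    ll = np.where(
        censor == 0,
        norm.logpdf(z) - np.log(sigma),  # exact observation: Gaussian density
        np.where(
            censor > 0,
            norm.logsf(z),   # right-censored: log P(Y > y)
            norm.logcdf(z),  # left-censored:  log P(Y < y)
        ),
    )
    return -ll.sum()

# One exact label and one right-censored label (">= 5.0") under the same model.
loss = tobit_nll([1.2, 5.0], [1.0, 3.0], 0.5, [0, 1])
```

Because the censored branches use log-probabilities of half-lines rather than densities, a prediction far on the "right side" of a censoring threshold contributes almost no loss, which is exactly how partial information enters training.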
Related papers
- Beyond Perfect Scores: Proof-by-Contradiction for Trustworthy Machine Learning [0.0]
It is often unclear whether a model relies on true clinical cues or on spurious correlations in the data. This paper introduces a simple yet broadly applicable trustworthiness test grounded in proof-by-contradiction. Our approach trains and tests on spurious labels carefully permuted based on a potential outcomes framework.
arXiv Detail & Related papers (2026-01-10T22:08:14Z) - Informative missingness and its implications in semi-supervised learning [2.5794915063815664]
Semi-supervised learning (SSL) constructs classifiers using both labelled and unlabelled data. This defines an incomplete-data problem, which statistically can be formulated within the likelihood framework for finite mixture models. Modelling such informative missingness offers a coherent statistical framework that unifies likelihood-based inference with the behaviour of empirical SSL methods.
arXiv Detail & Related papers (2025-12-04T02:26:56Z) - Assessing the robustness of heterogeneous treatment effects in survival analysis under informative censoring [50.164756034797136]
Dropout is common in clinical studies, with up to half of patients leaving early due to side effects or other reasons. When dropout is informative, it introduces censoring bias, which in turn biases treatment effect estimates. We propose an assumption-lean framework to assess the robustness of conditional average treatment effect estimates in survival analysis when facing censoring bias.
arXiv Detail & Related papers (2025-10-15T10:51:17Z) - Clinical Uncertainty Impacts Machine Learning Evaluations [40.773483049446426]
We argue that machine-learning evaluations should explicitly account for annotation uncertainty using probabilistic metrics that directly operate on distributions. We urge the community to release raw annotations for datasets and to adopt uncertainty-aware evaluation so that performance estimates may better reflect clinical data.
arXiv Detail & Related papers (2025-09-26T11:56:58Z) - Robust Molecular Property Prediction via Densifying Scarce Labeled Data [51.55434084913129]
In drug discovery, compounds most critical for advancing research often lie beyond the training set. We propose a novel meta-learning-based approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data. We demonstrate significant performance gains on challenging real-world datasets.
arXiv Detail & Related papers (2025-06-13T15:27:40Z) - Evaluation of uncertainty estimations for Gaussian process regression based machine learning interatomic potentials [0.0]
Uncertainty estimations for machine learning interatomic potentials are crucial to quantify the additional model error they introduce.
We consider GPR models with Coulomb and SOAP representations as inputs to predict potential energy surfaces and excitation energies of molecules.
We evaluate how the GPR variance and ensemble-based uncertainties relate to the error, and whether model performance improves by selecting the most uncertain samples from a fixed configuration space.
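The GPR predictive variance mentioned above comes directly from the standard posterior equations: it shrinks near training inputs and reverts to the prior away from them. A minimal NumPy sketch with an RBF kernel (the length scale, noise level, and function names are illustrative):

```python
import numpy as np

def rbf(a, b, ell=1.0):
    """RBF kernel matrix between two 1-D input arrays."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Exact GP posterior mean and standard deviation at x_test."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    Kss = rbf(x_test, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(Kss) - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Dense training data on [-3, 3]; query one in-distribution and one extrapolated point.
x_train = np.linspace(-3.0, 3.0, 20)
y_train = np.sin(x_train)
mean, std = gp_posterior(x_train, y_train, np.array([0.0, 10.0]))
# std at x=0.0 (inside the data) is far smaller than std at x=10.0 (extrapolation).
```

Selecting "most uncertain" samples, as the paper evaluates, amounts to ranking candidate inputs by this posterior standard deviation.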
arXiv Detail & Related papers (2024-10-27T10:06:09Z) - Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks are proven to be vulnerable to data poisoning attacks.
Detecting poisoned samples in a mixed dataset is both beneficial and challenging.
We propose an Iterative Filtering approach to identify unlearnable examples (UEs).
arXiv Detail & Related papers (2024-08-15T13:26:13Z) - Achieving Well-Informed Decision-Making in Drug Discovery: A Comprehensive Calibration Study using Neural Network-Based Structure-Activity Models [4.619907534483781]
Computational models that predict drug-target interactions are valuable tools to accelerate the development of new therapeutic agents.
However, such models can be poorly calibrated, which results in unreliable uncertainty estimates.
We show that combining a post hoc calibration method with well-performing uncertainty quantification approaches can boost model accuracy and calibration.
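Post hoc calibration of the kind referenced here is commonly done with temperature scaling: a single scalar T is fitted on held-out data to soften overconfident logits. A hedged sketch for binary logits (the function names are illustrative, not this paper's method):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_temperature(logits, labels):
    """Find T > 0 minimizing the NLL of sigmoid(logits / T) on held-out data."""
    logits = np.asarray(logits, float)
    labels = np.asarray(labels, float)

    def nll(t):
        p = np.clip(sigmoid(logits / t), 1e-12, 1 - 1e-12)
        return -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))

    res = minimize_scalar(nll, bounds=(0.05, 20.0), method="bounded")
    return res.x

# Simulate a model whose logits are 3x too confident relative to the true
# label-generating process; the fitted temperature should recover roughly 3.
rng = np.random.default_rng(0)
z_true = rng.normal(0.0, 2.0, 4000)
labels = (rng.random(4000) < sigmoid(z_true)).astype(float)
t_fit = fit_temperature(3.0 * z_true, labels)
```

Because T only rescales logits, temperature scaling changes calibration without changing the ranking of predictions, which is why it pairs cleanly with the uncertainty quantification approaches the paper evaluates.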
arXiv Detail & Related papers (2024-07-19T10:29:00Z) - Automated Labeling of German Chest X-Ray Radiology Reports using Deep Learning [50.591267188664666]
We propose a deep learning-based CheXpert label prediction model, pre-trained on reports labeled by a rule-based German CheXpert model.
Our results demonstrate the effectiveness of our approach, which significantly outperformed the rule-based model on all three tasks.
arXiv Detail & Related papers (2023-06-09T16:08:35Z) - InstructBio: A Large-scale Semi-supervised Learning Paradigm for Biochemical Problems [38.57333125315448]
InstructBio is a semi-supervised learning algorithm that takes better advantage of unlabeled examples.
InstructBio substantially improves the generalization ability of molecular models.
arXiv Detail & Related papers (2023-04-08T04:19:22Z) - Towards Reliable Medical Image Segmentation by utilizing Evidential Calibrated Uncertainty [52.03490691733464]
We introduce DEviS, an easily implementable foundational model that seamlessly integrates into various medical image segmentation networks.
By leveraging subjective logic theory, we explicitly model probability and uncertainty for the problem of medical image segmentation.
DEviS incorporates an uncertainty-aware filtering module, which utilizes the metric of uncertainty-calibrated error to filter reliable data.
arXiv Detail & Related papers (2023-01-01T05:02:46Z) - Theoretical characterization of uncertainty in high-dimensional linear classification [24.073221004661427]
We show that uncertainty for learning from a limited number of samples of high-dimensional input data and labels can be obtained by the approximate message passing algorithm.
We discuss how over-confidence can be mitigated by appropriately regularising, and show that cross-validating with respect to the loss leads to better calibration than with the 0/1 error.
arXiv Detail & Related papers (2022-02-07T15:32:07Z) - Taming Overconfident Prediction on Unlabeled Data from Hindsight [50.9088560433925]
Minimizing prediction uncertainty on unlabeled data is a key factor to achieve good performance in semi-supervised learning.
This paper proposes a dual mechanism, named ADaptive Sharpening (ADS), which first applies a soft-threshold to adaptively mask out determinate and negligible predictions.
ADS significantly improves state-of-the-art SSL methods when used as a plug-in.
arXiv Detail & Related papers (2021-12-15T15:17:02Z) - Evaluating State-of-the-Art Classification Models Against Bayes Optimality [106.50867011164584]
We show that we can compute the exact Bayes error of generative models learned using normalizing flows.
We use our approach to conduct a thorough investigation of state-of-the-art classification models.
arXiv Detail & Related papers (2021-06-07T06:21:20Z) - Calibration of prediction rules for life-time outcomes using prognostic Cox regression survival models and multiple imputations to account for missing predictor data with cross-validatory assessment [0.0]
Methods are described to combine imputation with predictive calibration in survival modeling subject to censoring.
Prediction-averaging appears to have superior statistical properties, especially smaller predictive variation, as opposed to a direct application of Rubin's rules.
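Rubin's rules, against which prediction-averaging is compared above, pool a point estimate and its variance across m imputed datasets via the standard within/between decomposition. A minimal sketch (function name illustrative):

```python
import numpy as np

def rubins_rules(estimates, variances):
    """Pool point estimates and their variances from m multiply-imputed datasets.

    Returns the pooled estimate and its total variance:
    total = within-imputation variance + (1 + 1/m) * between-imputation variance.
    """
    estimates = np.asarray(estimates, float)
    variances = np.asarray(variances, float)
    m = len(estimates)
    q_bar = estimates.mean()                 # pooled point estimate
    u_bar = variances.mean()                 # within-imputation variance
    b = estimates.var(ddof=1)                # between-imputation variance
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total

# Three imputed datasets yielding estimates 1, 2, 3 with equal within-variance 0.1.
q_pooled, var_pooled = rubins_rules([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
```

The (1 + 1/m) factor inflates the between-imputation spread to account for using a finite number of imputations, which is one source of the extra predictive variation the paper attributes to a direct application of Rubin's rules.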
arXiv Detail & Related papers (2021-05-04T20:10:12Z) - Leveraging Uncertainty from Deep Learning for Trustworthy Materials Discovery Workflows [16.53952506314226]
We show that by leveraging predictive uncertainty, a user can determine the required training data set size necessary to achieve a certain classification accuracy.
We also propose uncertainty guided decision referral to detect and refrain from making decisions on confusing samples.
arXiv Detail & Related papers (2020-12-02T19:34:16Z) - Unlabelled Data Improves Bayesian Uncertainty Calibration under Covariate Shift [100.52588638477862]
We develop an approximate Bayesian inference scheme based on posterior regularisation.
We demonstrate the utility of our method in the context of transferring prognostic models of prostate cancer across globally diverse populations.
arXiv Detail & Related papers (2020-06-26T13:50:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.