Generalization within in silico screening
- URL: http://arxiv.org/abs/2307.09379v2
- Date: Tue, 23 Jul 2024 16:37:22 GMT
- Title: Generalization within in silico screening
- Authors: Andreas Loukas, Pan Kessel, Vladimir Gligorijevic, Richard Bonneau
- Abstract summary: In silico screening uses predictive models to select a batch of compounds with favorable properties from a library for experimental validation.
By extending learning theory, we show that the selectivity of the selection policy can significantly impact generalization.
We show that generalization can be markedly enhanced when considering a model's ability to predict the fraction of desired outcomes in a batch.
- Score: 19.58677466616286
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: In silico screening uses predictive models to select a batch of compounds with favorable properties from a library for experimental validation. Unlike conventional learning paradigms, success in this context is measured by the performance of the predictive model on the selected subset of compounds rather than the entire set of predictions. By extending learning theory, we show that the selectivity of the selection policy can significantly impact generalization, with a higher risk of errors occurring when exclusively selecting predicted positives and when targeting rare properties. Our analysis suggests a way to mitigate these challenges. We show that generalization can be markedly enhanced when considering a model's ability to predict the fraction of desired outcomes in a batch. This is promising, as the primary aim of screening is not necessarily to pinpoint the label of each compound individually, but rather to assemble a batch enriched for desirable compounds. Our theoretical insights are empirically validated across diverse tasks, architectures, and screening scenarios, underscoring their applicability.
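The contrast the abstract draws between whole-library performance and performance on the selected batch can be illustrated with a small simulation. This is a minimal sketch under assumed conditions, not the authors' method or experimental setup: the toy library, the Gaussian score model, and the names `posterior_positive` and `screen_demo` are all hypothetical. It compares accuracy over the entire library with the hit rate in a top-scoring batch, and also reports the batch's predicted fraction of positives (the mean predicted probability) next to the observed fraction.

```python
import math
import random


def posterior_positive(score, prevalence, noise):
    """P(label = 1 | score) under the toy Gaussian score model assumed below."""
    def normal_pdf(x, mean):
        z = (x - mean) / noise
        return math.exp(-0.5 * z * z) / (noise * math.sqrt(2.0 * math.pi))

    pos = prevalence * normal_pdf(score, 1.0)
    neg = (1.0 - prevalence) * normal_pdf(score, 0.0)
    return pos / (pos + neg)


def screen_demo(n_library=10_000, prevalence=0.05, batch_size=100, noise=1.0, seed=0):
    """Simulate one screening round on a hypothetical compound library."""
    rng = random.Random(seed)

    # Assumption: rare positive labels, and a model score equal to label + noise.
    labels = [1 if rng.random() < prevalence else 0 for _ in range(n_library)]
    scores = [y + rng.gauss(0.0, noise) for y in labels]
    probs = [posterior_positive(s, prevalence, noise) for s in scores]

    # Conventional view: accuracy over every prediction, thresholded at 0.5.
    correct = sum((p >= 0.5) == bool(y) for p, y in zip(probs, labels))
    library_accuracy = correct / n_library

    # Screening view: only the top-scoring batch goes to experimental validation.
    ranked = sorted(range(n_library), key=lambda i: probs[i], reverse=True)
    batch = ranked[:batch_size]
    actual_fraction = sum(labels[i] for i in batch) / batch_size

    # Batch-level prediction: expected fraction of positives in the batch.
    predicted_fraction = sum(probs[i] for i in batch) / batch_size

    return {
        "library_accuracy": library_accuracy,
        "batch_size": len(batch),
        "actual_fraction": actual_fraction,
        "predicted_fraction": predicted_fraction,
    }


result = screen_demo()
for name, value in result.items():
    print(f"{name}: {value}")
```

In this toy setting, the library-wide accuracy is dominated by the many easy negatives, while the batch hit rate depends only on the extreme tail of the scores, which is where the abstract locates the higher risk of error. The predicted batch fraction is a single aggregate quantity, which is the kind of batch-level target the abstract argues generalizes better than per-compound labels.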
Related papers
- Detecting and Identifying Selection Structure in Sequential Data [53.24493902162797]
We argue that the selective inclusion of data points based on latent objectives is common in practical situations, such as music sequences.
We show that selection structure is identifiable without any parametric assumptions or interventional experiments.
We also propose a provably correct algorithm to detect and identify selection structures as well as other types of dependencies.
(2024-06-29T20:56:34Z)
- Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
(2023-02-23T18:57:14Z)
- In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation [92.51773744318119]
This paper empirically investigates the strengths and weaknesses of different model selection criteria.
We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them.
(2023-02-06T16:55:37Z)
- Selection by Prediction with Conformal p-values [7.917044695538599]
We study screening procedures that aim to select candidates whose unobserved outcomes exceed user-specified values.
We develop a method that wraps around any prediction model to produce a subset of candidates while controlling the proportion of falsely selected units.
(2022-10-04T06:34:49Z)
- Low cost prediction of probability distributions of molecular properties for early virtual screening [0.8702432681310399]
This article applies the Hierarchical Correlation Reconstruction approach, previously applied in the analysis of demographic, financial, and astronomical data.
The methodology therefore offers strong support for medicinal chemists, as it enables fast rejection of compounds with the lowest potential for the desired physicochemical/ADMET characteristics.
(2022-07-21T13:29:26Z)
- General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model like gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
(2021-12-20T14:47:32Z)
- Selective Ensembles for Consistent Predictions [19.154189897847804]
Inconsistency is undesirable in high-stakes contexts.
We show that this inconsistency extends beyond predictions to feature attributions.
We prove that selective ensembles achieve consistent predictions and feature attributions while maintaining low abstention rates.
(2021-11-16T05:03:56Z)
- Test-time Collective Prediction [73.74982509510961]
Multiple parties in machine learning want to jointly make predictions on future test points.
Agents wish to benefit from the collective expertise of the full set of agents, but may not be willing to release their data or model parameters.
We explore a decentralized mechanism to make collective predictions at test time, leveraging each agent's pre-trained model.
(2021-06-22T18:29:58Z)
- Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance.
We provide tractable algorithms to compute the range of attainable group-level predictive disparities.
We extend our framework to address the empirically relevant challenge of selectively labelled data.
(2021-01-02T02:11:37Z)