A semi-supervised learning framework for quantitative structure-activity
regression modelling
- URL: http://arxiv.org/abs/2001.01924v1
- Date: Tue, 7 Jan 2020 07:56:49 GMT
- Title: A semi-supervised learning framework for quantitative structure-activity
regression modelling
- Authors: Oliver P Watson, Isidro Cortes-Ciriano, James A Watson
- Abstract summary: We show that it is possible to make predictions which take into account the similarity of the testing compounds to those in the training data and adjust for the reporting selection bias.
We illustrate this approach using publicly available structure-activity data on a large set of compounds reported by GlaxoSmithKline.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Supervised learning models, also known as quantitative structure-activity
regression (QSAR) models, are increasingly used in assisting the process of
preclinical, small molecule drug discovery. The models are trained on data
consisting of a finite-dimensional representation of molecular structures and
their corresponding target-specific activities. These models can then be used
to predict the activity of previously unmeasured novel compounds. In this work
we address two problems related to this approach. The first is to estimate the
extent to which the quality of the model predictions degrades for compounds
very different from the compounds in the training data. The second is to adjust
for the screening-dependent selection bias inherent in many training data sets.
In the most extreme cases, only compounds which pass an activity-dependent
screening are reported. By using a semi-supervised learning framework, we show
that it is possible to make predictions which take into account the similarity
of the testing compounds to those in the training data and adjust for the
reporting selection bias. We illustrate this approach using publicly available
structure-activity data on a large set of compounds reported by GlaxoSmithKline
(the Tres Cantos AntiMalarial Set) to inhibit in vitro P. falciparum growth.
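
The first problem in the abstract, checking how prediction quality changes with distance from the training data, can be illustrated with a minimal sketch. This is not the authors' framework: the abstract does not say which similarity measure or model is used, so the sketch assumes binary fingerprints stored as sets of "on" bit indices and Tanimoto similarity, and all function names and the 0.5 threshold are hypothetical. It scores each test compound by its similarity to the nearest training compound, then compares mean absolute error between the low- and high-similarity groups.

```python
from typing import FrozenSet, Optional, Sequence, Tuple

# Assumption: a fingerprint is the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two binary fingerprints."""
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints match.
        return 1.0
    return len(a & b) / union


def nearest_training_similarity(
    test_fp: Fingerprint, training_fps: Sequence[Fingerprint]
) -> float:
    """Similarity of a test compound to its most similar training compound."""
    return max(tanimoto(test_fp, fp) for fp in training_fps)


def mean_abs_error_by_similarity(
    similarities: Sequence[float],
    abs_errors: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[Optional[float], Optional[float]]:
    """Return (MAE of low-similarity group, MAE of high-similarity group).

    A group with no members yields None.
    """
    low = [e for s, e in zip(similarities, abs_errors) if s < threshold]
    high = [e for s, e in zip(similarities, abs_errors) if s >= threshold]

    def mean(values):
        return sum(values) / len(values) if values else None

    return mean(low), mean(high)


# Toy data, invented purely for illustration.
training = [frozenset({1, 2, 3, 4}), frozenset({2, 3, 5})]
close_compound = frozenset({1, 2, 3})
distant_compound = frozenset({7, 8})

close_similarity = nearest_training_similarity(close_compound, training)
distant_similarity = nearest_training_similarity(distant_compound, training)
```

If the low-similarity group shows a clearly larger error than the high-similarity group on held-out data, that is one simple signal that predictions degrade away from the training set. It does not address the reporting selection bias, which the abstract treats as a separate problem.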
Related papers
- SMILES-Mamba: Chemical Mamba Foundation Models for Drug ADMET Prediction [16.189335444981353]
Predicting the absorption, distribution, metabolism, excretion, and toxicity of small-molecule drugs is critical for ensuring safety and efficacy.
We propose a two-stage model that leverages both unlabeled and labeled data through a combination of self-supervised pretraining and fine-tuning strategies.
Our results demonstrate that SMILES-Mamba exhibits competitive performance across 22 ADMET datasets, achieving the highest score in 14 tasks.
arXiv Detail & Related papers (2024-08-11T04:53:12Z)
- Seeing Unseen: Discover Novel Biomedical Concepts via Geometry-Constrained Probabilistic Modeling [53.7117640028211]
We present a geometry-constrained probabilistic modeling treatment to resolve the identified issues.
We incorporate a suite of critical geometric properties to impose proper constraints on the layout of constructed embedding space.
A spectral graph-theoretic method is devised to estimate the number of potential novel classes.
arXiv Detail & Related papers (2024-03-02T00:56:05Z) - Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals [91.59906995214209]
We propose a new evaluation method, the Counterfactual Attentiveness Test (CAT).
CAT uses counterfactuals by replacing part of the input with its counterpart from a different example, expecting an attentive model to change its prediction.
We show that GPT3 becomes less attentive with an increased number of demonstrations, while its accuracy on the test data improves.
arXiv Detail & Related papers (2023-11-16T06:27:35Z) - Unlearning Spurious Correlations in Chest X-ray Classification [4.039245878626345]
We train a deep learning model using a Covid-19 chest X-ray dataset.
We show how this dataset can lead to spurious correlations due to unintended confounding regions.
XBL is a deep learning approach that goes beyond interpretability by utilizing model explanations to interactively unlearn spurious correlations.
arXiv Detail & Related papers (2023-08-02T12:59:10Z) - Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z) - Drug Synergistic Combinations Predictions via Large-Scale Pre-Training
and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z) - MolE: a molecular foundation model for drug discovery [0.2802437011072858]
MolE is a molecular foundation model that adapts the DeBERTa architecture to be used on molecular graphs.
We show that fine-tuning pretrained MolE achieves state-of-the-art results on 9 of the 22 ADMET tasks included in the Therapeutic Data Commons.
arXiv Detail & Related papers (2022-11-03T21:22:05Z) - Meaningful machine learning models and machine-learned pharmacophores
from fragment screening campaigns [0.0]
We derive machine learning models from over 50 fragment-screening campaigns.
We provide a physically interpretable and verifiable representation of what the ML model considers important for successful binding.
We find good agreement between the key molecular substructures proposed by the ML model and those assigned manually.
arXiv Detail & Related papers (2022-03-25T18:08:55Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model that can extract common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, which is a simple yet effective meta-learning method for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z) - Ensemble Transfer Learning for the Prediction of Anti-Cancer Drug
Response [49.86828302591469]
In this paper, we apply transfer learning to the prediction of anti-cancer drug response.
We apply the classic transfer learning framework that trains a prediction model on the source dataset and refines it on the target dataset.
The ensemble transfer learning pipeline is implemented using LightGBM and two deep neural network (DNN) models with different architectures.
arXiv Detail & Related papers (2020-05-13T20:29:48Z)