Size doesn't matter: predicting physico- or biochemical properties based
on dozens of molecules
- URL: http://arxiv.org/abs/2107.10882v1
- Date: Thu, 22 Jul 2021 18:57:24 GMT
- Title: Size doesn't matter: predicting physico- or biochemical properties based
on dozens of molecules
- Authors: Kirill Karpov (1 and 2), Artem Mitrofanov (1 and 2), Vadim Korolev (1
and 2), Valery Tkachenko (2) ((1) Lomonosov Moscow State University,
Department of Chemistry, Leninskie gory, 1 bld. 3, Moscow, Russia, (2)
Science Data Software, LLC, 14909 Forest Landing Cir, Rockville, USA)
- Abstract summary: The paper shows a significant improvement in the performance of models for target properties with a lack of data.
The effects of the dataset composition on model quality and the applicability domain of the resulting models are also considered.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The use of machine learning in chemistry has become a common practice. At the
same time, despite the success of modern machine learning methods, the lack of
data limits their use. Using a transfer learning methodology can help solve
this problem. This methodology assumes that a model built on a sufficient
amount of data captures general features of the chemical compound structure on
which it was trained and that the further reuse of these features on a dataset
with a lack of data will greatly improve the quality of the new model. In this
paper, we develop this approach for small organic molecules, implementing
transfer learning with graph convolutional neural networks. The paper shows a
significant improvement in the performance of models for target properties with
a lack of data. The effects of the dataset composition on model quality and the
applicability domain of the resulting models are also considered.
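The transfer step described in the abstract can be sketched in a few lines. This is a minimal, framework-free illustration, not the authors' implementation: it assumes a single graph-convolution layer with mean pooling, and one common transfer variant in which the pretrained encoder is frozen and only a new output head is fitted on the small dataset. The molecules, the labels, and every function name (`aggregate`, `encode`, `fit`, `new_head`) are invented for the example.

```python
import copy
import random

def aggregate(atoms, edges):
    """Add each atom's neighbour features to its own: rows of (A + I) X."""
    agg = [row[:] for row in atoms]
    for i, j in edges:
        for k in range(len(atoms[0])):
            agg[i][k] += atoms[j][k]
            agg[j][k] += atoms[i][k]
    return agg

def encode(mol, W):
    """One graph-convolution layer (ReLU) followed by mean pooling over atoms.

    Returns the molecule embedding and d(embedding[h]) / d(W[k][h]).
    """
    atoms, edges = mol
    agg = aggregate(atoms, edges)
    n_in, n_hid, n_atoms = len(W), len(W[0]), len(atoms)
    emb = [0.0] * n_hid
    grad = [[0.0] * n_hid for _ in range(n_in)]
    for row in agg:
        for h in range(n_hid):
            z = sum(row[k] * W[k][h] for k in range(n_in))
            if z > 0:  # ReLU is active for this atom and hidden unit
                emb[h] += z / n_atoms
                for k in range(n_in):
                    grad[k][h] += row[k] / n_atoms
    return emb, grad

def predict(mol, W, head):
    """Linear read-out on top of the embedding; head[-1] is the bias."""
    emb, _ = encode(mol, W)
    return sum(e * w for e, w in zip(emb, head)) + head[-1]

def fit(data, W, head, train_encoder, lr=0.01, epochs=300):
    """Plain SGD on squared error; the encoder W is updated only if asked."""
    for _ in range(epochs):
        for mol, y in data:
            emb, grad = encode(mol, W)
            err = sum(e * w for e, w in zip(emb, head)) + head[-1] - y
            if train_encoder:
                for k in range(len(W)):
                    for h in range(len(W[0])):
                        W[k][h] -= lr * err * head[h] * grad[k][h]
            for h in range(len(emb)):
                head[h] -= lr * err * emb[h]
            head[-1] -= lr * err

def new_head(n_hid):
    return [0.0] * (n_hid + 1)  # weights plus a trailing bias

# Toy molecules: one-hot atom features [is_C, is_O] and a bond list.
C, O = [1.0, 0.0], [0.0, 1.0]
ethane = ([C, C], [(0, 1)])
propane = ([C, C, C], [(0, 1), (1, 2)])
methanol = ([C, O], [(0, 1)])
ethanol = ([C, C, O], [(0, 1), (1, 2)])
dimethyl_ether = ([C, O, C], [(0, 1), (1, 2)])

# Synthetic labels, invented purely for illustration (not real measurements).
source_data = [(ethane, 0.0), (propane, 0.0), (methanol, 1.0), (dimethyl_ether, 1.0)]
target_data = [(ethane, 0.5), (ethanol, 1.5)]

random.seed(0)
n_hid = 4
W = [[random.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(2)]

# Step 1: pretrain encoder and head together on the larger source task.
source_head = new_head(n_hid)
fit(source_data, W, source_head, train_encoder=True)

# Step 2: reuse the encoder unchanged and fit only a fresh head on the small task.
W_pretrained = copy.deepcopy(W)
target_head = new_head(n_hid)
fit(target_data, W, target_head, train_encoder=False)

print(predict(ethanol, W, target_head))
```

Freezing the encoder keeps the number of trainable parameters small, which matters when the target dataset holds only a few dozen molecules. Fine-tuning the encoder as well (passing `train_encoder=True` in step 2, usually with a smaller learning rate) is the other common choice.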
Related papers
- Structure to Property: Chemical Element Embeddings and a Deep Learning
Approach for Accurate Prediction of Chemical Properties [0.0]
This paper introduces a new machine learning model based on deep learning techniques, such as a multilayer encoder and decoder architecture, for classification tasks.
We demonstrate the opportunities offered by our approach by applying it to various types of input data, including organic and inorganic compounds.
The models used in this work exhibit a high degree of predictive power, underscoring the progress that can be made with refined machine learning.
arXiv Detail & Related papers (2023-09-17T19:41:32Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular
Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- Calibration and generalizability of probabilistic models on low-data
chemical datasets with DIONYSUS [0.0]
We perform an extensive study of the calibration and generalizability of probabilistic machine learning models on small chemical datasets.
We analyse the quality of their predictions and uncertainties in a variety of tasks (binary, regression) and datasets.
We offer practical insights into model and feature choice for modelling small chemical datasets, a common scenario in new chemical experiments.
arXiv Detail & Related papers (2022-12-03T08:19:06Z)
- An Adversarial Active Sampling-based Data Augmentation Framework for
Manufacturable Chip Design [55.62660894625669]
Lithography modeling is a crucial problem in chip design to ensure a chip design mask is manufacturable.
Recent developments in machine learning have provided alternative solutions in replacing the time-consuming lithography simulations with deep neural networks.
We propose a litho-aware data augmentation framework to resolve the dilemma of limited data and improve the machine learning model performance.
arXiv Detail & Related papers (2022-10-27T20:53:39Z)
- Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models.
The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z)
- Model-agnostic multi-objective approach for the evolutionary discovery
of mathematical models [55.41644538483948]
In modern data science, it is often more interesting to understand the properties of a model and which of its parts could be replaced to obtain better results.
We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
arXiv Detail & Related papers (2021-07-07T11:17:09Z)
- Model-Based Deep Learning [155.063817656602]
Signal processing, communications, and control have traditionally relied on classical statistical modeling techniques.
Deep neural networks (DNNs) use generic architectures which learn to operate from data, and demonstrate excellent performance.
We are interested in hybrid techniques that combine principled mathematical models with data-driven systems to benefit from the advantages of both approaches.
arXiv Detail & Related papers (2020-12-15T16:29:49Z)
- Exploring the potential of transfer learning for metamodels of
heterogeneous material deformation [0.0]
We show that transfer learning can be used to leverage both low-fidelity simulation data and simulation data.
We extend Mechanical MNIST, our open source benchmark dataset of heterogeneous material undergoing large deformation.
We show that transferring the knowledge stored in metamodels trained on these low-fidelity simulation results can vastly improve the performance of metamodels used to predict the results of high-fidelity simulations.
arXiv Detail & Related papers (2020-10-28T12:43:46Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- BERT Learns (and Teaches) Chemistry [5.653789128055942]
We propose the use of attention to study functional groups and other property-impacting molecular substructures from a data-driven perspective.
We then apply the representations of functional groups and atoms learned by the model to tackle problems of toxicity, solubility, drug-likeness, and accessibility.
arXiv Detail & Related papers (2020-07-11T00:23:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.