Related papers: Size doesn't matter: predicting physico- or biochemical properties based on dozens of molecules

Size doesn't matter: predicting physico- or biochemical properties based on dozens of molecules

URL: http://arxiv.org/abs/2107.10882v1
Date: Thu, 22 Jul 2021 18:57:24 GMT
Title: Size doesn't matter: predicting physico- or biochemical properties based on dozens of molecules
Authors: Kirill Karpov (1 and 2), Artem Mitrofanov (1 and 2), Vadim Korolev (1 and 2), Valery Tkachenko (2) ((1) Lomonosov Moscow State University, Department of Chemistry, Leninskie gory, 1 bld. 3, Moscow, Russia, (2) Science Data Software, LLC, 14909 Forest Landing Cir, Rockville, USA)
Abstract summary: The paper shows a significant improvement in the performance of models for target properties with a lack of data. The effects of the dataset composition on model quality and the applicability domain of the resulting models are also considered.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The use of machine learning in chemistry has become a common practice. At the same time, despite the success of modern machine learning methods, the lack of data limits their use. Using a transfer learning methodology can help solve this problem. This methodology assumes that a model built on a sufficient amount of data captures general features of the chemical compound structure on which it was trained and that the further reuse of these features on a dataset with a lack of data will greatly improve the quality of the new model. In this paper, we develop this approach for small organic molecules, implementing transfer learning with graph convolutional neural networks. The paper shows a significant improvement in the performance of models for target properties with a lack of data. The effects of the dataset composition on model quality and the applicability domain of the resulting models are also considered.

Related papers

Machine learning in wastewater treatment: insights from modelling a pilot denitrification reactor [0.0]
We use data from a pilot reactor at the Veas treatment facility in Norway to explore how machine learning can be used to optimize biological nitrate. Rather than focusing solely on predictive accuracy, our approach prioritizes understanding the foundational requirements for effective data-driven modelling.
arXiv Detail & Related papers (2024-12-18T16:49:23Z)
Physical Consistency Bridges Heterogeneous Data in Molecular Multi-Task Learning [79.75718786477638]
We exploit the specialty of molecular tasks that there are physical laws connecting them, and design consistency training approaches. We demonstrate that the more accurate energy data can improve the accuracy of structure prediction. We also find that consistency training can directly leverage force and off-equilibrium structure data to improve structure prediction.
arXiv Detail & Related papers (2024-10-14T03:11:33Z)
Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [68.32093648671496]
We introduce GODE, which accounts for the dual-level structure inherent in molecules. Molecules possess an intrinsic graph structure and simultaneously function as nodes within a broader molecular knowledge graph. By pre-training two GNNs on different graph structures, GODE effectively fuses molecular structures with their corresponding knowledge graph substructures.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction. Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations. On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
Calibration and generalizability of probabilistic models on low-data chemical datasets with DIONYSUS [0.0]
We perform an extensive study of the calibration and generalizability of probabilistic machine learning models on small chemical datasets. We analyse the quality of their predictions and uncertainties in a variety of tasks (binary, regression) and datasets. We offer practical insights into model and feature choice for modelling small chemical datasets, a common scenario in new chemical experiments.
arXiv Detail & Related papers (2022-12-03T08:19:06Z)
An Adversarial Active Sampling-based Data Augmentation Framework for Manufacturable Chip Design [55.62660894625669]
Lithography modeling is a crucial problem in chip design to ensure a chip design mask is manufacturable. Recent developments in machine learning have provided alternative solutions in replacing the time-consuming lithography simulations with deep neural networks. We propose a litho-aware data augmentation framework to resolve the dilemma of limited data and improve the machine learning model performance.
arXiv Detail & Related papers (2022-10-27T20:53:39Z)
Advancing Reacting Flow Simulations with Data-Driven Models [50.9598607067535]
Key to effective use of machine learning tools in multi-physics problems is to couple them to physical and computer models. The present chapter reviews some of the open opportunities for the application of data-driven reduced-order modeling of combustion systems.
arXiv Detail & Related papers (2022-09-05T16:48:34Z)
A Physics-Guided Neural Operator Learning Approach to Model Biological Tissues from Digital Image Correlation Measurements [3.65211252467094]
We present a data-driven correlation to biological tissue modeling, which aims to predict the displacement field based on digital image correlation (DIC) measurements under unseen loading scenarios. A material database is constructed from the DIC displacement tracking measurements of multiple biaxial stretching protocols on a porcine tricuspid valve leaflet. The material response is modeled as a solution operator from the loading to the resultant displacement field, with the material properties learned implicitly from the data and naturally embedded in the network parameters.
arXiv Detail & Related papers (2022-04-01T04:56:41Z)
Model-agnostic multi-objective approach for the evolutionary discovery of mathematical models [55.41644538483948]
In modern data science, it is more interesting to understand the properties of the model, which parts could be replaced to obtain better results. We use multi-objective evolutionary optimization for composite data-driven model learning to obtain the algorithm's desired properties.
arXiv Detail & Related papers (2021-07-07T11:17:09Z)
Exploring the potential of transfer learning for metamodels of heterogeneous material deformation [0.0]
We show that transfer learning can be used to leverage both low-fidelity simulation data and simulation data. We extend Mechanical MNIST, our open source benchmark dataset of heterogeneous material undergoing large deformation. We show that transferring the knowledge stored in metamodels trained on these low-fidelity simulation results can vastly improve the performance of metamodels used to predict the results of high-fidelity simulations.
arXiv Detail & Related papers (2020-10-28T12:43:46Z)
Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients. We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks. Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
BERT Learns (and Teaches) Chemistry [5.653789128055942]
We propose the use of attention to study functional groups and other property-impacting molecular substructures from a data-driven perspective. We then apply the representations of functional groups and atoms learned by the model to tackle problems of toxicity, solubility, drug-likeness, and accessibility.
arXiv Detail & Related papers (2020-07-11T00:23:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.