Molecular Design Based on Integer Programming and Splitting Data Sets by
Hyperplanes
- URL: http://arxiv.org/abs/2305.00801v1
- Date: Thu, 27 Apr 2023 04:18:41 GMT
- Title: Molecular Design Based on Integer Programming and Splitting Data Sets by
Hyperplanes
- Authors: Jianshen Zhu, Naveed Ahmed Azam, Kazuya Haraguchi, Liang Zhao, Hiroshi
Nagamochi and Tatsuya Akutsu
- Abstract summary: We propose a framework for designing the molecular structure of chemical compounds with a desired chemical property.
The proposed framework infers a desired chemical graph by solving a mixed integer linear program (MILP) and a prediction function constructed by a machine learning method.
The results of our computational experiments suggest that the proposed method improved the learning performance for several chemical properties for which it has been difficult to construct a good prediction function.
- Score: 6.504869613326338
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A novel framework for designing the molecular structure of chemical compounds
with a desired chemical property has recently been proposed. The framework
infers a desired chemical graph by solving a mixed integer linear program
(MILP) that simulates the computation process of a feature function defined by
a two-layered model on chemical graphs and a prediction function constructed by
a machine learning method. To improve the learning performance of prediction
functions in the framework, we design a method that splits a given data set
$\mathcal{C}$ into two subsets $\mathcal{C}^{(i)},i=1,2$ by a hyperplane in a
chemical space so that most compounds in the first (resp., second) subset have
observed values lower (resp., higher) than a threshold $\theta$. We construct a
prediction function $\psi$ for the data set $\mathcal{C}$ by combining
prediction functions $\psi_i,i=1,2$, each of which is constructed on
$\mathcal{C}^{(i)}$ independently. The results of our computational experiments
suggest that the proposed method improved the learning performance for several
chemical properties for which it has been difficult to construct a good
prediction function.
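To make the splitting idea concrete, the following is a minimal sketch in Python with NumPy. It learns a separating hyperplane by least-squares regression onto $\pm 1$ labels derived from the threshold $\theta$, fits one linear regressor per subset, and routes new points through the hyperplane to pick which regressor to use. The function name, the least-squares choices, and the routing rule are illustrative assumptions for exposition, not the paper's actual construction (which builds $\psi$ within the two-layered chemical-graph model).

```python
import numpy as np

def fit_split_predictor(X, y, theta):
    """Sketch of a split-by-hyperplane predictor (illustrative, not the
    paper's method). Splits the data by whether y < theta, learns a
    hyperplane approximating that split, and fits a separate linear
    regressor psi_1, psi_2 on each side. Assumes both sides are non-empty."""
    labels = np.where(y < theta, -1.0, 1.0)
    Xb = np.hstack([X, np.ones((len(X), 1))])       # append a bias column
    # Hyperplane normal via least squares on the +/-1 labels.
    w = np.linalg.lstsq(Xb, labels, rcond=None)[0]
    side = Xb @ w >= 0                              # which subset each point falls in
    coefs = []
    for mask in (~side, side):                      # C^(1) then C^(2)
        c = np.linalg.lstsq(Xb[mask], y[mask], rcond=None)[0]
        coefs.append(c)

    def psi(Xnew):
        """Combined predictor: route each point by the hyperplane,
        then apply that subset's regressor."""
        Xn = np.hstack([Xnew, np.ones((len(Xnew), 1))])
        s = (Xn @ w >= 0).astype(int)
        preds = np.stack([Xn @ coefs[0], Xn @ coefs[1]], axis=1)
        return preds[np.arange(len(Xn)), s]

    return psi
```

On data where the threshold split aligns with a hyperplane in feature space, each regressor only has to fit its own regime, which is the intuition behind the reported learning-performance gains.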
Related papers
- $\nabla^2$DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials [35.949502493236146]
This work presents a new dataset and benchmark called $\nabla^2$DFT, based on nablaDFT.
It contains twice as many molecular structures, three times more conformations, new data types and tasks, and state-of-the-art models.
$\nabla^2$DFT is the first dataset that contains relaxation trajectories for a substantial number of drug-like molecules.
arXiv Detail & Related papers (2024-06-20T14:14:59Z)
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z)
- Molecular Design Based on Integer Programming and Quadratic Descriptors in a Two-layered Model [5.845754795753478]
The framework infers a desired chemical graph by solving a mixed integer linear program (MILP).
A set of graph-theoretical descriptors in the feature function plays a key role in deriving a compact formulation of such an MILP.
The results of our computational experiments suggest that the proposed method can infer a chemical structure with up to 50 non-hydrogen atoms.
arXiv Detail & Related papers (2022-09-13T08:27:25Z)
- Quantum Resources Required to Block-Encode a Matrix of Classical Data [56.508135743727934]
We provide circuit-level implementations and resource estimates for several methods of block-encoding a dense $N\times N$ matrix of classical data to precision $\epsilon$.
We examine resource tradeoffs between the different approaches and explore implementations of two separate models of quantum random access memory (QRAM).
Our results go beyond simple query complexity and provide a clear picture into the resource costs when large amounts of classical data are assumed to be accessible to quantum algorithms.
arXiv Detail & Related papers (2022-06-07T18:00:01Z)
- An Inverse QSAR Method Based on Linear Regression and Integer Programming [6.519339570726759]
We propose a framework for designing the molecular structure of chemical compounds using both artificial neural networks (ANNs) and mixed integer linear programming (MILP).
In this paper, we use linear regression to construct a prediction function $\eta$ instead of ANNs.
The results of computational experiments suggest our method can infer chemical graphs with up to around 50 non-hydrogen atoms.
arXiv Detail & Related papers (2021-07-06T04:37:55Z)
- Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning [59.71676469100807]
This work sharpens the sample complexity of synchronous Q-learning to an order of $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^4\varepsilon^2}$ for any $0<\varepsilon<1$.
Our finding unveils the effectiveness of vanilla Q-learning, which matches that of speedy Q-learning without requiring extra computation and storage.
arXiv Detail & Related papers (2021-02-12T14:22:05Z)
- Advanced Graph and Sequence Neural Networks for Molecular Property Prediction and Drug Discovery [53.00288162642151]
We develop MoleculeKit, a suite of comprehensive machine learning tools spanning different computational models and molecular representations.
Built on these representations, MoleculeKit includes both deep learning and traditional machine learning methods for graph and sequence data.
Results on both online and offline antibiotics discovery and molecular property prediction tasks show that MoleculeKit achieves consistent improvements over prior methods.
arXiv Detail & Related papers (2020-12-02T02:09:31Z)
- Learning to extrapolate using continued fractions: Predicting the critical temperature of superconductor materials [5.905364646955811]
In the field of Artificial Intelligence (AI) and Machine Learning (ML), the approximation of unknown target functions $y=f(\mathbf{x})$ is a common objective.
We refer to $S$ as the training set and aim to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $\mathbf{x}$.
arXiv Detail & Related papers (2020-11-27T04:57:40Z)
- Supervised deep learning prediction of the formation enthalpy of the full set of configurations in complex phases: the $\sigma$-phase as an example [1.8369974607582582]
We show how machine learning can be used to predict several properties in solid-state chemistry.
In particular, it can be used to predict the heat of formation of a given complex crystallographic phase.
arXiv Detail & Related papers (2020-11-21T22:07:15Z)
- On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the function.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)
- Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2}+\frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.