Understanding the Structure of QM7b and QM9 Quantum Mechanical Datasets Using Unsupervised Learning
- URL: http://arxiv.org/abs/2309.15130v1
- Date: Mon, 25 Sep 2023 23:06:32 GMT
- Title: Understanding the Structure of QM7b and QM9 Quantum Mechanical Datasets Using Unsupervised Learning
- Authors: Julio J. Valdés and Alain B. Tchagang
- Abstract summary: Intrinsic dimension analysis, clustering, and outlier detection methods were used in the study.
The QM7b data is composed of well-defined clusters related to atomic composition.
The QM9 data consists of an outer region predominantly composed of outliers, and an inner core region that concentrates clustered, inlier objects.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper explores the internal structure of two quantum mechanics datasets
(QM7b, QM9), composed of several thousand organic molecules and described
in terms of electronic properties. Understanding the structure and
characteristics of this kind of data is important when predicting the atomic
composition from the properties in inverse molecular design. Intrinsic
dimension analysis, clustering, and outlier detection methods were used in the
study. They revealed that for both datasets the intrinsic dimensionality is
several times smaller than the descriptive dimensions. The QM7b data is
composed of well-defined clusters related to atomic composition. The QM9 data
consists of an outer region predominantly composed of outliers, and an inner
core region that concentrates clustered, inlier objects. A significant
relationship exists between the number of atoms in the molecule and its
outlier/inner nature. Despite the structural differences, the predictability of
variables of interest for inverse molecular design is high. This is exemplified
with models estimating the number of atoms of the molecule both from the
original properties and from lower-dimensional embedding spaces.
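The abstract does not name the intrinsic dimension estimators that were used. Purely as an illustration of what "intrinsic dimensionality several times smaller than the descriptive dimensions" means, the sketch below uses the TwoNN estimator (Facco et al., 2017) in its maximum-likelihood form. This is a swapped-in technique that may differ from the authors' choice. The names `two_nn_intrinsic_dimension` and `embedded_plane` are hypothetical, and the toy points stand in for molecular property vectors, which are not reproduced here.

```python
import math
import random


def two_nn_intrinsic_dimension(points):
    """TwoNN intrinsic-dimension estimate (maximum-likelihood form).

    For each point, mu = r2 / r1 is the ratio of the distances to its second
    and first nearest neighbours. If the data are locally uniform on a
    d-dimensional manifold, log(mu) is exponential with rate d, so the
    maximum-likelihood estimate of d is n / sum(log(mu)).
    """
    log_ratios = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf  # smallest and second-smallest distances from p
        for j, q in enumerate(points):
            if i == j:
                continue
            dist = math.dist(p, q)
            if dist < r1:
                r1, r2 = dist, r1
            elif dist < r2:
                r2 = dist
        if r1 > 0:  # exact duplicates make the ratio undefined, so skip them
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


def embedded_plane(n, ambient_dim, seed=0):
    """Hypothetical toy data (not QM7b/QM9): n points on a 2-D plane in ambient_dim dimensions."""
    rng = random.Random(seed)
    # Two random directions span the plane; every point is a*e1 + b*e2.
    e1 = [rng.gauss(0, 1) for _ in range(ambient_dim)]
    e2 = [rng.gauss(0, 1) for _ in range(ambient_dim)]
    points = []
    for _ in range(n):
        a, b = rng.random(), rng.random()
        points.append([a * u + b * v for u, v in zip(e1, e2)])
    return points


data = embedded_plane(400, ambient_dim=10)
# The descriptive dimension is 10; the estimate should land close to 2.
print(f"{two_nn_intrinsic_dimension(data):.2f}")
```

The brute-force neighbour search keeps the sketch dependency-free but is quadratic in the number of points; for datasets of several thousand molecules, a k-d tree or a library nearest-neighbour search would be the practical choice.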
Related papers
- Molecular topological deep learning for polymer property prediction
We develop molecular topological deep learning (Mol-TDL) for polymer property analysis.
Mol-TDL incorporates both high-order interactions and multiscale properties into topological deep learning architecture.
(2024-10-07T05:44:02Z)
- QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules
We generate a new Quantum Hamiltonian dataset, named QH9, to provide precise Hamiltonian matrices for 999 or 2998 molecular dynamics trajectories.
We show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules.
(2023-06-15T23:39:07Z)
- Atomic and Subgraph-aware Bilateral Aggregation for Molecular Representation Learning
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA).
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
(2023-05-22T00:56:00Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
(2023-02-04T01:32:40Z)
- Molecular Geometry-aware Transformer for accurate 3D Atomic System modeling
We propose a novel Transformer architecture that takes nodes (atoms) and edges (bonds and nonbonding atom pairs) as inputs and models the interactions among them.
Moleformer achieves state-of-the-art results on initial-state-to-relaxed-energy prediction in OC20 and is very competitive on QM9 for predicting quantum chemical properties.
(2023-02-02T03:49:57Z)
- GEM-2: Next Generation Molecular Property Prediction Network with Many-body and Full-range Interaction Modeling
GEM-2 is a molecular property prediction network motivated by the difficulty of solving the Schrödinger equation for molecules.
It considers both the long-range and many-body interactions in molecules.
(2022-08-11T15:12:25Z)
- Equivariant representations for molecular Hamiltonians and N-center atomic-scale properties
We discuss a family of structural descriptors that generalize the very successful atom-centered density correlation features to the N-centers case.
We show in particular how this construction can be applied to efficiently learn the matrix elements of the (effective) single-particle Hamiltonian written in an atom-centered orbital basis.
(2021-09-24T17:19:57Z)
- Flexible dual-branched message passing neural network for quantum mechanical property prediction with molecular conformation
We propose a dual-branched neural network for molecular property prediction based on message-passing framework.
Our model learns heterogeneous molecular features with different scales, which are trained flexibly according to each prediction target.
(2021-06-14T10:00:39Z)
- Knowledge-aware Contrastive Molecular Graph Learning
We propose Contrastive Knowledge-aware GNN (CKGNN) for self-supervised molecular representation learning.
We explicitly encode domain knowledge via knowledge-aware molecular encoder under the contrastive learning framework.
Experiments on 8 public datasets demonstrate the effectiveness of our model with a 6% absolute improvement on average.
(2021-03-24T08:55:08Z)
- The role of feature space in atomistic learning
Physically-inspired descriptors play a key role in the application of machine-learning techniques to atomistic simulations.
We introduce a framework to compare different sets of descriptors, and different ways of transforming them by means of metrics and kernels.
We compare representations built in terms of n-body correlations of the atom density, quantitatively assessing the information loss associated with the use of low-order features.
(2020-09-06T14:12:09Z)
- Graph Neural Network for Hamiltonian-Based Material Property Prediction
We present and compare several different graph convolution networks that are able to predict the band gap for inorganic materials.
The models are developed to incorporate two different features: the information of each orbital itself and the interactions between orbitals.
The results show that our model achieves promising prediction accuracy under cross-validation.
(2020-05-27T13:32:10Z)