Knowledge Discovery from Atomic Structures using Feature Importances
- URL: http://arxiv.org/abs/2303.09453v1
- Date: Tue, 28 Feb 2023 11:54:10 GMT
- Title: Knowledge Discovery from Atomic Structures using Feature Importances
- Authors: Joakim Linja and Joonas Hämäläinen and Antti Pihlajamäki and
Paavo Nieminen and Sami Malola and Hannu Häkkinen and Tommi Kärkkäinen
- Abstract summary: Molecular-level understanding of the interactions between the constituents of an atomic structure is essential for designing novel materials.
This need goes beyond the basic knowledge of the number and types of atoms, their chemical composition, and the character of the chemical interactions.
An alternative way to address atomic interactions is to use an interpretable machine learning approach, where a predictive DFT surrogate is constructed and analyzed.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Molecular-level understanding of the interactions between the constituents of
an atomic structure is essential for designing novel materials in various
applications. This need goes beyond the basic knowledge of the number and types
of atoms, their chemical composition, and the character of the chemical
interactions. The bigger picture emerges at the quantum level, which can be
addressed using density functional theory (DFT). DFT, however, is
computationally expensive, and its results do not readily provide
interpretable insight into the atomic interactions, which would be useful
for materials design. An alternative way to address atomic
interactions is to use an interpretable machine learning approach, where a
predictive DFT surrogate is constructed and analyzed. The purpose of this paper
is to propose such a procedure using a modification of the recently published
interpretable distance-based regression method. Our tests with a representative
benchmark set of molecules and a complex hybrid nanoparticle confirm the
viability and usefulness of the proposed approach.
Related papers
- MBD-ML: Many-body dispersion from machine learning for molecules and materials [39.27725073249277]
Van der Waals (vdW) interactions are essential for describing molecules and materials, from drug design to battery applications.
The many-body dispersion (MBD) method stands out as one of the most accurate and transferable approaches to capture vdW interactions.
We present MBD-ML, a pretrained message passing neural network that predicts these atomic properties directly from atomic structures.
arXiv Detail & Related papers (2026-02-25T16:34:53Z)
- AtomDisc: An Atom-level Tokenizer that Boosts Molecular LLMs and Reveals Structure–Property Associations [11.856011146903889]
We introduce AtomDisc, a framework that quantizes atom-level local environments into structure-aware tokens embedded in large language models.
Our experiments show that AtomDisc, in a data-driven way, can distinguish chemically meaningful structural features that reveal structure-property associations.
arXiv Detail & Related papers (2025-11-28T02:42:17Z)
- Foundation Models for Discovery and Exploration in Chemical Space [57.97784111110166]
MIST is a family of molecular foundation models trained on large unlabeled datasets.
We demonstrate the ability of these models to solve real-world problems across chemical space.
arXiv Detail & Related papers (2025-10-20T17:56:01Z)
- Tokenizing Electron Cloud in Protein-Ligand Interaction Learning [51.74909649330779]
ECBind is a method for tokenizing electron cloud signals into quantized embeddings.
It helps uncover binding modes that cannot be fully represented by atom-level models.
To extend its applicability to a wider range of scenarios, we utilize knowledge distillation to develop an electron-cloud-agnostic prediction model.
arXiv Detail & Related papers (2025-05-25T07:36:50Z)
- Path-integral molecular dynamics with actively-trained and universal machine learning force fields [0.0]
Accounting for nuclear quantum effects (NQEs) can significantly alter material properties at finite temperatures.
Machine-learned interatomic potentials offer a solution to this challenge.
An interface was developed to integrate moment tensor potentials (MTPs) from the MLIP-2 software package into PIMD calculations.
Results were compared with experimental data, quasi-harmonic approximation calculations, and predictions from the universal machine learning force field MatterSim.
arXiv Detail & Related papers (2025-05-20T11:55:22Z)
- Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models? [68.72260770171212]
We propose a paradigm of Self-structured Chain of Thought (SCoT), which is composed of minimal semantic atomic steps.
Our method can not only generate cognitive CoT structures for various complex tasks but also mitigates the phenomenon of overthinking.
We conduct extensive experiments to show that the proposed AtomThink significantly improves the performance of baseline MLLMs.
arXiv Detail & Related papers (2025-03-08T15:23:47Z)
- A General Neural Network Potential for Energetic Materials with C, H, N, and O elements [0.9742644628669695]
High-energy materials (HEMs) are constrained by the prohibitive computational expense and prolonged development cycles.
We develop a general neural network potential (NNP) that efficiently predicts the structural, mechanical, and decomposition properties of HEMs.
arXiv Detail & Related papers (2025-03-03T03:24:59Z)
- AtomThink: Multimodal Slow Thinking with Atomic Step Reasoning [68.65389926175506]
We propose a novel paradigm of Self-structured Chain of Thought (SCoT).
Our method can not only generate cognitive CoT structures for various complex tasks but also mitigates the phenomena of overthinking for easier tasks.
We conduct extensive experiments to show that the proposed AtomThink significantly improves the performance of baseline MLLMs.
arXiv Detail & Related papers (2024-11-18T11:54:58Z)
- Multi-task learning for molecular electronic structure approaching coupled-cluster accuracy [9.81014501502049]
We develop a unified machine learning method for electronic structures of organic molecules using the gold-standard CCSD(T) calculations as training data.
Tested on hydrocarbon molecules, our model outperforms DFT with the widely-used hybrid and double hybrid functionals in computational costs and prediction accuracy of various quantum chemical properties.
arXiv Detail & Related papers (2024-05-09T19:51:27Z)
- Atomic and Subgraph-aware Bilateral Aggregation for Molecular Representation Learning [57.670845619155195]
We introduce a new model for molecular representation learning called the Atomic and Subgraph-aware Bilateral Aggregation (ASBA).
ASBA addresses the limitations of previous atom-wise and subgraph-wise models by incorporating both types of information.
Our method offers a more comprehensive way to learn representations for molecular property prediction and has broad potential in drug and material discovery applications.
arXiv Detail & Related papers (2023-05-22T00:56:00Z)
- Molecular Geometry-aware Transformer for accurate 3D Atomic System modeling [51.83761266429285]
We propose a novel Transformer architecture that takes nodes (atoms) and edges (bonds and nonbonding atom pairs) as inputs and models the interactions among them.
Moleformer achieves state-of-the-art on the initial state to relaxed energy prediction of OC20 and is very competitive in QM9 on predicting quantum chemical properties.
arXiv Detail & Related papers (2023-02-02T03:49:57Z)
- Discovery of structure-property relations for molecules via hypothesis-driven active learning over the chemical space [0.0]
We introduce a novel approach for the active learning over the chemical spaces based on hypothesis learning.
We construct the hypotheses on the possible relationships between structures and functionalities of interest based on a small subset of data.
This approach combines the elements from the symbolic regression methods such as SISSO and active learning into a single framework.
arXiv Detail & Related papers (2023-01-06T14:22:43Z)
- Accurate Machine Learned Quantum-Mechanical Force Fields for Biomolecular Simulations [51.68332623405432]
Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes.
Recently, machine learned force fields (MLFFs) emerged as an alternative means to execute MD simulations.
This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations.
arXiv Detail & Related papers (2022-05-17T13:08:28Z)
- Geometric Transformer for End-to-End Molecule Properties Prediction [92.28929858529679]
We introduce a Transformer-based architecture for molecule property prediction, which is able to capture the geometry of the molecule.
We modify the classical positional encoder by an initial encoding of the molecule geometry, as well as a learned gated self-attention mechanism.
arXiv Detail & Related papers (2021-10-26T14:14:40Z)
- BIGDML: Towards Exact Machine Learning Force Fields for Materials [55.944221055171276]
Machine-learning force fields (MLFF) should be accurate, computationally and data efficient, and applicable to molecules, materials, and interfaces thereof.
Here, we introduce the Bravais-Inspired Gradient-Domain Machine Learning approach and demonstrate its ability to construct reliable force fields using a training set with just 10-200 atoms.
arXiv Detail & Related papers (2021-06-08T10:14:57Z)
- Learning the exchange-correlation functional from nature with fully differentiable density functional theory [0.0]
We train a neural network to replace the exchange-correlation functional within a fully-differentiable three-dimensional Kohn-Sham density functional theory framework.
Our trained exchange-correlation network provided improved prediction of atomization and ionization energies across a collection of 110 molecules.
arXiv Detail & Related papers (2021-02-08T14:25:10Z)
- A Universal Framework for Featurization of Atomistic Systems [0.0]
Reactive force fields based on physics or machine learning can be used to bridge the gap in time and length scales.
We introduce the Gaussian multi-pole (GMP) featurization scheme that utilizes physically-relevant multi-pole expansions of the electron density around atoms.
We demonstrate that GMP-based models can achieve chemical accuracy for the QM9 dataset, and their accuracy remains reasonable even when extrapolating to new elements.
arXiv Detail & Related papers (2021-02-04T03:11:00Z)
- Graph Neural Network for Hamiltonian-Based Material Property Prediction [56.94118357003096]
We present and compare several different graph convolution networks that are able to predict the band gap for inorganic materials.
The models are developed to incorporate two different features: the information of each orbital itself and the interaction between each other.
The results show that our model can get a promising prediction accuracy with cross-validation.
arXiv Detail & Related papers (2020-05-27T13:32:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.