Related papers: Molecular Identification via Molecular Fingerprint extraction from Atomic Force Microscopy images

URL: http://arxiv.org/abs/2405.04321v1
Date: Tue, 7 May 2024 13:47:35 GMT
Title: Molecular Identification via Molecular Fingerprint extraction from Atomic Force Microscopy images
Authors: Manuel González Lastre, Pablo Pou, Miguel Wiche, Daniel Ebeling, Andre Schirmeisen, Rubén Pérez
Abstract summary: Deep learning models can retrieve chemical and structural information encoded in a 3D stack of constant-height HR-AFM images. In this work, we overcome their limitations by using a well-established description of the molecular structure in terms of topological fingerprints. We show that it is possible to complement the fingerprint-based virtual screening with global information provided by another DL model.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Non-Contact Atomic Force Microscopy with CO-functionalized metal tips (referred to as HR-AFM) provides access to the internal structure of individual molecules adsorbed on a surface with unprecedented resolution. Previous works have shown that deep learning (DL) models can retrieve the chemical and structural information encoded in a 3D stack of constant-height HR-AFM images, leading to molecular identification. In this work, we overcome their limitations by using a well-established description of the molecular structure in terms of topological fingerprints, the 1024-bit Extended Connectivity Chemical Fingerprints of radius 2 (ECFP4), that were developed for substructure and similarity searching. ECFPs provide local structural information of the molecule, each bit correlating with a particular substructure within the molecule. Our DL model is able to extract this optimized structural descriptor from the 3D HR-AFM stacks and use it, through virtual screening, to identify molecules from their predicted ECFP4 with a retrieval accuracy on theoretical images of 95.4%. Furthermore, this approach, unlike previous DL models, assigns a confidence score, the Tanimoto similarity, to each of the candidate molecules, thus providing information on the reliability of the identification. By construction, the number of times a certain substructure is present in the molecule is lost during the hashing process, which is necessary to make the fingerprints useful for machine learning applications. We show that it is possible to complement the fingerprint-based virtual screening with global information provided by another DL model that predicts the chemical formula from the same HR-AFM stacks, boosting the identification accuracy up to 97.6%. Finally, we perform a limited test with experimental images, obtaining promising results towards the application of this pipeline under real conditions.
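
The screening step described in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: the fingerprints are toy sets of on-bit indices rather than real ECFP4 vectors, the `fold` helper is a simplified stand-in for fingerprint hashing, the library entries and formula strings are made up, and using the predicted formula as a hard filter is only one assumed way of combining the two predictions. The sketch shows how Tanimoto similarity ranks candidates, how folding discards substructure counts, and how a formula can separate candidates whose fingerprints are identical.

```python
N_BITS = 1024  # fingerprint length quoted in the abstract


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def fold(substructure_ids, n_bits=N_BITS):
    """Fold substructure identifiers into a fixed-length set of on bits.

    Simplified stand-in for fingerprint hashing: a repeated identifier
    sets the same bit again, so the number of occurrences is lost.
    """
    return frozenset(i % n_bits for i in substructure_ids)


def screen(predicted, library, predicted_formula=None):
    """Rank library entries by Tanimoto similarity to a predicted fingerprint.

    Assumption: when a predicted formula is supplied, it acts as a hard
    filter on the candidates. The abstract does not specify the mechanism.
    """
    hits = []
    for name, (fingerprint, formula) in library.items():
        if predicted_formula is not None and formula != predicted_formula:
            continue
        hits.append((tanimoto(predicted, fingerprint), name))
    # Highest similarity first; ties broken alphabetically by name.
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))


# Hypothetical library: names, substructure ids, and formulas are invented.
library = {
    "toy_A": (fold([11, 42, 97]), "C8H10"),
    "toy_B": (fold([11, 42, 42, 97]), "C10H14"),  # substructure 42 occurs twice
    "toy_C": (fold([11, 300, 512]), "C8H10"),
}

# A predicted fingerprint with one spurious bit (700).
predicted = fold([11, 42, 97, 700])

ranking = screen(predicted, library)
filtered = screen(predicted, library, predicted_formula="C10H14")
```

In this toy library, `toy_A` and `toy_B` fold to the same bit set and therefore tie on Tanimoto similarity; only the formula filter tells them apart. The similarity attached to each hit plays the role of the confidence score mentioned in the abstract.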

Related papers

Aligned Manifold Property and Topology Point Clouds for Learning Molecular Properties [55.2480439325792]
This work introduces AMPTCR, a molecular surface representation that combines local quantum-derived scalar fields and custom topological descriptors within an aligned point cloud format. For molecular weight, results confirm that AMPTCR encodes physically meaningful data, with a validation R² of 0.87. In the bacterial inhibition task, AMPTCR enables both classification and direct regression of E. coli inhibition values.
arXiv Detail & Related papers (2025-07-22T04:35:50Z)
Knowledge-aware contrastive heterogeneous molecular graph learning [77.94721384862699]
We propose a paradigm shift by encoding molecular graphs into Heterogeneous Molecular Graph Learning (KCHML). KCHML conceptualizes molecules through three distinct graph views (molecular, elemental, and pharmacological), enhanced by heterogeneous molecular graphs and a dual message-passing mechanism. This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction.
arXiv Detail & Related papers (2025-02-17T11:53:58Z)
DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra [60.39311767532607]
DiffMS is a formula-restricted encoder-decoder generative network. We develop a robust decoder that bridges latent embeddings and molecular structures. Experiments show DiffMS outperforms existing models on de novo molecule generation.
arXiv Detail & Related papers (2025-02-13T18:29:48Z)
MoleculeCLA: Rethinking Molecular Benchmark via Computational Ligand-Target Binding Analysis [18.940529282539842]
We construct a large-scale and precise molecular representation dataset of approximately 140,000 small molecules. Our dataset offers significant physicochemical interpretability to guide model development and design. We believe this dataset will serve as a more accurate and reliable benchmark for molecular representation learning.
arXiv Detail & Related papers (2024-06-13T02:50:23Z)
Medication Recommendation via Dual Molecular Modalities and Multi-Step Enhancement [6.927266015351967]
Existing works based on molecular knowledge neglect the 3D geometric structure of molecules and fail to learn the high-dimensional information of medications. We propose a bimodal molecular recommendation framework named BiMoRec, which introduces 3D molecular structures to obtain atomic 3D coordinates and edge indices.
arXiv Detail & Related papers (2024-05-30T07:13:08Z)
UniIF: Unified Molecule Inverse Folding [67.60267592514381]
We propose a unified model UniIF for inverse folding of all molecules. Our proposed method surpasses state-of-the-art methods on all tasks.
arXiv Detail & Related papers (2024-05-29T10:26:16Z)
Data-Efficient Molecular Generation with Hierarchical Textual Inversion [48.816943690420224]
We introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method. HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution. Compared to the conventional textual inversion method in the image domain using a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution.
arXiv Detail & Related papers (2024-05-05T08:35:23Z)
SE(3)-Invariant Multiparameter Persistent Homology for Chiral-Sensitive Molecular Property Prediction [1.534667887016089]
We present a novel method for generating molecular fingerprints using multiparameter persistent homology (MPPH). This technique holds considerable significance for drug discovery and materials science, where precise molecular property prediction is vital. We demonstrate its superior performance over existing state-of-the-art methods in predicting molecular properties through extensive evaluations on the MoleculeNet benchmark.
arXiv Detail & Related papers (2023-12-12T09:33:54Z)
MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures [2.5563339057415218]
MolIG is a novel MultiModaL molecular pre-training framework for predicting molecular properties based on Image and Graph structures. It amalgamates the strengths of both molecular representation forms. It exhibits enhanced performance in downstream tasks pertaining to molecular property prediction within benchmark groups.
arXiv Detail & Related papers (2023-11-28T10:28:35Z)
MolFM: A Multimodal Molecular Foundation Model [9.934141536012596]
MolFM is a multimodal molecular foundation model designed to facilitate joint representation learning from molecular structures, biomedical texts, and knowledge graphs. We provide theoretical analysis that our cross-modal pre-training captures local and global molecular knowledge by minimizing the distance in the feature space between different modalities of the same molecule. On cross-modal retrieval, MolFM outperforms existing models with 12.13% and 5.04% absolute gains under the zero-shot and fine-tuning settings, respectively.
arXiv Detail & Related papers (2023-06-06T12:45:15Z)
Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules. By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures. When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, registering an average ROC-AUC uplift of 13.8% for classification tasks and an average RMSE/MAE enhancement of 35.1% for regression tasks.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
MUDiff: Unified Diffusion for Complete Molecule Generation [104.7021929437504]
We present a new model for generating a comprehensive representation of molecules, including atom features, 2D discrete molecule structures, and 3D continuous molecule coordinates. We propose a novel graph transformer architecture to denoise the diffusion process. Our model is a promising approach for designing stable and diverse molecules and can be applied to a wide range of tasks in molecular modeling.
arXiv Detail & Related papers (2023-04-28T04:25:57Z)
An Equivariant Generative Framework for Molecular Graph-Structure Co-Design [54.92529253182004]
We present MolCode, a machine learning-based generative framework for Molecular graph-structure Co-design. In MolCode, 3D geometric information empowers the molecular 2D graph generation, which in turn helps guide the prediction of molecular 3D structure. Our investigation reveals that the 2D topology and 3D geometry contain intrinsically complementary information in molecule design.
arXiv Detail & Related papers (2023-04-12T13:34:22Z)
Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction. Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations. On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.