Related papers: Beyond Chemical Language: A Multimodal Approach to Enhance Molecular Property Prediction

Beyond Chemical Language: A Multimodal Approach to Enhance Molecular Property Prediction

URL: http://arxiv.org/abs/2306.14919v1
Date: Thu, 22 Jun 2023 13:28:59 GMT
Title: Beyond Chemical Language: A Multimodal Approach to Enhance Molecular Property Prediction
Authors: Eduardo Soares, Emilio Vital Brazil, Karen Fiorela Aquino Gutierrez, Renato Cerqueira, Dan Sanders, Kristin Schmidt, Dmitry Zubarev
Abstract summary: We present a novel multimodal language model approach for predicting molecular properties by combining chemical language representation with physicochemical features. Our approach, MULTIMODAL-MOLFORMER, utilizes a causal multistage feature selection method that identifies physicochemical features based on their direct causal effect on a specific target property. Our results demonstrate a superior performance compared to existing state-of-the-art algorithms, including the chemical language-based MOLFORMER and graph neural networks.
Score: 2.1202329976106924
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a novel multimodal language model approach for predicting molecular properties by combining chemical language representation with physicochemical features. Our approach, MULTIMODAL-MOLFORMER, utilizes a causal multistage feature selection method that identifies physicochemical features based on their direct causal effect on a specific target property. These causal features are then integrated with the vector space generated by molecular embeddings from MOLFORMER. In particular, we employ Mordred descriptors as physicochemical features and identify the Markov blanket of the target property, which theoretically contains the most relevant features for accurate prediction. Our results demonstrate a superior performance of our proposed approach compared to existing state-of-the-art algorithms, including the chemical language-based MOLFORMER and graph neural networks, in predicting complex tasks such as biodegradability and PFAS toxicity estimation. Moreover, we demonstrate the effectiveness of our feature selection method in reducing the dimensionality of the Mordred feature space while maintaining or improving the model's performance. Our approach opens up promising avenues for future research in molecular property prediction by harnessing the synergistic potential of both chemical language and physicochemical features, leading to enhanced performance and advancements in the field.

Related papers

Learning Hierarchical Interaction for Accurate Molecular Property Prediction [8.488251667425887]
We propose a Hierarchical Interaction Message Passing Mechanism, which serves as the foundation of our novel model, HimNet. Our method enables interaction-aware representation learning across atomic, motif, and molecular levels via hierarchical attention-guided message passing. Our method exhibits promising hierarchical interpretability, aligning well with chemical intuition on representative molecules.
arXiv Detail & Related papers (2025-04-28T15:19:28Z)
Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based underlineem Molecular underlineem Language underlineem Model, which randomly masking SMILES subsequences corresponding to specific molecular atoms. This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
FARM: Functional Group-Aware Representations for Small Molecules [55.281754551202326]
We introduce Functional Group-Aware Representations for Small Molecules (FARM) FARM is a foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs. We rigorously evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 10 out of 12 tasks.
arXiv Detail & Related papers (2024-10-02T23:04:58Z)
Cross-Modal Learning for Chemistry Property Prediction: Large Language Models Meet Graph Machine Learning [0.0]
We introduce a Multi-Modal Fusion (MMF) framework that harnesses the analytical prowess of Graph Neural Networks (GNNs) and the linguistic generative and predictive abilities of Large Language Models (LLMs) Our framework combines the effectiveness of GNNs in modeling graph-structured data with the zero-shot and few-shot learning capabilities of LLMs, enabling improved predictions while reducing the risk of overfitting.
arXiv Detail & Related papers (2024-08-27T11:10:39Z)
Aligning Target-Aware Molecule Diffusion Models with Exact Energy Optimization [147.7899503829411]
AliDiff is a novel framework to align pretrained target diffusion models with preferred functional properties. It can generate molecules with state-of-the-art binding energies with up to -7.07 Avg. Vina Score.
arXiv Detail & Related papers (2024-07-01T06:10:29Z)
A Gaussian Process Model for Ordinal Data with Applications to Chemoinformatics [0.0]
We present conditional Gaussian process models to predict ordinal outcomes from chemical experiments. A novel aspect of our model is that the kernel contains a scaling parameter, that controls the strength of the correlation between elements of the chemical space. We present a genetic algorithm for the facilitation of chemical discovery and identification of important features to the compound's efficacy.
arXiv Detail & Related papers (2024-05-16T11:18:32Z)
Active Causal Learning for Decoding Chemical Complexities with Targeted Interventions [0.0]
We introduce an active learning approach that discerns underlying cause-effect relationships through strategic sampling. This method identifies the smallest subset of the dataset capable of encoding the most information representative of a much larger chemical space. The identified causal relations are then leveraged to conduct systematic interventions, optimizing the design task within a chemical space that the models have not encountered previously.
arXiv Detail & Related papers (2024-04-05T17:15:48Z)
Improving Molecular Properties Prediction Through Latent Space Fusion [9.912768918657354]
We present a multi-view approach that combines latent spaces derived from state-of-the-art chemical models. Our approach relies on two pivotal elements: the embeddings derived from MHG-GNN, which represent molecular structures as graphs, and MoLFormer embeddings rooted in chemical language. We demonstrate the superior performance of our proposed multi-view approach compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2023-10-20T20:29:32Z)
Molecule Design by Latent Space Energy-Based Modeling and Gradual Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery. We propose a probabilistic generative model to capture the joint distribution of molecules and their properties. Our method achieves very strong performances on various molecule design tasks.
arXiv Detail & Related papers (2023-06-09T03:04:21Z)
Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction. Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations. On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
Flexible dual-branched message passing neural network for quantum mechanical property prediction with molecular conformation [16.08677447593939]
We propose a dual-branched neural network for molecular property prediction based on message-passing framework. Our model learns heterogeneous molecular features with different scales, which are trained flexibly according to each prediction target.
arXiv Detail & Related papers (2021-06-14T10:00:39Z)
Reinforced Molecular Optimization with Neighborhood-Controlled Grammars [63.84003497770347]
We propose MNCE-RL, a graph convolutional policy network for molecular optimization. We extend the original neighborhood-controlled embedding grammars to make them applicable to molecular graph generation. We show that our approach achieves state-of-the-art performance in a diverse range of molecular optimization tasks.
arXiv Detail & Related papers (2020-11-14T05:42:15Z)
Optimizing Molecules using Efficient Queries from Property Evaluations [66.66290256377376]
We propose QMO, a generic query-based molecule optimization framework. QMO improves the desired properties of an input molecule based on efficient queries. We show that QMO outperforms existing methods in the benchmark tasks of optimizing small organic molecules.
arXiv Detail & Related papers (2020-11-03T18:51:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.