Beyond Chemical Language: A Multimodal Approach to Enhance Molecular
Property Prediction
- URL: http://arxiv.org/abs/2306.14919v1
- Date: Thu, 22 Jun 2023 13:28:59 GMT
- Authors: Eduardo Soares, Emilio Vital Brazil, Karen Fiorela Aquino Gutierrez,
Renato Cerqueira, Dan Sanders, Kristin Schmidt, Dmitry Zubarev
- Abstract summary: We present a novel multimodal language model approach for predicting molecular properties by combining chemical language representation with physicochemical features.
Our approach, MULTIMODAL-MOLFORMER, utilizes a causal multistage feature selection method that identifies physicochemical features based on their direct causal effect on a specific target property.
Our results demonstrate superior performance compared to existing state-of-the-art algorithms, including the chemical language-based MOLFORMER and graph neural networks.
- Score: 2.1202329976106924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel multimodal language model approach for predicting
molecular properties by combining chemical language representation with
physicochemical features. Our approach, MULTIMODAL-MOLFORMER, utilizes a causal
multistage feature selection method that identifies physicochemical features
based on their direct causal effect on a specific target property. These causal
features are then integrated with the vector space generated by molecular
embeddings from MOLFORMER. In particular, we employ Mordred descriptors as
physicochemical features and identify the Markov blanket of the target
property, which theoretically contains the most relevant features for accurate
prediction. Our results demonstrate a superior performance of our proposed
approach compared to existing state-of-the-art algorithms, including the
chemical language-based MOLFORMER and graph neural networks, in predicting
complex tasks such as biodegradability and PFAS toxicity estimation. Moreover,
we demonstrate the effectiveness of our feature selection method in reducing
the dimensionality of the Mordred feature space while maintaining or improving
the model's performance. Our approach opens up promising avenues for future
research in molecular property prediction by harnessing the synergistic
potential of both chemical language and physicochemical features, leading to
enhanced performance and advancements in the field.
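The pipeline described in the abstract (select the descriptors in the Markov blanket of the target, then join them to the molecular embeddings) can be illustrated with a minimal sketch. This is not the paper's causal multistage selection method. It swaps in a much simpler stand-in that assumes jointly Gaussian variables, where the Markov blanket of the target consists of the variables whose partial correlation with it (given all others) is nonzero. The data are synthetic: `descriptors` stands in for Mordred features and `embeddings` for MOLFORMER vectors, and the function names and the `threshold` value are hypothetical choices for illustration.

```python
import numpy as np


def markov_blanket_gaussian(X, y, threshold=0.1):
    """Indices of columns of X with |partial correlation| to y above threshold.

    Assumption: variables are jointly Gaussian, so a zero entry in the
    precision (inverse covariance) matrix means conditional independence.
    This is a simplified stand-in, not the paper's multistage method.
    """
    data = np.column_stack([X, y])                    # target is the last column
    precision = np.linalg.pinv(np.cov(data, rowvar=False))
    scale = np.sqrt(np.diag(precision))
    partial_corr = -precision / np.outer(scale, scale)
    target_row = partial_corr[-1, :-1]                # target vs. each feature
    return np.flatnonzero(np.abs(target_row) > threshold)


def fuse(embeddings, descriptors, selected):
    """Concatenate embeddings with the selected descriptor columns."""
    return np.hstack([embeddings, descriptors[:, selected]])


# Synthetic stand-ins (hypothetical data, not real molecules).
rng = np.random.default_rng(0)
n_samples = 5000
descriptors = rng.normal(size=(n_samples, 6))         # pretend Mordred features
# Only descriptors 0 and 1 directly influence the target.
target = (2.0 * descriptors[:, 0]
          - 1.5 * descriptors[:, 1]
          + 0.5 * rng.normal(size=n_samples))
embeddings = rng.normal(size=(n_samples, 8))          # pretend MOLFORMER vectors

selected = markov_blanket_gaussian(descriptors, target)
fused = fuse(embeddings, descriptors, selected)
print("selected descriptor columns:", selected)
print("fused feature matrix shape:", fused.shape)
```

The fused matrix would then be passed to any downstream regressor or classifier. Real Mordred descriptors are often non-Gaussian and highly collinear, so a constraint-based Markov blanket search with appropriate conditional independence tests would be a closer match to what the abstract describes.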
Related papers
- Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based Molecular Language Model, which randomly masks SMILES subsequences corresponding to specific molecular atoms.
This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
- FARM: Functional Group-Aware Representations for Small Molecules [55.281754551202326]
We introduce Functional Group-Aware Representations for Small Molecules (FARM).
FARM is a foundation model designed to bridge the gap between SMILES, natural language, and molecular graphs.
We rigorously evaluate FARM on the MoleculeNet dataset, where it achieves state-of-the-art performance on 10 out of 12 tasks.
arXiv Detail & Related papers (2024-10-02T23:04:58Z)
- Cross-Modal Learning for Chemistry Property Prediction: Large Language Models Meet Graph Machine Learning [0.0]
We introduce a Multi-Modal Fusion (MMF) framework that harnesses the analytical prowess of Graph Neural Networks (GNNs) and the linguistic generative and predictive abilities of Large Language Models (LLMs).
Our framework combines the effectiveness of GNNs in modeling graph-structured data with the zero-shot and few-shot learning capabilities of LLMs, enabling improved predictions while reducing the risk of overfitting.
arXiv Detail & Related papers (2024-08-27T11:10:39Z)
- A Gaussian Process Model for Ordinal Data with Applications to Chemoinformatics [0.0]
We present conditional Gaussian process models to predict ordinal outcomes from chemical experiments.
A novel aspect of our model is that the kernel contains a scaling parameter that controls the strength of the correlation between elements of the chemical space.
Using molecular fingerprints, a numerical representation of a compound's location within the chemical space, we show that accounting for correlation amongst chemical compounds improves predictive performance.
arXiv Detail & Related papers (2024-05-16T11:18:32Z)
- Active Causal Learning for Decoding Chemical Complexities with Targeted Interventions [0.0]
We introduce an active learning approach that discerns underlying cause-effect relationships through strategic sampling.
This method identifies the smallest subset of the dataset capable of encoding the most information representative of a much larger chemical space.
The identified causal relations are then leveraged to conduct systematic interventions, optimizing the design task within a chemical space that the models have not encountered previously.
arXiv Detail & Related papers (2024-04-05T17:15:48Z)
- Improving Molecular Properties Prediction Through Latent Space Fusion [9.912768918657354]
We present a multi-view approach that combines latent spaces derived from state-of-the-art chemical models.
Our approach relies on two pivotal elements: the embeddings derived from MHG-GNN, which represent molecular structures as graphs, and MoLFormer embeddings rooted in chemical language.
We demonstrate the superior performance of our proposed multi-view approach compared to existing state-of-the-art methods.
arXiv Detail & Related papers (2023-10-20T20:29:32Z)
- Molecule Design by Latent Space Energy-Based Modeling and Gradual Distribution Shifting [53.44684898432997]
Generation of molecules with desired chemical and biological properties is critical for drug discovery.
We propose a probabilistic generative model to capture the joint distribution of molecules and their properties.
Our method achieves very strong performance on various molecule design tasks.
arXiv Detail & Related papers (2023-06-09T03:04:21Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- Flexible dual-branched message passing neural network for quantum mechanical property prediction with molecular conformation [16.08677447593939]
We propose a dual-branched neural network for molecular property prediction based on a message-passing framework.
Our model learns heterogeneous molecular features with different scales, which are trained flexibly according to each prediction target.
arXiv Detail & Related papers (2021-06-14T10:00:39Z)
- Reinforced Molecular Optimization with Neighborhood-Controlled Grammars [63.84003497770347]
We propose MNCE-RL, a graph convolutional policy network for molecular optimization.
We extend the original neighborhood-controlled embedding grammars to make them applicable to molecular graph generation.
We show that our approach achieves state-of-the-art performance in a diverse range of molecular optimization tasks.
arXiv Detail & Related papers (2020-11-14T05:42:15Z)
- Optimizing Molecules using Efficient Queries from Property Evaluations [66.66290256377376]
We propose QMO, a generic query-based molecule optimization framework.
QMO improves the desired properties of an input molecule based on efficient queries.
We show that QMO outperforms existing methods in the benchmark tasks of optimizing small organic molecules.
arXiv Detail & Related papers (2020-11-03T18:51:18Z)