Related papers: MolMetaLM: a Physicochemical Knowledge-Guided Molecular Meta Language Model

MolMetaLM: a Physicochemical Knowledge-Guided Molecular Meta Language Model

URL: http://arxiv.org/abs/2411.15500v1
Date: Sat, 23 Nov 2024 09:27:38 GMT
Title: MolMetaLM: a Physicochemical Knowledge-Guided Molecular Meta Language Model
Authors: Yifan Wu, Min Zeng, Yang Li, Yang Zhang, Min Li,
Abstract summary: We propose a novel physicochemical knowledge-guided molecular meta language framework MolMetaLM. We design a molecule-specialized meta language paradigm, formatted as multiple S,P,O> knowledge triples sharing the same S (i.e., molecule) By introducing different molecular knowledge and noises, the meta language paradigm generates tens of thousands of pretraining tasks.
Score: 19.458584012046646
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Most current molecular language models transfer the masked language model or image-text generation model from natural language processing to molecular field. However, molecules are not solely characterized by atom/bond symbols; they encapsulate important physical/chemical properties. Moreover, normal language models bring grammar rules that are irrelevant for understanding molecules. In this study, we propose a novel physicochemical knowledge-guided molecular meta language framework MolMetaLM. We design a molecule-specialized meta language paradigm, formatted as multiple <S,P,O> (subject, predicate, object) knowledge triples sharing the same S (i.e., molecule) to enhance learning the semantic relationships between physicochemical knowledge and molecules. By introducing different molecular knowledge and noises, the meta language paradigm generates tens of thousands of pretraining tasks. By recovering the token/sequence/order-level noises, MolMetaLM exhibits proficiency in large-scale benchmark evaluations involving property prediction, molecule generation, conformation inference, and molecular optimization. Through MolMetaLM, we offer a new insight for designing language models.

Related papers

Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model [55.87790704067848]
Mol-LLaMA is a large molecular language model that grasps the general knowledge centered on molecules. We introduce a module that integrates complementary information from different molecular encoders. Our experimental results demonstrate that Mol-LLaMA is capable of comprehending the general features of molecules.
arXiv Detail & Related papers (2025-02-19T05:49:10Z)
Knowledge-aware contrastive heterogeneous molecular graph learning [77.94721384862699]
We propose a paradigm shift by encoding molecular graphs into Heterogeneous Molecular Graph Learning (KCHML) KCHML conceptualizes molecules through three distinct graph views-molecular, elemental, and pharmacological-enhanced by heterogeneous molecular graphs and a dual message-passing mechanism. This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction.
arXiv Detail & Related papers (2025-02-17T11:53:58Z)
Chemical Language Model Linker: blending text and molecules with modular adapters [2.2667044928324747]
We propose a lightweight adapter-based strategy named Chemical Language Model Linker (ChemLML) ChemLML blends the two single domain models and obtains conditional molecular generation from text descriptions. We find that the choice of molecular representation used within ChemLML, SMILES versus SELFIES, has a strong influence on conditional molecular generation performance.
arXiv Detail & Related papers (2024-10-26T13:40:13Z)
MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction [14.353313239109337]
MolTRES is a novel chemical language representation learning framework. It incorporates generator-discriminator training, allowing the model to learn from more challenging examples. Our model outperforms existing state-of-the-art models on popular molecular property prediction tasks.
arXiv Detail & Related papers (2024-07-09T01:14:28Z)
Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model [49.64512917330373]
We introduce a multi-constraint molecular generation large language model, TSMMG, akin to a student. To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from these 'teachers' We experimentally show that TSMMG remarkably performs in generating molecules meeting complex, natural language-described property requirements.
arXiv Detail & Related papers (2024-03-20T02:15:55Z)
Interactive Molecular Discovery with Natural Language [69.89287960545903]
We propose the conversational molecular design, a novel task adopting natural language for describing and editing target molecules. To better accomplish this task, we design ChatMol, a knowledgeable and versatile generative pre-trained model, enhanced by injecting experimental property information.
arXiv Detail & Related papers (2023-06-21T02:05:48Z)
Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective [53.300288393173204]
Large Language Models (LLMs) have shown remarkable performance in various cross-modal tasks. In this work, we propose an In-context Few-Shot Molecule Learning paradigm for molecule-caption translation. We evaluate the effectiveness of MolReGPT on molecule-caption translation, including molecule understanding and text-based molecule generation.
arXiv Detail & Related papers (2023-06-11T08:16:25Z)
Domain-Agnostic Molecular Generation with Chemical Feedback [44.063584808910896]
MolGen is a pre-trained molecular language model tailored specifically for molecule generation. It internalizes structural and grammatical insights through the reconstruction of over 100 million molecular SELFIES. Our chemical feedback paradigm steers the model away from molecular hallucinations, ensuring alignment between the model's estimated probabilities and real-world chemical preferences.
arXiv Detail & Related papers (2023-01-26T17:52:56Z)
A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data. We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
Learning Latent Space Energy-Based Prior Model for Molecule Generation [59.875533935578375]
We learn latent space energy-based prior model with SMILES representation for molecule modeling. Our method is able to generate molecules with validity and uniqueness competitive with state-of-the-art models.
arXiv Detail & Related papers (2020-10-19T09:34:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.