Related papers: MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction

MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction

URL: http://arxiv.org/abs/2408.01426v1
Date: Tue, 9 Jul 2024 01:14:28 GMT
Title: MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction
Authors: Jun-Hyung Park, Yeachan Kim, Mingyu Lee, Hyuntae Park, SangKeun Lee,
Abstract summary: MolTRES is a novel chemical language representation learning framework. It incorporates generator-discriminator training, allowing the model to learn from more challenging examples. Our model outperforms existing state-of-the-art models on popular molecular property prediction tasks.
Score: 14.353313239109337
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Chemical representation learning has gained increasing interest due to the limited availability of supervised data in fields such as drug and materials design. This interest particularly extends to chemical language representation learning, which involves pre-training Transformers on SMILES sequences -- textual descriptors of molecules. Despite its success in molecular property prediction, current practices often lead to overfitting and limited scalability due to early convergence. In this paper, we introduce a novel chemical language representation learning framework, called MolTRES, to address these issues. MolTRES incorporates generator-discriminator training, allowing the model to learn from more challenging examples that require structural understanding. In addition, we enrich molecular representations by transferring knowledge from scientific literature by integrating external materials embedding. Experimental results show that our model outperforms existing state-of-the-art models on popular molecular property prediction tasks.

Related papers

Agentic reinforcement learning empowers next-generation chemical language models for molecular design and synthesis [51.83339196548892]
ChemCraft is a novel framework that decouples chemical reasoning from knowledge storage.<n>ChemCraft achieves superior performance with minimal inference costs.<n>This work establishes a cost-effective and privacy-preserving paradigm for AI-aided chemistry.
arXiv Detail & Related papers (2026-01-25T04:23:34Z)
Unveiling Latent Knowledge in Chemistry Language Models through Sparse Autoencoders [42.033443425253644]
We extend sparse autoencoder techniques to uncover and examine interpretable features within chemistry language models.<n>Our findings reveal that these models encode a rich landscape of chemical concepts.<n>Our approach provides a generalisable framework for uncovering latent knowledge in chemistry-focused AI systems.
arXiv Detail & Related papers (2025-12-08T22:20:01Z)
$\ ext{M}^{2}$LLM: Multi-view Molecular Representation Learning with Large Language Models [59.125833618091846]
We propose a multi-view framework that integrates three perspectives: the molecular structure view, the molecular task view, and the molecular rules view.<n>Experiments demonstrate that $textM2$LLM achieves state-of-the-art performance on multiple benchmarks across classification and regression tasks.
arXiv Detail & Related papers (2025-08-12T05:46:47Z)
Knowledge-aware contrastive heterogeneous molecular graph learning [77.94721384862699]
We propose a paradigm shift by encoding molecular graphs into Heterogeneous Molecular Graph Learning (KCHML) KCHML conceptualizes molecules through three distinct graph views-molecular, elemental, and pharmacological-enhanced by heterogeneous molecular graphs and a dual message-passing mechanism. This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction.
arXiv Detail & Related papers (2025-02-17T11:53:58Z)
Pre-trained Molecular Language Models with Random Functional Group Masking [54.900360309677794]
We propose a SMILES-based underlineem Molecular underlineem Language underlineem Model, which randomly masking SMILES subsequences corresponding to specific molecular atoms. This technique aims to compel the model to better infer molecular structures and properties, thus enhancing its predictive capabilities.
arXiv Detail & Related papers (2024-11-03T01:56:15Z)
Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model [49.64512917330373]
We introduce a multi-constraint molecular generation large language model, TSMMG, akin to a student. To train TSMMG, we construct a large set of text-molecule pairs by extracting molecular knowledge from these 'teachers' We experimentally show that TSMMG remarkably performs in generating molecules meeting complex, natural language-described property requirements.
arXiv Detail & Related papers (2024-03-20T02:15:55Z)
MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures [2.5563339057415218]
MolIG is a novel MultiModaL molecular pre-training framework for predicting molecular properties based on Image and Graph structures. It amalgamates the strengths of both molecular representation forms. It exhibits enhanced performance in downstream tasks pertaining to molecular property prediction within benchmark groups.
arXiv Detail & Related papers (2023-11-28T10:28:35Z)
MolCAP: Molecular Chemical reActivity pretraining and prompted-finetuning enhanced molecular representation learning [3.179128580341411]
MolCAP is a graph pretraining Transformer based on chemical reactivity (IMR) knowledge with prompted finetuning. Prompted by MolCAP, even basic graph neural networks are capable of achieving surprising performance that outperforms previous models.
arXiv Detail & Related papers (2023-06-13T13:48:06Z)
Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective [53.300288393173204]
Large Language Models (LLMs) have shown remarkable performance in various cross-modal tasks. In this work, we propose an In-context Few-Shot Molecule Learning paradigm for molecule-caption translation. We evaluate the effectiveness of MolReGPT on molecule-caption translation, including molecule understanding and text-based molecule generation.
arXiv Detail & Related papers (2023-06-11T08:16:25Z)
MolXPT: Wrapping Molecules with Text for Generative Pre-training [141.0924452870112]
MolXPT is a unified language model of text and molecules pre-trained on SMILES wrapped by text. MolXPT outperforms strong baselines of molecular property prediction on MoleculeNet.
arXiv Detail & Related papers (2023-05-18T03:58:19Z)
Domain-Agnostic Molecular Generation with Chemical Feedback [44.063584808910896]
MolGen is a pre-trained molecular language model tailored specifically for molecule generation. It internalizes structural and grammatical insights through the reconstruction of over 100 million molecular SELFIES. Our chemical feedback paradigm steers the model away from molecular hallucinations, ensuring alignment between the model's estimated probabilities and real-world chemical preferences.
arXiv Detail & Related papers (2023-01-26T17:52:56Z)
MolCPT: Molecule Continuous Prompt Tuning to Generalize Molecular Representation Learning [77.31492888819935]
We propose a novel paradigm of "pre-train, prompt, fine-tune" for molecular representation learning, named molecule continuous prompt tuning (MolCPT) MolCPT defines a motif prompting function that uses the pre-trained model to project the standalone input into an expressive prompt. Experiments on several benchmark datasets show that MolCPT efficiently generalizes pre-trained GNNs for molecular property prediction.
arXiv Detail & Related papers (2022-12-20T19:32:30Z)
A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data. We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
KPGT: Knowledge-Guided Pre-training of Graph Transformer for Molecular Property Prediction [13.55018269009361]
We introduce Knowledge-guided Pre-training of Graph Transformer (KPGT), a novel self-supervised learning framework for molecular graph representation learning. KPGT can offer superior performance over current state-of-the-art methods on several molecular property prediction tasks.
arXiv Detail & Related papers (2022-06-02T08:22:14Z)
Do Large Scale Molecular Language Representations Capture Important Structural Information? [31.76876206167457]
We present molecular embeddings obtained by training an efficient transformer encoder model, referred to as MoLFormer. Experiments show that the learned molecular representation performs competitively, when compared to graph-based and fingerprint-based supervised learning baselines.
arXiv Detail & Related papers (2021-06-17T14:33:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.