Related papers: Atomas: Hierarchical Alignment on Molecule-Text for Unified Molecule Understanding and Generation

URL: http://arxiv.org/abs/2404.16880v1
Date: Tue, 23 Apr 2024 12:35:44 GMT
Title: Atomas: Hierarchical Alignment on Molecule-Text for Unified Molecule Understanding and Generation
Authors: Yikun Zhang, Geyan Ye, Chaohao Yuan, Bo Han, Long-Kai Huang, Jianhua Yao, Wei Liu, Yu Rong
Abstract summary: We propose Atomas, a multi-modal molecular representation learning framework to jointly learn representations from SMILES strings and text. In the retrieval task, Atomas exhibits robust generalization ability and outperforms the baseline by 30.8% in recall@1 on average. In the generation task, Atomas achieves state-of-the-art results in both the molecule captioning and molecule generation tasks.
Score: 42.08917809689811
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Molecule-and-text cross-modal representation learning has emerged as a promising direction for enhancing the quality of molecular representation, thereby improving performance in various scientific fields, including drug discovery and materials science. Existing studies adopt a global alignment approach to learn the knowledge from different modalities. These global alignment approaches fail to capture fine-grained information, such as molecular fragments and their corresponding textual descriptions, which is crucial for downstream tasks. Furthermore, such information cannot be modeled with a similar global alignment strategy, because paired data with local-part annotations is scarce in existing datasets. In this paper, we propose Atomas, a multi-modal molecular representation learning framework to jointly learn representations from SMILES strings and text. We design a Hierarchical Adaptive Alignment model to concurrently learn the fine-grained fragment correspondence between two modalities and align these representations of fragments in three levels. Additionally, Atomas's end-to-end training framework incorporates the tasks of understanding and generating molecules, thereby supporting a wider range of downstream tasks. In the retrieval task, Atomas exhibits robust generalization ability and outperforms the baseline by 30.8% in recall@1 on average. In the generation task, Atomas achieves state-of-the-art results in both the molecule captioning and molecule generation tasks. Moreover, the visualization of the Hierarchical Adaptive Alignment model further confirms the chemical significance of our approach. Our code can be found at https://anonymous.4open.science/r/Atomas-03C3.

Related papers

Hierarchical Molecular Representation Learning via Fragment-Based Self-Supervised Embedding Prediction [7.459632891054827]
Graph self-supervised learning (GSSL) has demonstrated strong potential for generating expressive graph embeddings without the need for human annotations. GraSPNet is a hierarchical self-supervised framework that explicitly models both atomic-level and fragment-level semantics. GraSPNet learns chemically meaningful representations and consistently outperforms state-of-the-art GSSL methods in transfer learning settings.
arXiv Detail & Related papers (2026-02-23T20:41:44Z)
Entropy-Guided Dynamic Tokens for Graph-LLM Alignment in Molecular Understanding [13.814119721533508]
Molecular understanding is central to advancing areas such as scientific discovery. Existing graph-LLM bridges often adapt the Q-Former-style connector with fixed-length static tokens. We introduce EDT-Former, an Entropy-guided Dynamic Token Transformer that generates tokens aligned with informative molecular patches.
arXiv Detail & Related papers (2026-02-02T19:56:21Z)
ProtoMol: Enhancing Molecular Property Prediction via Prototype-Guided Multimodal Learning [14.289447310645878]
ProtoMol is a prototype-guided framework that enables fine-grained integration and consistent semantic alignment between modalities. ProtoMol consistently outperforms state-of-the-art baselines across a variety of molecular property prediction tasks.
arXiv Detail & Related papers (2025-10-19T13:19:37Z)
Training Text-to-Molecule Models with Context-Aware Tokenization [48.35188892892129]
We propose a novel text-to-molecule model, coined Context-Aware Molecular T5 (CAMT5). Inspired by the significance of the substructure-level contexts in understanding molecule structures, we introduce substructure-level tokenization for text-to-molecule models. We develop an importance-based training strategy that prioritizes key substructures, enabling CAMT5 to better capture the molecular semantics.
arXiv Detail & Related papers (2025-08-30T07:59:02Z)
Molecular Machine Learning Using Euler Characteristic Transforms [12.108680020079925]
The shape of a molecule determines its physicochemical and biological properties. We propose using the Euler Characteristic Transform (ECT) as a geometrical-topological descriptor. ECT enables the extraction of multiscale structural features, offering a novel way to represent and encode molecular shape in the feature space.
arXiv Detail & Related papers (2025-07-04T10:57:40Z)
TRIDENT: Tri-Modal Molecular Representation Learning with Taxonomic Annotations and Local Correspondence [33.9788667629578]
TRIDENT is a novel framework that integrates molecular SMILES, textual descriptions, and taxonomic functional annotations to learn rich molecular representations. TRIDENT achieves state-of-the-art performance on 11 downstream tasks.
arXiv Detail & Related papers (2025-06-26T06:09:47Z)
Graph-based Molecular In-context Learning Grounded on Morgan Fingerprints [28.262593876388397]
In-context learning (ICL) conditions large language models (LLMs) for molecular tasks, such as property prediction and molecule captioning, by embedding carefully selected demonstration examples into the input prompt. However, current prompt retrieval methods for molecular tasks have relied on molecule feature similarity, such as Morgan fingerprints, which do not adequately capture the global molecular and atom-binding relationships. We propose a self-supervised learning technique, GAMIC, which aligns global molecular structures, represented by graph neural networks (GNNs), with textual captions (descriptions) while leveraging local feature similarity through Morgan fingerprints.
arXiv Detail & Related papers (2025-02-08T02:46:33Z)
GeomCLIP: Contrastive Geometry-Text Pre-training for Molecules [16.98169256565552]
We set up a data collection effort for 200K pairs of ground-state geometric structures and biomedical texts. We propose the GeomCLIP framework to enhance multi-modal representation learning from molecular structures and biomedical text.
arXiv Detail & Related papers (2024-11-16T15:15:24Z)
Exploring Optimal Transport-Based Multi-Grained Alignments for Text-Molecule Retrieval [24.061535843472427]
We introduce the Optimal TRansport-based Multi-grained Alignments model (ORMA). ORMA is a novel approach that facilitates multi-grained alignments between textual descriptions and molecules. Experimental results on the ChEBI-20 and PCdes datasets demonstrate that ORMA significantly outperforms existing state-of-the-art (SOTA) models.
arXiv Detail & Related papers (2024-11-04T06:30:52Z)
Molecular Graph Representation Learning Integrating Large Language Models with Domain-specific Small Models [12.744381867301353]
We propose a novel Molecular Graph representation learning framework that integrates Large language models and Domain-specific small models. We employ a multi-modal alignment method to coordinate various modalities, including molecular graphs and their corresponding descriptive texts, to guide the pre-training of molecular representations.
arXiv Detail & Related papers (2024-08-19T16:11:59Z)
UniIF: Unified Molecule Inverse Folding [67.60267592514381]
We propose a unified model UniIF for inverse folding of all molecules. Our proposed method surpasses state-of-the-art methods on all tasks.
arXiv Detail & Related papers (2024-05-29T10:26:16Z)
Adapting Differential Molecular Representation with Hierarchical Prompts for Multi-label Property Prediction [2.344198904343022]
HiPM stands for hierarchical prompted molecular representation learning framework. Our framework comprises two core components: the Molecular Representation Encoder (MRE) and the Task-Aware Prompter (TAP).
arXiv Detail & Related papers (2024-05-29T03:10:21Z)
Text2Data: Low-Resource Data Generation with Textual Control [100.5970757736845]
Text2Data is a novel approach that utilizes unlabeled data to understand the underlying data distribution. It undergoes finetuning via a novel constraint optimization-based learning objective that ensures controllability and effectively counteracts catastrophic forgetting.
arXiv Detail & Related papers (2024-02-08T03:41:39Z)
Learning Over Molecular Conformer Ensembles: Datasets and Benchmarks [44.934084652800976]
We introduce the first MoleculAR Conformer Ensemble Learning benchmark to thoroughly evaluate the potential of learning on conformer ensembles. Our findings reveal that direct learning from a conformer space can improve performance on a variety of tasks and models.
arXiv Detail & Related papers (2023-09-29T20:06:46Z)
Unified Molecular Modeling via Modality Blending [35.16755562674055]
We introduce a novel "blend-then-predict" self-supervised learning method (MoleBLEND). MoleBLEND blends atom relations from different modalities into one unified relation for matrix encoding, then recovers modality-specific information for both 2D and 3D structures. Experiments show that MoleBLEND achieves state-of-the-art performance across major 2D/3D benchmarks.
arXiv Detail & Related papers (2023-07-12T15:27:06Z)
Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations [55.42602325017405]
We propose a novel method called GODE, which takes into account the two-level structure of individual molecules. By pre-training two graph neural networks (GNNs) on different graph structures, combined with contrastive learning, GODE fuses molecular structures with their corresponding knowledge graph substructures. When fine-tuned across 11 chemical property tasks, our model outperforms existing benchmarks, registering an average ROC-AUC uplift of 13.8% for classification tasks and an average RMSE/MAE enhancement of 35.1% for regression tasks.
arXiv Detail & Related papers (2023-06-02T15:49:45Z)
GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning [71.89623260998934]
This study investigates the feasibility of employing natural language instructions to accomplish molecule-related tasks in a zero-shot setting. Existing molecule-text models perform poorly in this setting due to inadequate treatment of instructions and limited capacity for graphs. We propose GIMLET, which unifies language models for both graph and text data.
arXiv Detail & Related papers (2023-05-28T18:27:59Z)
Generation of 3D Molecules in Pockets via Language Model [0.0]
Generative models for molecules based on sequential line notation (e.g. SMILES) or graph representation have attracted increasing interest in the field of structure-based drug design. We introduce Lingo3DMol, a pocket-based 3D molecule generation method that combines language models and geometric deep learning technology.
arXiv Detail & Related papers (2023-05-17T11:31:06Z)
A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data. We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
ASGN: An Active Semi-supervised Graph Neural Network for Molecular Property Prediction [61.33144688400446]
We propose a novel framework called Active Semi-supervised Graph Neural Network (ASGN) by incorporating both labeled and unlabeled molecules. In the teacher model, we propose a novel semi-supervised learning method to learn a general representation that jointly exploits information from molecular structure and molecular distribution. Finally, we propose a novel active learning strategy in terms of molecular diversities to select informative data during the whole framework learning.
arXiv Detail & Related papers (2020-07-07T04:22:39Z)
Self-Supervised Graph Transformer on Large-Scale Molecular Data [73.3448373618865]
We propose a novel framework, GROVER, for molecular representation learning. GROVER can learn rich structural and semantic information of molecules from enormous unlabelled molecular data. We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules -- the biggest GNN and the largest training dataset in molecular representation learning.
arXiv Detail & Related papers (2020-06-18T08:37:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.