3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization
- URL: http://arxiv.org/abs/2406.05797v1
- Date: Sun, 9 Jun 2024 14:20:55 GMT
- Title: 3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization
- Authors: Qizhi Pei, Lijun Wu, Kaiyuan Gao, Jinhua Zhu, Rui Yan,
- Abstract summary: 3D-MolT5 is a unified framework designed to model both 1D molecular sequence and 3D molecular structure.
The key innovation lies in our methodology for mapping fine-grained 3D substructure representations to a specialized 3D token vocabulary.
Our proposed 3D-MolT5 outperforms existing methods in molecular property prediction, molecule captioning, and text-based molecule generation tasks.
- Score: 41.07090635630771
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of molecule and language has garnered increasing attention in molecular science. Recent advancements in Language Models (LMs) have demonstrated potential for the comprehensive modeling of molecule and language. However, existing works exhibit notable limitations. Most overlook the modeling of 3D information, which is crucial for understanding molecular structures and functions. While some attempts have been made to leverage external structure encoding modules to inject 3D molecular information into LMs, obvious difficulties hinder the integration of molecular structure and language text, such as modality alignment and separate tuning. To bridge this gap, we propose 3D-MolT5, a unified framework designed to model both 1D molecular sequence and 3D molecular structure. The key innovation lies in our methodology for mapping fine-grained 3D substructure representations (based on 3D molecular fingerprints) to a specialized 3D token vocabulary for 3D-MolT5. This 3D structure token vocabulary enables the seamless combination of 1D sequence and 3D structure representations in a tokenized format, allowing 3D-MolT5 to encode molecular sequence (SELFIES), molecular structure, and text sequences within a unified architecture. In addition, we introduce 1D and 3D joint pre-training to enhance the model's comprehension of these diverse modalities in a joint representation space and to help our foundation model generalize better to various tasks. Through instruction tuning on multiple downstream datasets, our proposed 3D-MolT5 outperforms existing methods in molecular property prediction, molecule captioning, and text-based molecule generation tasks. Our code will be available on GitHub soon.
Related papers
- Elucidating the Design Space of Multimodal Protein Language Models [69.3650883370033]
Multimodal protein language models (PLMs) integrate sequence and token-based structural information.
This paper systematically elucidates the design space of multimodal PLMs to overcome their limitations.
Our advancements approach finer-grained supervision, demonstrating that token-based multimodal PLMs can achieve robust structural modeling.
arXiv Detail & Related papers (2025-04-15T17:59:43Z)
- Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding [58.38294408121273]
We propose Cross-modal and Uncertainty-aware Agglomeration for Open-vocabulary 3D Scene Understanding dubbed CUA-O3D.
Our method addresses two key challenges: (1) incorporating semantic priors from VLMs alongside the geometric knowledge of spatially-aware vision foundation models, and (2) using a novel deterministic uncertainty estimation to capture model-specific uncertainties.
arXiv Detail & Related papers (2025-03-20T20:58:48Z)
- Towards Unified Latent Space for 3D Molecular Latent Diffusion Modeling [80.59215359958934]
3D molecule generation is crucial for drug discovery and material science.
Existing approaches typically maintain separate latent spaces for invariant and equivariant modalities.
We propose a multi-modal VAE that compresses 3D molecules into latent sequences from a unified latent space.
arXiv Detail & Related papers (2025-03-19T08:56:13Z)
- Tokenizing 3D Molecule Structure with Quantized Spherical Coordinates [28.452581855002855]
Mol-StrucTok is a novel method for tokenizing 3D molecular structures.
We design a line notation for 3D molecules by extracting local atomic coordinates in a spherical coordinate system.
We employ a Vector Quantized Variational Autoencoder (VQ-VAE) to tokenize these coordinates, treating them as generation descriptors.
arXiv Detail & Related papers (2024-12-02T14:50:44Z)
- DPLM-2: A Multimodal Diffusion Protein Language Model [75.98083311705182]
We introduce DPLM-2, a multimodal protein foundation model that extends discrete diffusion protein language model (DPLM) to accommodate both sequences and structures.
DPLM-2 learns the joint distribution of sequence and structure, as well as their marginals and conditionals.
Empirical evaluation shows that DPLM-2 can simultaneously generate highly compatible amino acid sequences and their corresponding 3D structures.
arXiv Detail & Related papers (2024-10-17T17:20:24Z)
- 3M-Diffusion: Latent Multi-Modal Diffusion for Language-Guided Molecular Structure Generation [18.55127917150268]
3M-Diffusion is a novel multi-modal molecular graph generation method.
It generates diverse, ideally novel molecular structures with desired properties.
arXiv Detail & Related papers (2024-03-11T21:44:54Z)
- Sculpting Molecules in Text-3D Space: A Flexible Substructure Aware Framework for Text-Oriented Molecular Optimization [10.868749018881115]
We present an innovative approach to tackle the inverse design problem by formulating it as a multi-modality guidance optimization task.
Our proposed solution involves a textural-structure alignment symmetric diffusion framework for the implementation of molecular optimization tasks, namely 3DToMolo.
3DToMolo aims to harmonize diverse modalities, including textual description features and graph structural features, aligning them seamlessly to produce molecular structures that adhere to the symmetric structural and textural constraints specified by experts in the field.
arXiv Detail & Related papers (2024-03-06T03:15:25Z)
- Towards 3D Molecule-Text Interpretation in Language Models [125.56693661827181]
3D-MoLM enables an LM to interpret and analyze 3D molecules by equipping the LM with a 3D molecular encoder.
This integration is achieved by a 3D molecule-text projector, bridging the 3D molecular encoder's representation space and the LM's input space.
We meticulously curated a 3D molecule-centric instruction tuning dataset -- 3D-MoIT.
arXiv Detail & Related papers (2024-01-25T03:42:00Z)
- 3D-Mol: A Novel Contrastive Learning Framework for Molecular Property Prediction with 3D Information [1.1777304970289215]
3D-Mol is a novel approach designed for more accurate spatial structure representation.
It deconstructs molecules into three hierarchical graphs to better extract geometric information.
We compare 3D-Mol with various state-of-the-art baselines on 7 benchmarks and demonstrate our outstanding performance.
arXiv Detail & Related papers (2023-09-28T10:05:37Z)
- Automated 3D Pre-Training for Molecular Property Prediction [54.15788181794094]
We propose a novel 3D pre-training framework (dubbed 3D PGT).
It pre-trains a model on 3D molecular graphs, and then fine-tunes it on molecular graphs without 3D structures.
Extensive experiments on 2D molecular graphs are conducted to demonstrate the accuracy, efficiency and generalization ability of the proposed 3D PGT.
arXiv Detail & Related papers (2023-06-13T14:43:13Z)
- Generation of 3D Molecules in Pockets via Language Model [0.0]
Generative models for molecules based on sequential line notation (e.g. SMILES) or graph representation have attracted an increasing interest in the field of structure-based drug design.
We introduce Lingo3DMol, a pocket-based 3D molecule generation method that combines language models and geometric deep learning technology.
arXiv Detail & Related papers (2023-05-17T11:31:06Z)
- Language models can generate molecules, materials, and protein binding sites directly in three dimensions as XYZ, CIF, and PDB files [0.0]
Language models are powerful tools for molecular design.
We show how language models can generate novel and valid structures in three dimensions.
Despite being trained on chemical file sequences, language models still achieve performance comparable to state-of-the-art models.
arXiv Detail & Related papers (2023-05-09T18:35:38Z)
- MUDiff: Unified Diffusion for Complete Molecule Generation [104.7021929437504]
We present a new model for generating a comprehensive representation of molecules, including atom features, 2D discrete molecule structures, and 3D continuous molecule coordinates.
We propose a novel graph transformer architecture to denoise the diffusion process.
Our model is a promising approach for designing stable and diverse molecules and can be applied to a wide range of tasks in molecular modeling.
arXiv Detail & Related papers (2023-04-28T04:25:57Z)
- An Equivariant Generative Framework for Molecular Graph-Structure Co-Design [54.92529253182004]
We present MolCode, a machine learning-based generative framework for Molecular graph-structure Co-design.
In MolCode, 3D geometric information empowers the molecular 2D graph generation, which in turn helps guide the prediction of molecular 3D structure.
Our investigation reveals that the 2D topology and 3D geometry contain intrinsically complementary information in molecule design.
arXiv Detail & Related papers (2023-04-12T13:34:22Z)
- Scalable Fragment-Based 3D Molecular Design with Reinforcement Learning [68.8204255655161]
We introduce a novel framework for scalable 3D design that uses a hierarchical agent to build molecules.
In a variety of experiments, we show that our agent, guided only by energy considerations, can efficiently learn to produce molecules with over 100 atoms.
arXiv Detail & Related papers (2022-02-01T18:54:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.