3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling
- URL: http://arxiv.org/abs/2406.05797v2
- Date: Tue, 18 Mar 2025 08:03:45 GMT
- Title: 3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling
- Authors: Qizhi Pei, Rui Yan, Kaiyuan Gao, Jinhua Zhu, Lijun Wu,
- Abstract summary: We propose textbf3D-MolT5, a unified framework designed to model molecule in both sequence and 3D structure spaces.<n>Key innovation of our approach lies in mapping fine-grained 3D substructure representations into a specialized 3D token vocabulary.<n>Our approach significantly improves cross-modal interaction and alignment, addressing key challenges in previous work.
- Score: 41.07090635630771
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of molecular and natural language representations has emerged as a focal point in molecular science, with recent advancements in Language Models (LMs) demonstrating significant potential for comprehensive modeling of both domains. However, existing approaches face notable limitations, particularly in their neglect of three-dimensional (3D) information, which is crucial for understanding molecular structures and functions. While some efforts have been made to incorporate 3D molecular information into LMs using external structure encoding modules, significant difficulties remain, such as insufficient interaction across modalities in pre-training and challenges in modality alignment. To address the limitations, we propose \textbf{3D-MolT5}, a unified framework designed to model molecule in both sequence and 3D structure spaces. The key innovation of our approach lies in mapping fine-grained 3D substructure representations into a specialized 3D token vocabulary. This methodology facilitates the seamless integration of sequence and structure representations in a tokenized format, enabling 3D-MolT5 to encode molecular sequences, molecular structures, and text sequences within a unified architecture. Leveraging this tokenized input strategy, we build a foundation model that unifies the sequence and structure data formats. We then conduct joint pre-training with multi-task objectives to enhance the model's comprehension of these diverse modalities within a shared representation space. Thus, our approach significantly improves cross-modal interaction and alignment, addressing key challenges in previous work. Further instruction tuning demonstrated that our 3D-MolT5 has strong generalization ability and surpasses existing methods with superior performance in multiple downstream tasks. Our code is available at https://github.com/QizhiPei/3D-MolT5.
Related papers
- Elucidating the Design Space of Multimodal Protein Language Models [69.3650883370033]
Multimodal protein language models (PLMs) integrate sequence and token-based structural information.
This paper systematically elucidates the design space of multimodal PLMs to overcome their limitations.
Our advancements approach finer-grained supervision, demonstrating that token-based multimodal PLMs can achieve robust structural modeling.
arXiv Detail & Related papers (2025-04-15T17:59:43Z) - Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding [58.38294408121273]
We propose Cross-modal and Uncertainty-aware Agglomeration for Open-vocabulary 3D Scene Understanding dubbed CUA-O3D.
Our method addresses two key challenges: (1) incorporating semantic priors from VLMs alongside the geometric knowledge of spatially-aware vision foundation models, and (2) using a novel deterministic uncertainty estimation to capture model-specific uncertainties.
arXiv Detail & Related papers (2025-03-20T20:58:48Z) - Towards Unified Latent Space for 3D Molecular Latent Diffusion Modeling [80.59215359958934]
3D molecule generation is crucial for drug discovery and material science.
Existing approaches typically maintain separate latent spaces for invariant and equivariant modalities.
We propose a multi-modal VAE that compresses 3D molecules into latent sequences from a unified latent space.
arXiv Detail & Related papers (2025-03-19T08:56:13Z) - Tokenizing 3D Molecule Structure with Quantized Spherical Coordinates [28.452581855002855]
Mol-StrucTok is a novel method for tokenizing 3D molecular structures.
We design a line notation for 3D molecules by extracting local atomic coordinates in a spherical coordinate system.
We employ a Vector Quantized Variational Autoencoder (VQ-VAE) to tokenize these coordinates, treating them as generation descriptors.
arXiv Detail & Related papers (2024-12-02T14:50:44Z) - DPLM-2: A Multimodal Diffusion Protein Language Model [75.98083311705182]
We introduce DPLM-2, a multimodal protein foundation model that extends discrete diffusion protein language model (DPLM) to accommodate both sequences and structures.
DPLM-2 learns the joint distribution of sequence and structure, as well as their marginals and conditionals.
Empirical evaluation shows that DPLM-2 can simultaneously generate highly compatible amino acid sequences and their corresponding 3D structures.
arXiv Detail & Related papers (2024-10-17T17:20:24Z) - 3M-Diffusion: Latent Multi-Modal Diffusion for Language-Guided Molecular Structure Generation [18.55127917150268]
3M-Diffusion is a novel multi-modal molecular graph generation method.
It generates diverse, ideally novel molecular structures with desired properties.
arXiv Detail & Related papers (2024-03-11T21:44:54Z) - Sculpting Molecules in Text-3D Space: A Flexible Substructure Aware Framework for Text-Oriented Molecular Optimization [10.868749018881115]
We present an innovative approach to tackle the inverse design problem by formulating it as a multi-modality guidance optimization task.
Our proposed solution involves a textural-structure alignment symmetric diffusion framework for the implementation of molecular optimization tasks, namely 3DToMolo.
3DToMolo aims to harmonize diverse modalities including textual description features and graph structural features, aligning them seamlessly to produce molecular structures adhere to specified symmetric structural and textural constraints by experts in the field.
arXiv Detail & Related papers (2024-03-06T03:15:25Z) - Towards 3D Molecule-Text Interpretation in Language Models [125.56693661827181]
3D-MoLM enables an LM to interpret and analyze 3D molecules by equipping the LM with a 3D molecular encoder.
This integration is achieved by a 3D molecule-text projector, bridging the 3D molecular encoder's representation space and the LM's input space.
We meticulously curated a 3D molecule-centric instruction tuning dataset -- 3D-MoIT.
arXiv Detail & Related papers (2024-01-25T03:42:00Z) - 3D-Mol: A Novel Contrastive Learning Framework for Molecular Property Prediction with 3D Information [1.1777304970289215]
3D-Mol is a novel approach designed for more accurate spatial structure representation.
It deconstructs molecules into three hierarchical graphs to better extract geometric information.
We compare 3D-Mol with various state-of-the-art baselines on 7 benchmarks and demonstrate our outstanding performance.
arXiv Detail & Related papers (2023-09-28T10:05:37Z) - Automated 3D Pre-Training for Molecular Property Prediction [54.15788181794094]
We propose a novel 3D pre-training framework (dubbed 3D PGT)
It pre-trains a model on 3D molecular graphs, and then fine-tunes it on molecular graphs without 3D structures.
Extensive experiments on 2D molecular graphs are conducted to demonstrate the accuracy, efficiency and generalization ability of the proposed 3D PGT.
arXiv Detail & Related papers (2023-06-13T14:43:13Z) - Generation of 3D Molecules in Pockets via Language Model [0.0]
Generative models for molecules based on sequential line notation (e.g. SMILES) or graph representation have attracted an increasing interest in the field of structure-based drug design.
We introduce Lingo3DMol, a pocket-based 3D molecule generation method that combines language models and geometric deep learning technology.
arXiv Detail & Related papers (2023-05-17T11:31:06Z) - Language models can generate molecules, materials, and protein binding
sites directly in three dimensions as XYZ, CIF, and PDB files [0.0]
Language models are powerful tools for molecular design.
We show how language models can generate novel and valid structures in three dimensions.
Despite being trained on chemical file sequences, language models still achieve performance comparable to state-of-the-art models.
arXiv Detail & Related papers (2023-05-09T18:35:38Z) - MUDiff: Unified Diffusion for Complete Molecule Generation [104.7021929437504]
We present a new model for generating a comprehensive representation of molecules, including atom features, 2D discrete molecule structures, and 3D continuous molecule coordinates.
We propose a novel graph transformer architecture to denoise the diffusion process.
Our model is a promising approach for designing stable and diverse molecules and can be applied to a wide range of tasks in molecular modeling.
arXiv Detail & Related papers (2023-04-28T04:25:57Z) - An Equivariant Generative Framework for Molecular Graph-Structure
Co-Design [54.92529253182004]
We present MolCode, a machine learning-based generative framework for underlineMolecular graph-structure underlineCo-design.
In MolCode, 3D geometric information empowers the molecular 2D graph generation, which in turn helps guide the prediction of molecular 3D structure.
Our investigation reveals that the 2D topology and 3D geometry contain intrinsically complementary information in molecule design.
arXiv Detail & Related papers (2023-04-12T13:34:22Z) - Scalable Fragment-Based 3D Molecular Design with Reinforcement Learning [68.8204255655161]
We introduce a novel framework for scalable 3D design that uses a hierarchical agent to build molecules.
In a variety of experiments, we show that our agent, guided only by energy considerations, can efficiently learn to produce molecules with over 100 atoms.
arXiv Detail & Related papers (2022-02-01T18:54:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.