Unified Molecule Pre-training with Flexible 2D and 3D Modalities: Single and Paired Modality Integration
- URL: http://arxiv.org/abs/2510.07035v1
- Date: Wed, 08 Oct 2025 14:02:51 GMT
- Title: Unified Molecule Pre-training with Flexible 2D and 3D Modalities: Single and Paired Modality Integration
- Authors: Tengwei Song, Min Wu, Yuan Fang,
- Abstract summary: We propose FlexMol, a flexible molecule pre-training framework that learns unified molecular representations while supporting single-modality input. Our approach employs separate models for 2D and 3D molecular data, leverages parameter sharing to improve computational efficiency, and utilizes a decoder to generate features for the missing modality.
- Score: 15.929511077091687
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Molecular representation learning plays a crucial role in advancing applications such as drug discovery and material design. Existing work leverages 2D and 3D modalities of molecular information for pre-training, aiming to capture comprehensive structural and geometric insights. However, these methods require paired 2D and 3D molecular data to train the model effectively and prevent it from collapsing into a single modality, posing limitations in scenarios where a certain modality is unavailable or computationally expensive to generate. To overcome this limitation, we propose FlexMol, a flexible molecule pre-training framework that learns unified molecular representations while supporting single-modality input. Specifically, inspired by the unified structure in vision-language models, our approach employs separate models for 2D and 3D molecular data, leverages parameter sharing to improve computational efficiency, and utilizes a decoder to generate features for the missing modality. This enables a multistage continuous learning process where both modalities contribute collaboratively during training, while ensuring robustness when only one modality is available during inference. Extensive experiments demonstrate that FlexMol achieves superior performance across a wide range of molecular property prediction tasks, and we also empirically demonstrate its effectiveness with incomplete data. Our code and data are available at https://github.com/tewiSong/FlexMol.
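As a rough illustration of the mechanism the abstract describes, the sketch below uses tiny single-layer stand-ins for the 2D/3D encoders: both branches reuse the same weights (parameter sharing), and a decoder generates features for whichever modality is absent. All dimensions, weight shapes, and the averaged fusion rule are illustrative assumptions, not FlexMol's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the abstract does not specify the architecture
# at this level, so this is only an illustration.
D_IN, D_HID = 16, 8
W_shared = rng.normal(size=(D_IN, D_HID))  # weights shared by both encoders
W_dec = rng.normal(size=(D_HID, D_HID))    # decoder for the missing modality

def encode(x):
    # Both the 2D and 3D branches reuse W_shared (parameter sharing).
    return np.tanh(x @ W_shared)

def unified_repr(x2d=None, x3d=None):
    """Fuse 2D/3D features; if one modality is absent, the decoder
    generates its features from the one that is present."""
    if x2d is None and x3d is None:
        raise ValueError("at least one modality is required")
    h2d = encode(x2d) if x2d is not None else None
    h3d = encode(x3d) if x3d is not None else None
    if h2d is None:
        h2d = np.tanh(h3d @ W_dec)  # generated 2D features from 3D
    if h3d is None:
        h3d = np.tanh(h2d @ W_dec)  # generated 3D features from 2D
    return 0.5 * (h2d + h3d)        # simple averaged fusion

x = rng.normal(size=D_IN)
z_paired = unified_repr(x2d=x, x3d=x)   # paired-modality input
z_2d_only = unified_repr(x2d=x)         # single-modality input
print(z_paired.shape, z_2d_only.shape)
```

Both calls return a representation of the same shape, which is the point of the design: downstream property predictors see one unified embedding regardless of which modalities were available at inference time.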
Related papers
- UniMo: Unifying 2D Video and 3D Human Motion with an Autoregressive Framework [54.337290937468175]
We propose UniMo, an autoregressive model for joint modeling of 2D human videos and 3D human motions within a unified framework. We show that our method simultaneously generates corresponding videos and motions while performing accurate motion capture.
arXiv Detail & Related papers (2025-12-03T16:03:18Z)
- Foundation Model for Skeleton-Based Human Action Understanding [56.89025287217221]
This paper presents a Unified Skeleton-based Dense Representation Learning framework. USDRL consists of a Transformer-based Dense Spatio-Temporal (DSTE) module, Multi-Grained Feature Decorrelation (MG-FD), and Multi-Perspective Consistency Training (MPCT).
arXiv Detail & Related papers (2025-08-18T02:42:16Z)
- MolMix: A Simple Yet Effective Baseline for Multimodal Molecular Representation Learning [17.93173928602627]
We propose a simple transformer-based baseline for multimodal molecular representation learning.
We integrate three distinct modalities: SMILES strings, 2D graph representations, and 3D conformers of molecules.
Despite its simplicity, our approach achieves state-of-the-art results across multiple datasets.
arXiv Detail & Related papers (2024-10-10T14:36:58Z)
- 3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling [41.07090635630771]
We propose 3D-MolT5, a unified framework designed to model molecules in both sequence and 3D structure spaces. The key innovation of our approach lies in mapping fine-grained 3D substructure representations into a specialized 3D token vocabulary. Our approach significantly improves cross-modal interaction and alignment, addressing key challenges in previous work.
arXiv Detail & Related papers (2024-06-09T14:20:55Z)
- MolBind: Multimodal Alignment of Language, Molecules, and Proteins [16.98169256565552]
MolBind is a framework that trains encoders for multiple modalities through contrastive learning.
MolBind shows superior zero-shot learning performance across a wide range of tasks.
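Contrastive training of the kind this entry mentions can be sketched with a symmetric InfoNCE-style objective: embeddings of the same molecule from two encoders should be more similar to each other than to embeddings of other molecules. The batch size, dimensions, and temperature below are illustrative assumptions, not MolBind's settings.

```python
import numpy as np

rng = np.random.default_rng(2)

def info_nce(a, b, tau=0.1):
    """Symmetric contrastive (InfoNCE-style) loss between two batches of
    embeddings, where row i of `a` pairs with row i of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / tau                       # pairwise cosine similarities
    idx = np.arange(len(a))
    # Cross-entropy in both directions, with the matching pair as the target.
    log_sm_ab = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_sm_ba = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    return -0.5 * (log_sm_ab[idx, idx].mean() + log_sm_ba[idx, idx].mean())

# Hypothetical embeddings for 3 molecules from two different encoders
# (e.g. a text encoder and a graph encoder), nearly aligned by construction.
emb_text = rng.normal(size=(3, 8))
emb_graph = emb_text + 0.01 * rng.normal(size=(3, 8))
loss_pos = info_nce(emb_text, emb_graph)
print(float(loss_pos))
```

Because each molecule's two views are nearly identical here, the diagonal similarities dominate and the loss is close to zero; training drives real encoder pairs toward this regime.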
arXiv Detail & Related papers (2024-03-13T01:38:42Z)
- Integrating Chemical Language and Molecular Graph in Multimodal Fused Deep Learning for Drug Property Prediction [9.388979080270103]
We construct multimodal deep learning models to cover different molecular representations.
Compared with mono-modal models, our multimodal fused deep learning (MMFDL) models outperform single models in accuracy, reliability, and robustness to noise.
arXiv Detail & Related papers (2023-12-29T07:19:42Z)
- Unified Molecular Modeling via Modality Blending [35.16755562674055]
We introduce a novel "blend-then-predict" self-supervised learning method (MoleBLEND).
MoleBLEND blends atom relations from different modalities into one unified relation for matrix encoding, then recovers modality-specific information for both 2D and 3D structures.
Experiments show that MoleBLEND achieves state-of-the-art performance across major 2D/3D benchmarks.
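The blending step can be sketched with toy relation matrices: for each atom pair, one modality's entry is kept in a single unified matrix, and the held-out entries become the self-supervised recovery targets. The relation definitions, molecule size, and 50% masking ratio here are illustrative assumptions, not MoleBLEND's exact encoding.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 4-atom molecule: a 2D relation matrix (e.g. graph shortest-path
# distances) and a 3D relation matrix (e.g. pairwise Euclidean distances).
# Values are made up for illustration.
n = 4
rel_2d = rng.integers(1, 5, size=(n, n)).astype(float)
rel_3d = rng.uniform(1.0, 3.0, size=(n, n))

# Blend: for each atom pair, randomly take the entry from one modality,
# producing a single unified relation matrix for the encoder.
mask_2d = rng.random((n, n)) < 0.5
blended = np.where(mask_2d, rel_2d, rel_3d)

# Predict: the targets are the held-out modality-specific entries.
# Wherever the 2D value was used, the model must recover the 3D value,
# and vice versa.
targets_3d = rel_3d[mask_2d]
targets_2d = rel_2d[~mask_2d]
print(blended.shape, targets_3d.size + targets_2d.size)
```

Every atom pair thus contributes exactly one blended input entry and one recovery target, which is what forces the shared encoder to align the two modalities at the atom-relation level.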
arXiv Detail & Related papers (2023-07-12T15:27:06Z)
- MUDiff: Unified Diffusion for Complete Molecule Generation [104.7021929437504]
We present a new model for generating a comprehensive representation of molecules, including atom features, 2D discrete molecule structures, and 3D continuous molecule coordinates.
We propose a novel graph transformer architecture to denoise the diffusion process.
Our model is a promising approach for designing stable and diverse molecules and can be applied to a wide range of tasks in molecular modeling.
arXiv Detail & Related papers (2023-04-28T04:25:57Z)
- Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction [53.06671763877109]
We develop molecular embeddings that encode complex molecular characteristics to improve the performance of few-shot molecular property prediction.
Our approach leverages large amounts of synthetic data, namely the results of molecular docking calculations.
On multiple molecular property prediction benchmarks, training from the embedding space substantially improves Multi-Task, MAML, and Prototypical Network few-shot learning performance.
arXiv Detail & Related papers (2023-02-04T01:32:40Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Molecular Attributes Transfer from Non-Parallel Data [57.010952598634944]
We formulate molecular optimization as a style transfer problem and present a novel generative model that could automatically learn internal differences between two groups of non-parallel data.
Experiments on two molecular optimization tasks, toxicity modification and synthesizability improvement, demonstrate that our model significantly outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2021-11-30T06:10:22Z)
- Learning Neural Generative Dynamics for Molecular Conformation Generation [89.03173504444415]
We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
arXiv Detail & Related papers (2021-02-20T03:17:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.