Uni-Mol3: A Multi-Molecular Foundation Model for Advancing Organic Reaction Modeling
- URL: http://arxiv.org/abs/2508.00920v1
- Date: Wed, 30 Jul 2025 02:38:52 GMT
- Title: Uni-Mol3: A Multi-Molecular Foundation Model for Advancing Organic Reaction Modeling
- Authors: Lirong Wu, Junjie Wang, Zhifeng Gao, Xiaohong Ji, Rong Zhu, Xinyu Li, Linfeng Zhang, Guolin Ke, Weinan E
- Abstract summary: This paper introduces Uni-Mol3, a novel deep learning framework that employs a hierarchical pipeline for multi-molecular reaction modeling. At its core, Uni-Mol3 adopts a multi-scale molecular tokenizer (Mol-Tokenizer) that encodes 3D structures of molecules and other features into discrete tokens. With prompt-aware downstream fine-tuning, Uni-Mol3 demonstrates exceptional performance in diverse organic reaction tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Organic reaction, the foundation of modern chemical industry, is crucial for new material development and drug discovery. However, deciphering reaction mechanisms and modeling multi-molecular relationships remain formidable challenges due to the complexity of molecular dynamics. While several state-of-the-art models like Uni-Mol2 have revolutionized single-molecular representation learning, their extension to multi-molecular systems, where chemical reactions inherently occur, has been underexplored. This paper introduces Uni-Mol3, a novel deep learning framework that employs a hierarchical pipeline for multi-molecular reaction modeling. At its core, Uni-Mol3 adopts a multi-scale molecular tokenizer (Mol-Tokenizer) that encodes 3D structures of molecules and other features into discrete tokens, creating a 3D-aware molecular language. The framework innovatively combines two pre-training stages: molecular pre-training to learn the molecular grammars and reaction pre-training to capture fundamental reaction principles, forming a progressive learning paradigm from single- to multi-molecular systems. With prompt-aware downstream fine-tuning, Uni-Mol3 demonstrates exceptional performance in diverse organic reaction tasks and supports multi-task prediction with strong generalizability. Experimental results across 10 datasets spanning 4 downstream tasks show that Uni-Mol3 outperforms existing methods, validating its effectiveness in modeling complex organic reactions. This work not only ushers in an alternative paradigm for multi-molecular computational modeling but also charts a course for intelligent organic reaction by bridging molecular representation with reaction mechanism understanding.
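The abstract's pipeline (quantize 3D molecular structure into discrete tokens, then train on single molecules before whole reactions) can be illustrated with a toy sketch. All names below (`mol_tokenize`, `reaction_sequence`, the token vocabulary, the grid size) are illustrative assumptions, not the paper's actual Mol-Tokenizer or API:

```python
# Hypothetical sketch of a Uni-Mol3-style pipeline (names and details are
# assumptions, not the paper's implementation): snap 3D atom coordinates to a
# coarse grid to get discrete, 3D-aware tokens, then concatenate per-molecule
# token streams into one reaction sequence for LM-style pretraining.

from dataclasses import dataclass

@dataclass
class Atom:
    symbol: str
    xyz: tuple  # (x, y, z) coordinates in angstroms

def mol_tokenize(atoms, bin_size=0.5):
    """Toy tokenizer: one token per atom, combining the element symbol with
    its coordinates quantized to a bin_size-angstrom grid."""
    tokens = []
    for a in atoms:
        gx, gy, gz = (round(c / bin_size) for c in a.xyz)
        tokens.append(f"{a.symbol}@{gx},{gy},{gz}")
    return tokens

def reaction_sequence(reactants, products):
    """Single- to multi-molecular framing: join per-molecule token streams
    with separator tokens and a reaction-arrow token."""
    seq = ["<bos>"]
    for mol in reactants:
        seq += mol_tokenize(mol) + ["<sep>"]
    seq[-1] = "<rxn>"  # replace the last separator with a reaction arrow
    for mol in products:
        seq += mol_tokenize(mol) + ["<sep>"]
    seq[-1] = "<eos>"
    return seq

water = [Atom("O", (0.0, 0.0, 0.0)),
         Atom("H", (0.96, 0.0, 0.0)),
         Atom("H", (-0.24, 0.93, 0.0))]
seq = reaction_sequence([water], [water])
print(seq)
```

A sequence like this could feed a standard autoregressive transformer, with molecular pretraining using only the per-molecule spans and reaction pretraining using full reactant-to-product sequences.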
Related papers
- Multi-Modal Molecular Representation Learning via Structure Awareness [19.813872931221546]
We propose a structure-awareness-based multi-modal self-supervised molecular representation pre-training framework (MMSA). MMSA enhances molecular graph representations by leveraging invariant knowledge between molecules. It achieves state-of-the-art performance on the MoleculeNet benchmark, with average ROC-AUC improvements ranging from 1.8% to 9.6% over baseline methods.
arXiv Detail & Related papers (2025-05-09T08:37:29Z) - Knowledge-aware contrastive heterogeneous molecular graph learning [77.94721384862699]
We propose Knowledge-aware Contrastive Heterogeneous Molecular graph Learning (KCHML), a paradigm shift in encoding molecular graphs. KCHML conceptualizes molecules through three distinct graph views (molecular, elemental, and pharmacological) enhanced by heterogeneous molecular graphs and a dual message-passing mechanism. This design offers a comprehensive representation for property prediction, as well as for downstream tasks such as drug-drug interaction (DDI) prediction.
arXiv Detail & Related papers (2025-02-17T11:53:58Z) - D3MES: Diffusion Transformer with multihead equivariant self-attention for 3D molecule generation [1.3791394805787949]
We introduce a diffusion model for 3D molecule generation that combines a classifiable diffusion model, Diffusion Transformer, with multihead equivariant self-attention. This method addresses two key challenges: correctly attaching hydrogen atoms in generated molecules by learning representations of molecules with hydrogen atoms removed, and overcoming the limitations of existing models that cannot generate molecules across multiple classes simultaneously.
arXiv Detail & Related papers (2025-01-13T06:16:11Z) - Learning Chemical Reaction Representation with Reactant-Product Alignment [50.28123475356234]
RAlign is a novel chemical reaction representation learning model for various organic reaction-related tasks. By integrating atomic correspondence between reactants and products, our model discerns the molecular transformations that occur during the reaction. We introduce a reaction-center-aware attention mechanism that enables the model to concentrate on key functional groups.
arXiv Detail & Related papers (2024-11-26T17:41:44Z) - UniIF: Unified Molecule Inverse Folding [67.60267592514381]
We propose a unified model UniIF for inverse folding of all molecules.
Our proposed method surpasses state-of-the-art methods on all tasks.
arXiv Detail & Related papers (2024-05-29T10:26:16Z) - Learning Over Molecular Conformer Ensembles: Datasets and Benchmarks [44.934084652800976]
We introduce the first MoleculAR Conformer Ensemble Learning benchmark to thoroughly evaluate the potential of learning on conformer ensembles.
Our findings reveal that learning directly from conformer space can improve performance on a variety of tasks and models.
arXiv Detail & Related papers (2023-09-29T20:06:46Z) - Bridging the Gap between Chemical Reaction Pretraining and Conditional Molecule Generation with a Unified Model [3.3031562864527664]
We propose a unified framework that addresses both the reaction representation learning and molecule generation tasks.
Inspired by the organic chemistry mechanism, we develop a novel pretraining framework that enables us to incorporate inductive biases into the model.
Our framework achieves state-of-the-art results on challenging downstream tasks.
arXiv Detail & Related papers (2023-03-13T10:06:41Z) - Improving Molecular Pretraining with Complementary Featurizations [20.86159731100242]
Molecular pretraining is a paradigm to solve a variety of tasks in computational chemistry and drug discovery.
We show that different featurization techniques convey chemical information differently.
We propose a simple and effective MOlecular pretraining framework with COmplementary featurizations (MOCO).
arXiv Detail & Related papers (2022-09-29T21:11:09Z) - A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language [63.60376252491507]
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z) - Scalable Fragment-Based 3D Molecular Design with Reinforcement Learning [68.8204255655161]
We introduce a novel framework for scalable 3D design that uses a hierarchical agent to build molecules.
In a variety of experiments, we show that our agent, guided only by energy considerations, can efficiently learn to produce molecules with over 100 atoms.
arXiv Detail & Related papers (2022-02-01T18:54:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.