PharMolixFM: All-Atom Foundation Models for Molecular Modeling and Generation
- URL: http://arxiv.org/abs/2503.21788v3
- Date: Tue, 01 Apr 2025 02:12:44 GMT
- Title: PharMolixFM: All-Atom Foundation Models for Molecular Modeling and Generation
- Authors: Yizhen Luo, Jiashuo Wang, Siqi Fan, Zaiqing Nie
- Abstract summary: We propose PharMolixFM, a unified framework for constructing all-atom foundation models. Our framework includes three variants using state-of-the-art multi-modal generative models. PharMolixFM-Diff achieves competitive prediction accuracy in protein-small-molecule docking.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structural biology relies on accurate three-dimensional biomolecular structures to advance our understanding of biological functions, disease mechanisms, and therapeutics. While recent advances in deep learning have enabled the development of all-atom foundation models for molecular modeling and generation, existing approaches face challenges in generalization due to the multi-modal nature of atomic data and the lack of comprehensive analysis of training and sampling strategies. To address these limitations, we propose PharMolixFM, a unified framework for constructing all-atom foundation models based on multi-modal generative techniques. Our framework includes three variants using state-of-the-art multi-modal generative models. By formulating molecular tasks as a generalized denoising process with task-specific priors, PharMolixFM achieves robust performance across various structural biology applications. Experimental results demonstrate that PharMolixFM-Diff achieves competitive prediction accuracy in protein-small-molecule docking (83.9% vs. 90.2% RMSD < 2Å, given pocket) with significantly improved inference speed. Moreover, we explore the empirical inference scaling law by introducing more sampling repeats or steps. Our code and model are available at https://github.com/PharMolix/OpenBioMed.
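The abstract describes formulating molecular tasks as a generalized denoising process initialized from task-specific priors. As a rough illustration only (not the paper's actual model, code, or API), the sketch below shows a minimal annealed-Langevin denoising loop in NumPy, where `sample_with_task_prior`, `score_fn`, and the toy Gaussian score are hypothetical stand-ins for a learned all-atom generative model and a task prior such as atoms initialised near a binding pocket:

```python
import numpy as np

def sample_with_task_prior(score_fn, prior_sample, n_steps=50,
                           sigma_max=1.0, sigma_min=0.01, seed=0):
    """Annealed Langevin-style denoising starting from a task-specific prior.

    `score_fn(x, sigma)` approximates the score (gradient of the log-density)
    of the noised data distribution; here it stands in for a learned model.
    """
    rng = np.random.default_rng(seed)
    x = prior_sample.copy()                     # start from the task prior, not pure noise
    sigmas = np.geomspace(sigma_max, sigma_min, n_steps)
    for sigma in sigmas:
        step = 0.5 * sigma ** 2                 # step size shrinks with the noise level
        x = x + step * score_fn(x, sigma)       # drift toward high-density regions
        x = x + np.sqrt(step) * rng.standard_normal(x.shape)  # Langevin noise
    return x

# Toy demo: the "data" is a standard Gaussian; after adding noise of scale
# sigma, its score is -x / (1 + sigma^2).
toy_score = lambda x, sigma: -x / (1.0 + sigma ** 2)
prior = np.full((8, 3), 5.0)                    # e.g. 8 atoms placed near a pocket centre
out = sample_with_task_prior(toy_score, prior)
print(out.shape)                                # (8, 3)
```

In this toy run the coordinates, initialised far from the data distribution, are pulled back toward it as the noise schedule anneals; swapping the prior (docked pose, partial structure, pure noise) is what makes the denoising process "generalized" across tasks.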
Related papers
- UniGenX: Unified Generation of Sequence and Structure with Autoregressive Diffusion
Existing approaches rely on either autoregressive sequence models or diffusion models. We propose UniGenX, a unified framework that combines autoregressive next-token prediction with conditional diffusion models. We validate the effectiveness of UniGenX on material and small molecule generation tasks.
arXiv Detail & Related papers (2025-03-09T16:43:07Z)
- Diffusion Models for Molecules: A Survey of Methods and Tasks
Generative tasks about molecules are crucial for drug discovery and material design. Diffusion models have emerged as an impressive class of deep generative models. This paper conducts a comprehensive survey of diffusion model-based molecular generative methods.
arXiv Detail & Related papers (2025-02-13T17:22:50Z)
- NeuralPLexer3: Accurate Biomolecular Complex Structure Prediction with Flow Models
We present NeuralPLexer3, a flow-based generative model that achieves state-of-the-art prediction accuracy on key biomolecular interaction types.
Examined through newly developed benchmarking strategies, NeuralPLexer3 excels in vital areas that are crucial to structure-based drug design.
arXiv Detail & Related papers (2024-12-14T08:28:45Z)
- CryoFM: A Flow-based Foundation Model for Cryo-EM Densities
We present CryoFM, a foundation model designed as a generative model, learning the distribution of high-quality density maps. Built on flow matching, CryoFM is trained to accurately capture the prior distribution of biomolecular density maps.
arXiv Detail & Related papers (2024-10-11T08:53:58Z)
- UniIF: Unified Molecule Inverse Folding
We propose UniIF, a unified model for inverse folding of all molecules.
Our proposed method surpasses state-of-the-art methods on all tasks.
arXiv Detail & Related papers (2024-05-29T10:26:16Z)
- Integrating Chemical Language and Molecular Graph in Multimodal Fused Deep Learning for Drug Property Prediction
We construct multimodal deep learning models to cover different molecular representations.
Compared with mono-modal models, our multimodal fused deep learning (MMFDL) models outperform single models in accuracy, reliability, and robustness to noise.
arXiv Detail & Related papers (2023-12-29T07:19:42Z)
- MolFM: A Multimodal Molecular Foundation Model
MolFM is a multimodal molecular foundation model designed to facilitate joint representation learning from molecular structures, biomedical texts, and knowledge graphs.
We provide theoretical analysis that our cross-modal pre-training captures local and global molecular knowledge by minimizing the distance in the feature space between different modalities of the same molecule.
On cross-modal retrieval, MolFM outperforms existing models with 12.13% and 5.04% absolute gains under the zero-shot and fine-tuning settings, respectively.
arXiv Detail & Related papers (2023-06-06T12:45:15Z)
- Modeling Molecular Structures with Intrinsic Diffusion Models
This thesis proposes Intrinsic Diffusion Modeling.
It combines diffusion generative models with scientific knowledge about the flexibility of biological complexes.
We demonstrate the effectiveness of this approach on two fundamental tasks at the basis of computational chemistry and biology.
arXiv Detail & Related papers (2023-02-23T03:26:48Z)
- Bidirectional Generation of Structure and Properties Through a Single Molecular Foundation Model
We present a novel multimodal molecular pre-trained model that incorporates the modalities of structure and biochemical properties.
Our proposed model pipeline of data handling and training objectives aligns the structure/property features in a common embedding space.
These contributions yield synergistic knowledge, allowing us to tackle both multimodal and unimodal downstream tasks with a single model.
arXiv Detail & Related papers (2022-11-19T05:16:08Z)
- A Molecular Multimodal Foundation Model Associating Molecule Graphs with Natural Language
We propose a molecular multimodal foundation model which is pretrained from molecular graphs and their semantically related textual data.
We believe that our model would have a broad impact on AI-empowered fields across disciplines such as biology, chemistry, materials, environment, and medicine.
arXiv Detail & Related papers (2022-09-12T00:56:57Z)
- Learning Neural Generative Dynamics for Molecular Conformation Generation
We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph.
We propose a novel probabilistic framework to generate valid and diverse conformations given a molecular graph.
arXiv Detail & Related papers (2021-02-20T03:17:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.