A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery
- URL: http://arxiv.org/abs/2503.04362v1
- Date: Thu, 06 Mar 2025 12:04:56 GMT
- Title: A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery
- Authors: Yiheng Zhu, Mingyang Li, Junlong Liu, Kun Fu, Jiansheng Wu, Qiuyi Li, Mingze Yin, Jieping Ye, Jian Wu, Zheng Wang
- Abstract summary: Structure-based drug discovery (SBDD) is a systematic scientific process that develops new drugs by leveraging the detailed physical structure of the target protein. Recent advancements in pre-trained models for biomolecules have demonstrated remarkable success across various biochemical applications.
- Score: 32.573496601865465
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Structure-based drug discovery (SBDD) is a systematic scientific process that develops new drugs by leveraging the detailed physical structure of the target protein. Recent advancements in pre-trained models for biomolecules have demonstrated remarkable success across various biochemical applications, including drug discovery and protein engineering. However, in most approaches, the pre-trained models primarily focus on the characteristics of either small molecules or proteins, without delving into their binding interactions which are essential cross-domain relationships pivotal to SBDD. To fill this gap, we propose a general-purpose foundation model named BIT (an abbreviation for Biomolecular Interaction Transformer), which is capable of encoding a range of biochemical entities, including small molecules, proteins, and protein-ligand complexes, as well as various data formats, encompassing both 2D and 3D structures. Specifically, we introduce Mixture-of-Domain-Experts (MoDE) to handle the biomolecules from diverse biochemical domains and Mixture-of-Structure-Experts (MoSE) to capture positional dependencies in the molecular structures. The proposed mixture-of-experts approach enables BIT to achieve both deep fusion and domain-specific encoding, effectively capturing fine-grained molecular interactions within protein-ligand complexes. Then, we perform cross-domain pre-training on the shared Transformer backbone via several unified self-supervised denoising tasks. Experimental results on various benchmarks demonstrate that BIT achieves exceptional performance in downstream tasks, including binding affinity prediction, structure-based virtual screening, and molecular property prediction.
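The abstract describes Mixture-of-Domain-Experts (MoDE) only at a high level: tokens from different biochemical domains pass through a shared backbone but receive domain-specific encoding. The following is a minimal sketch of that general idea, assuming hard routing by a per-token domain tag. The class `DomainExpertLayer`, the tag names, and the layer sizes are hypothetical and are not taken from the paper; the shared attention that would provide the "deep fusion" is omitted.

```python
import math
import random


def make_linear(dim_in, dim_out, rng):
    """Return a random (dim_out x dim_in) weight matrix as nested lists."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim_in)]
            for _ in range(dim_out)]


def apply_linear(weights, vec):
    """Multiply a weight matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


class DomainExpertLayer:
    """Hypothetical feed-forward layer with one expert per domain tag.

    Each token is sent to the expert that matches its domain tag
    (hard routing); this is an illustrative assumption, not the
    routing rule used in the paper.
    """

    def __init__(self, dim, hidden, domains, seed=0):
        rng = random.Random(seed)
        self.experts = {
            d: (make_linear(dim, hidden, rng), make_linear(hidden, dim, rng))
            for d in domains
        }

    def forward(self, tokens, domain_tags):
        outputs = []
        for vec, tag in zip(tokens, domain_tags):
            w_in, w_out = self.experts[tag]  # pick the expert for this domain
            hidden = [math.tanh(h) for h in apply_linear(w_in, vec)]
            update = apply_linear(w_out, hidden)
            # Residual connection around the expert, as in a Transformer block.
            outputs.append([x + u for x, u in zip(vec, update)])
        return outputs


layer = DomainExpertLayer(dim=4, hidden=8, domains=("molecule", "protein"))
token = [1.0, 0.0, 0.5, -0.5]
mixed = layer.forward([token, token], ["molecule", "protein"])
same = layer.forward([token, token], ["protein", "protein"])
```

In this sketch, two identical tokens with the same tag produce identical outputs, while identical tokens with different tags pass through different weights and so generally produce different outputs.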
Related papers
- Concept-Driven Deep Learning for Enhanced Protein-Specific Molecular Generation [28.09898110053281]
We propose a novel fragment-based molecular generation framework tailored for specific proteins.
Our approach significantly improves synthetic feasibility and binding affinity, with a 4% increase in drug-likeness and a 6% improvement in synthetic feasibility.
arXiv Detail & Related papers (2025-03-11T08:21:57Z)
- Molecule Generation for Target Protein Binding with Hierarchical Consistency Diffusion Model [17.885767456439215]
Atom-Motif Consistency Diffusion Model (AMDiff) is a hierarchical diffusion architecture that integrates both atom- and motif-level views of molecules. Compared to existing approaches, AMDiff exhibits superior validity and novelty in generating molecules tailored to fit various protein pockets.
arXiv Detail & Related papers (2025-03-02T17:54:30Z)
- Life-Code: Central Dogma Modeling with Multi-Omics Sequence Unification [53.488387420073536]
Life-Code is a comprehensive framework that spans different biological functions. Life-Code achieves state-of-the-art performance on various tasks across three omics.
arXiv Detail & Related papers (2025-02-11T06:53:59Z)
- Large-Scale Multi-omic Biosequence Transformers for Modeling Protein-Nucleic Acid Interactions [4.36852565205713]
We present our work training the largest open-source multi-omic foundation model to date.
We show that these multi-omic models can learn joint representations between various single-omic distributions.
We also demonstrate that MOMs can be fine-tuned to achieve state-of-the-art results on protein-nucleic acid interaction tasks.
arXiv Detail & Related papers (2024-08-29T03:56:40Z)
- UniIF: Unified Molecule Inverse Folding [67.60267592514381]
We propose a unified model UniIF for inverse folding of all molecules.
Our proposed method surpasses state-of-the-art methods on all tasks.
arXiv Detail & Related papers (2024-05-29T10:26:16Z)
- Functional-Group-Based Diffusion for Pocket-Specific Molecule Generation and Elaboration [63.23362798102195]
We propose D3FG, a functional-group-based diffusion model for pocket-specific molecule generation and elaboration.
D3FG decomposes molecules into two categories of components: functional groups defined as rigid bodies and linkers as mass points.
In the experiments, our method can generate molecules with more realistic 3D structures, competitive affinities toward the protein targets, and better drug properties.
arXiv Detail & Related papers (2023-05-30T06:41:20Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- Widely Used and Fast De Novo Drug Design by a Protein Sequence-Based Reinforcement Learning Model [4.815696666006742]
Structure-based de novo methods can overcome the scarcity of data on active compounds by incorporating drug-target interactions into deep generative architectures.
Here, we demonstrate a widely used and fast protein sequence-based reinforcement learning model for drug discovery.
As a proof of concept, the RL model was utilized to design molecules for four targets.
arXiv Detail & Related papers (2022-08-14T10:41:52Z)
- Accurate Machine Learned Quantum-Mechanical Force Fields for Biomolecular Simulations [51.68332623405432]
Molecular dynamics (MD) simulations allow atomistic insights into chemical and biological processes.
Recently, machine learned force fields (MLFFs) emerged as an alternative means to execute MD simulations.
This work proposes a general approach to constructing accurate MLFFs for large-scale molecular simulations.
arXiv Detail & Related papers (2022-05-17T13:08:28Z)
- A silicon qubit platform for in situ single molecule structure determination [0.7187911114620571]
Imaging individual conformational instances of generic, inhomogeneous, transient or intrinsically disordered protein systems at the single molecule level in situ is one of the notable challenges in structural biology.
Here we tackle the problem by designing a single molecule imaging platform technology embracing the advantages of silicon-based spin qubits.
We demonstrate through detailed simulation that this platform enables scalable atomic-level structure determination of individual molecular systems in native environments.
arXiv Detail & Related papers (2021-12-07T10:42:09Z)
- CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models [74.58583689523999]
We propose an end-to-end framework, named CogMol, for designing new drug-like small molecules targeting novel viral proteins.
CogMol combines adaptive pre-training of a molecular SMILES Variational Autoencoder (VAE) and an efficient multi-attribute controlled sampling scheme.
CogMol handles multi-constraint design of synthesizable, low-toxic, drug-like molecules with high target specificity and selectivity.
arXiv Detail & Related papers (2020-04-02T18:17:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.