MolChord: Structure-Sequence Alignment for Protein-Guided Drug Design
- URL: http://arxiv.org/abs/2510.27671v1
- Date: Fri, 31 Oct 2025 17:35:53 GMT
- Title: MolChord: Structure-Sequence Alignment for Protein-Guided Drug Design
- Authors: Wei Zhang, Zekun Guo, Yingce Xia, Peiran Jin, Shufang Xie, Tao Qin, Xiang-Yang Li,
- Abstract summary: MolChord aims to align protein and molecule structures with their textual descriptions and sequential representations. We leverage an autoregressive model unifying text, small molecules, and proteins as the molecule generator, alongside a diffusion-based structure encoder. We curate a property-aware dataset by integrating preference data and refine the alignment process using Direct Preference Optimization.
- Score: 25.550555350063366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Structure-based drug design (SBDD), which maps target proteins to candidate molecular ligands, is a fundamental task in drug discovery. Effectively aligning protein structural representations with molecular representations, and ensuring alignment between generated drugs and their pharmacological properties, remains a critical challenge. To address these challenges, we propose MolChord, which integrates two key techniques: (1) to align protein and molecule structures with their textual descriptions and sequential representations (e.g., FASTA for proteins and SMILES for molecules), we leverage NatureLM, an autoregressive model unifying text, small molecules, and proteins, as the molecule generator, alongside a diffusion-based structure encoder; and (2) to guide molecules toward desired properties, we curate a property-aware dataset by integrating preference data and refine the alignment process using Direct Preference Optimization (DPO). Experimental results on CrossDocked2020 demonstrate that our approach achieves state-of-the-art performance on key evaluation metrics, highlighting its potential as a practical tool for SBDD.
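The abstract names Direct Preference Optimization (DPO) as the refinement step but does not spell out the objective. As a rough illustration only, the standard DPO loss for a single preference pair can be sketched as below. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions made for this sketch, not details taken from MolChord.

```python
import math


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities (e.g. of two candidate SMILES
    strings for the same protein) under the trained policy and under a
    frozen reference model. All names here are hypothetical.
    """
    # How much more the policy favors each sample than the reference does.
    chosen_gain = policy_logp_chosen - ref_logp_chosen
    rejected_gain = policy_logp_rejected - ref_logp_rejected
    z = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(z)) written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# The loss falls below log(2) once the policy prefers the chosen molecule
# more strongly than the reference model does.
print(dpo_loss(-4.0, -6.0, -5.0, -5.0) < math.log(2))
```

In this form, a policy identical to the reference gives a loss of exactly log 2, and the loss decreases as the policy shifts probability toward the preferred molecule.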
Related papers
- Representing local protein environments with atomistic foundation models [6.120694232253299]
We propose a novel representation for a local protein environment derived from the intermediate features of atomistic foundation models (AFMs). We show that the AFM-derived representation space exhibits meaningful structure, enabling the construction of data-driven priors. In the context of biomolecular NMR spectroscopy, we demonstrate that the proposed representations enable a first-of-its-kind physics-informed chemical shift predictor.
arXiv Detail & Related papers (2025-05-29T11:25:47Z)
- A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery [32.573496601865465]
Structure-based drug discovery (SBDD) is a systematic scientific process that develops new drugs by leveraging the detailed physical structure of the target protein. Recent advancements in pre-trained models for biomolecules have demonstrated remarkable success across various biochemical applications.
arXiv Detail & Related papers (2025-03-06T12:04:56Z)
- Molecule Generation for Target Protein Binding with Hierarchical Consistency Diffusion Model [17.885767456439215]
Atom-Motif Consistency Diffusion Model (AMDiff) is a hierarchical diffusion architecture that integrates both atom- and motif-level views of molecules. Compared to existing approaches, AMDiff exhibits superior validity and novelty in generating molecules tailored to fit various protein pockets.
arXiv Detail & Related papers (2025-03-02T17:54:30Z)
- AUTODIFF: Autoregressive Diffusion Modeling for Structure-based Drug Design [16.946648071157618]
We propose a diffusion-based fragment-wise autoregressive generation model for structure-based drug design (SBDD).
We design a novel molecule assembly strategy named conformal motif that preserves the conformation of local structures of molecules first.
We then encode the interaction of the protein-ligand complex with an SE(3)-equivariant convolutional network and generate molecules motif-by-motif with diffusion modeling.
arXiv Detail & Related papers (2024-04-02T14:44:02Z)
- DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design [62.68420322996345]
Existing structure-based drug design methods treat all ligand atoms equally.
We propose a new diffusion model, DecompDiff, with decomposed priors over arms and scaffold.
Our approach achieves state-of-the-art performance in generating high-affinity molecules.
arXiv Detail & Related papers (2024-02-26T05:21:21Z)
- Unveiling Molecular Moieties through Hierarchical Grad-CAM Graph Explainability [0.0]
The integration of explainable methods to elucidate the specific contributions of molecular substructures to biological activity remains a significant challenge. We trained 20 GNN models on a dataset of small molecules with the goal of predicting their activity on 20 distinct protein targets from the Kinase family. We implemented the Hierarchical Grad-CAM graph Explainer framework, enabling an in-depth analysis of the molecular moieties driving protein-ligand binding stabilization.
arXiv Detail & Related papers (2024-01-29T17:23:25Z)
- Structure-informed Language Models Are Protein Designers [69.70134899296912]
We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs).
We conduct a structural surgery on pLMs, where a lightweight structural adapter is implanted into pLMs and endows them with structural awareness.
Experiments show that our approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-02-03T10:49:52Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- Widely Used and Fast De Novo Drug Design by a Protein Sequence-Based Reinforcement Learning Model [4.815696666006742]
Structure-based de novo methods can overcome the data scarcity of active ligands by incorporating drug-target interactions into deep generative architectures.
Here, we demonstrate a widely used and fast protein sequence-based reinforcement learning model for drug discovery.
As a proof of concept, the RL model was utilized to design molecules for four targets.
arXiv Detail & Related papers (2022-08-14T10:41:52Z)
- Learning Geometrically Disentangled Representations of Protein Folding Simulations [72.03095377508856]
This work focuses on learning a generative neural network on a structural ensemble of a drug-target protein.
Model tasks involve characterizing the distinct structural fluctuations of the protein bound to various drug molecules.
Results show that our geometric learning-based method enjoys both accuracy and efficiency for generating complex structural variations.
arXiv Detail & Related papers (2022-05-20T19:38:00Z)
- Transfer Learning for Protein Structure Classification at Low Resolution [124.5573289131546]
We show that it is possible to make accurate ($\geq$80%) predictions of protein class and architecture from structures determined at low ($\leq$3 Å) resolution.
We provide proof of concept for high-speed, low-cost protein structure classification at low resolution, and a basis for extension to prediction of function.
arXiv Detail & Related papers (2020-08-11T15:01:32Z)
- CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models [74.58583689523999]
We propose an end-to-end framework, named CogMol, for designing new drug-like small molecules targeting novel viral proteins.
CogMol combines adaptive pre-training of a molecular SMILES Variational Autoencoder (VAE) and an efficient multi-attribute controlled sampling scheme.
CogMol handles multi-constraint design of synthesizable, low-toxic, drug-like molecules with high target specificity and selectivity.
arXiv Detail & Related papers (2020-04-02T18:17:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.