PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design
- URL: http://arxiv.org/abs/2506.11420v2
- Date: Fri, 31 Oct 2025 00:03:51 GMT
- Title: PPDiff: Diffusing in Hybrid Sequence-Structure Space for Protein-Protein Complex Design
- Authors: Zhenqiao Song, Tiaoxiao Li, Lei Li, Martin Renqiang Min,
- Abstract summary: PPDiff is a diffusion model to jointly design the sequence and structure of binders for arbitrary protein targets.<n> PPDiff builds upon our developed Sequence Structure Inter Network with Causal attention layers.<n>The model is pretrained on PPBench and finetuned on two real-world applications.
- Score: 20.033392739225658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Designing protein-binding proteins with high affinity is critical in biomedical research and biotechnology. Despite recent advancements targeting specific proteins, the ability to create high-affinity binders for arbitrary protein targets on demand, without extensive rounds of wet-lab testing, remains a significant challenge. Here, we introduce PPDiff, a diffusion model to jointly design the sequence and structure of binders for arbitrary protein targets in a non-autoregressive manner. PPDiffbuilds upon our developed Sequence Structure Interleaving Network with Causal attention layers (SSINC), which integrates interleaved self-attention layers to capture global amino acid correlations, k-nearest neighbor (kNN) equivariant graph layers to model local interactions in three-dimensional (3D) space, and causal attention layers to simplify the intricate interdependencies within the protein sequence. To assess PPDiff, we curate PPBench, a general protein-protein complex dataset comprising 706,360 complexes from the Protein Data Bank (PDB). The model is pretrained on PPBenchand finetuned on two real-world applications: target-protein mini-binder complex design and antigen-antibody complex design. PPDiffconsistently surpasses baseline methods, achieving success rates of 50.00%, 23.16%, and 16.89% for the pretraining task and the two downstream applications, respectively. The code, data and models are available at https://github.com/JocelynSong/PPDiff.
Related papers
- S$^2$Drug: Bridging Protein Sequence and 3D Structure in Contrastive Representation Learning for Virtual Screening [72.89086338778098]
We propose a two-stage framework for protein-ligand contrastive representation learning.<n>In the first stage, we perform protein sequence pretraining on ChemBL using an ESM2-based backbone.<n>In the second stage, we fine-tune on PDBBind by fusing sequence and structure information through a residue-level gating module.<n>This auxiliary task guides the model to accurately localize binding residues within the protein sequence and capture their 3D spatial arrangement.
arXiv Detail & Related papers (2025-11-10T11:57:47Z) - ProteinAE: Protein Diffusion Autoencoders for Structure Encoding [64.77182442408254]
We introduce ProteinAE, a novel and streamlined protein diffusion autoencoder.<n>ProteinAE directly maps protein backbone coordinates from E(3) into a continuous, compact latent space.<n>We demonstrate that ProteinAE achieves state-of-the-art reconstruction quality, outperforming existing autoencoders.
arXiv Detail & Related papers (2025-10-12T14:30:32Z) - PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs [80.08310253195144]
PRING is the first benchmark that evaluates protein-protein interaction prediction from a graph-level perspective.<n> PRING curates a high-quality, multi-species PPI network dataset comprising 21,484 proteins and 186,818 interactions.
arXiv Detail & Related papers (2025-07-07T15:21:05Z) - Beyond Simple Concatenation: Fairly Assessing PLM Architectures for Multi-Chain Protein-Protein Interactions Prediction [0.2509487459755192]
Protein-protein interactions (PPIs) are fundamental to numerous cellular processes.<n>PLMs have demonstrated remarkable success in predicting protein structure and function.<n>Their application to sequence-based PPI binding affinity prediction remains relatively underexplored.
arXiv Detail & Related papers (2025-05-26T14:23:08Z) - An All-Atom Generative Model for Designing Protein Complexes [65.06317264153175]
APM (All-Atom Protein Generative Model) is a model specifically designed for modeling multi-chain proteins.<n>It is capable of precisely modeling inter-chain interactions and designing protein complexes with binding capabilities from scratch.<n>It also performs folding and inverse-folding tasks for multi-chain proteins.
arXiv Detail & Related papers (2025-04-17T16:37:41Z) - ProteinWeaver: A Divide-and-Assembly Approach for Protein Backbone Design [61.19456204667385]
We introduce ProteinWeaver, a two-stage framework for protein backbone design.<n>ProteinWeaver generates high-quality, novel protein backbones through versatile domain assembly.<n>By introducing a divide-and-assembly' paradigm, ProteinWeaver advances protein engineering and opens new avenues for functional protein design.
arXiv Detail & Related papers (2024-11-08T08:10:49Z) - SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z) - PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for
Efficient and Generalizable Compound-Protein Interaction Prediction [63.50967073653953]
Compound-Protein Interaction prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery.
Existing deep learning-based methods utilize only the single modality of protein sequences or structures.
We propose a novel multi-scale Protein Sequence-structure Contrasting framework for CPI prediction.
arXiv Detail & Related papers (2024-02-13T03:51:10Z) - Effective Protein-Protein Interaction Exploration with PPIretrieval [46.07027715907749]
We propose PPIretrieval, the first deep learning-based model for protein-protein interaction exploration.
PPIretrieval searches for potential PPIs in an embedding space, capturing rich geometric and chemical information of protein surfaces.
arXiv Detail & Related papers (2024-02-06T03:57:06Z) - A Hierarchical Training Paradigm for Antibody Structure-sequence
Co-design [54.30457372514873]
We propose a hierarchical training paradigm (HTP) for the antibody sequence-structure co-design.
HTP consists of four levels of training stages, each corresponding to a specific protein modality.
Empirical experiments show that HTP sets the new state-of-the-art performance in the co-design problem.
arXiv Detail & Related papers (2023-10-30T02:39:15Z) - Functional Geometry Guided Protein Sequence and Backbone Structure
Co-Design [12.585697288315846]
We propose a model to jointly design Protein sequence and structure based on automatically detected functional sites.
NAEPro is powered by an interleaving network of attention and equivariant layers, which can capture global correlation in a whole sequence.
Experimental results show that our model consistently achieves the highest amino acid recovery rate, TM-score, and the lowest RMSD among all competitors.
arXiv Detail & Related papers (2023-10-06T16:08:41Z) - Joint Design of Protein Sequence and Structure based on Motifs [11.731131799546489]
We propose GeoPro, a method to design protein backbone structure and sequence jointly.
GeoPro is powered by an equivariant encoder for three-dimensional (3D) backbone structure and a protein sequence decoder guided by 3D geometry.
Our method discovers novel $beta$-lactamases and myoglobins which are not present in protein data bank (PDB) and UniProt.
arXiv Detail & Related papers (2023-10-04T03:07:03Z) - Target-aware Variational Auto-encoders for Ligand Generation with
Multimodal Protein Representation Learning [2.01243755755303]
We introduce TargetVAE, a target-aware auto-encoder that generates with high binding affinities to arbitrary protein targets.
This is the first effort to unify different representations of proteins into a single model that we name as Protein Multimodal Network (PMN)
arXiv Detail & Related papers (2023-08-02T12:08:17Z) - State-specific protein-ligand complex structure prediction with a
multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z) - DIPS-Plus: The Enhanced Database of Interacting Protein Structures for
Interface Prediction [2.697420611471228]
We present DIPS-Plus, an enhanced, feature-rich dataset of 42,112 complexes for geometric deep learning of protein interfaces.
The previous version of DIPS contains only the Cartesian coordinates and types of the atoms comprising a given protein complex.
DIPS-Plus now includes a plethora of new residue-level features including protrusion indices, half-sphere amino acid compositions, and new profile hidden Markov model (HMM)-based sequence features for each amino acid.
arXiv Detail & Related papers (2021-06-06T23:56:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.