Pre-Training Protein Encoder via Siamese Sequence-Structure Diffusion
Trajectory Prediction
- URL: http://arxiv.org/abs/2301.12068v2
- Date: Sat, 8 Jul 2023 14:46:58 GMT
- Title: Pre-Training Protein Encoder via Siamese Sequence-Structure Diffusion
Trajectory Prediction
- Authors: Zuobai Zhang, Minghao Xu, Aurélie Lozano, Vijil Chenthamarakshan,
Payel Das, Jian Tang
- Abstract summary: Self-supervised pre-training methods on proteins have recently gained attention, with most approaches focusing on either protein sequences or structures.
We propose the DiffPreT approach to pre-train a protein encoder by sequence-structure joint diffusion modeling.
We enhance DiffPreT by a method called Siamese Diffusion Trajectory Prediction (SiamDiff) to capture the correlation between different conformers of a protein.
- Score: 29.375830561817047
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised pre-training methods on proteins have recently gained
attention, with most approaches focusing on either protein sequences or
structures, neglecting the exploration of their joint distribution, which is
crucial for a comprehensive understanding of protein functions by integrating
co-evolutionary information and structural characteristics. In this work,
inspired by the success of denoising diffusion models in generative tasks, we
propose the DiffPreT approach to pre-train a protein encoder by
sequence-structure joint diffusion modeling. DiffPreT guides the encoder to
recover the native protein sequences and structures from the perturbed ones
along the joint diffusion trajectory, which acquires the joint distribution of
sequences and structures. Considering the essential protein conformational
variations, we enhance DiffPreT by a method called Siamese Diffusion Trajectory
Prediction (SiamDiff) to capture the correlation between different conformers
of a protein. SiamDiff attains this goal by maximizing the mutual information
between representations of diffusion trajectories of structurally-correlated
conformers. We study the effectiveness of DiffPreT and SiamDiff on both atom-
and residue-level structure-based protein understanding tasks. Experimental
results show that the performance of DiffPreT is consistently competitive on
all tasks, and SiamDiff achieves new state-of-the-art performance, considering
the mean ranks on all tasks. Our implementation is available at
https://github.com/DeepGraphLearning/SiamDiff.
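The abstract's core idea can be illustrated with a toy sketch of one joint sequence-structure perturbation step: residue types are corrupted by a discrete noising process while 3D coordinates receive Gaussian diffusion noise, and an encoder would be trained to recover the native pair. This is a minimal illustration of the concept, not the authors' implementation; the noise schedules, the `perturb` helper, and all parameter choices here are assumptions.

```python
import numpy as np

AMINO_ACIDS = 20  # standard residue alphabet size

def perturb(sequence, coords, t, num_steps=100, rng=None):
    """Apply noise level t of a toy sequence-structure joint diffusion.

    sequence: (L,) int array of residue-type indices in [0, 20)
    coords:   (L, 3) float array of C-alpha coordinates
    t:        diffusion step in [1, num_steps]
    """
    rng = rng or np.random.default_rng()
    # Discrete corruption: each residue is resampled uniformly at random
    # with probability beta_t (a simple multinomial-diffusion-style kernel).
    beta_t = t / num_steps
    mask = rng.random(sequence.shape) < beta_t
    noisy_seq = np.where(
        mask, rng.integers(0, AMINO_ACIDS, sequence.shape), sequence
    )
    # Continuous corruption: standard Gaussian diffusion on coordinates,
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps.
    betas = np.linspace(1e-4, 0.02, num_steps)
    alpha_bar = np.prod(1.0 - betas[:t])
    eps = rng.standard_normal(coords.shape)
    noisy_coords = np.sqrt(alpha_bar) * coords + np.sqrt(1.0 - alpha_bar) * eps
    return noisy_seq, noisy_coords

# Example: perturb a random 50-residue protein at step 50 of 100.
seq = np.random.default_rng(0).integers(0, AMINO_ACIDS, 50)
xyz = np.random.default_rng(1).standard_normal((50, 3))
noisy_seq, noisy_xyz = perturb(seq, xyz, t=50)
```

In the pre-training setup the abstract describes, a denoising loss over both the corrupted residue types and the noised coordinates is what guides the encoder to learn the joint sequence-structure distribution.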
Related papers
- ProtFAD: Introducing function-aware domains as implicit modality towards protein function perception [0.3928425951824076]
We propose a function-aware domain representation and a domain-joint contrastive learning strategy to distinguish different protein functions.
Our approach significantly and comprehensively outperforms the state-of-the-art methods on various benchmarks.
arXiv Detail & Related papers (2024-05-24T02:26:45Z)
- PSC-CPI: Multi-Scale Protein Sequence-Structure Contrasting for Efficient and Generalizable Compound-Protein Interaction Prediction [63.50967073653953]
Compound-Protein Interaction prediction aims to predict the pattern and strength of compound-protein interactions for rational drug discovery.
Existing deep learning-based methods utilize only a single modality, either protein sequences or structures.
We propose a novel multi-scale Protein Sequence-structure Contrasting framework for CPI prediction.
arXiv Detail & Related papers (2024-02-13T03:51:10Z)
- Predicting mutational effects on protein-protein binding via a side-chain diffusion probabilistic model [14.949807579474781]
We propose SidechainDiff, a representation learning-based approach that leverages unlabelled experimental protein structures.
SidechainDiff is the first diffusion-based generative model for side-chains, distinguishing it from prior efforts that have predominantly focused on generating protein backbone structures.
arXiv Detail & Related papers (2023-10-30T15:23:42Z)
- Neural Embeddings for Protein Graphs [0.8258451067861933]
We propose a novel framework for embedding protein graphs in geometric vector spaces.
We learn an encoder function that preserves the structural distance between protein graphs.
Our framework achieves remarkable results in the task of protein structure classification.
arXiv Detail & Related papers (2023-06-07T14:50:34Z)
- A Latent Diffusion Model for Protein Structure Generation [50.74232632854264]
We propose a latent diffusion model that can reduce the complexity of protein modeling.
We show that our method can effectively generate novel protein backbone structures with high designability and efficiency.
arXiv Detail & Related papers (2023-05-06T19:10:19Z)
- DiffDock-PP: Rigid Protein-Protein Docking with Diffusion Models [47.73386438748902]
DiffDock-PP is a diffusion generative model that learns to translate and rotate unbound protein structures into their bound conformations.
We achieve state-of-the-art performance on DIPS with a median C-RMSD of 4.85, outperforming all considered baselines.
arXiv Detail & Related papers (2023-04-08T02:10:44Z)
- A Systematic Study of Joint Representation Learning on Protein Sequences and Structures [38.94729758958265]
Learning effective protein representations is critical in a variety of tasks in biology such as predicting protein functions.
Recent sequence representation learning methods based on Protein Language Models (PLMs) excel in sequence-based tasks, but their direct adaptation to tasks involving protein structures remains a challenge.
Our study undertakes a comprehensive exploration of joint protein representation learning by integrating a state-of-the-art PLM with distinct structure encoders.
arXiv Detail & Related papers (2023-03-11T01:24:10Z)
- Protein Sequence and Structure Co-Design with Equivariant Translation [19.816174223173494]
Existing approaches generate both protein sequence and structure using either autoregressive models or diffusion models.
We propose a new approach capable of protein sequence and structure co-design, which iteratively translates both protein sequence and structure into the desired state.
Our model consists of a trigonometry-aware encoder that reasons about geometrical constraints and interactions from context features.
All protein amino acids are updated in one shot in each translation step, which significantly accelerates the inference process.
arXiv Detail & Related papers (2022-10-17T06:00:12Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- Structure-aware Protein Self-supervised Learning [50.04673179816619]
We propose a novel structure-aware protein self-supervised learning method to capture structural information of proteins.
In particular, a well-designed graph neural network (GNN) model is pretrained to preserve the protein structural information.
We identify the relation between the sequential information in the protein language model and the structural information in the specially designed GNN model via a novel pseudo bi-level optimization scheme.
arXiv Detail & Related papers (2022-04-06T02:18:41Z)
- EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models [53.17320541056843]
We propose a fully-differentiable approach for protein structure optimization, guided by a data-driven generative network.
Our EBM-Fold approach can efficiently produce high-quality decoys, compared against traditional Rosetta-based structure optimization routines.
arXiv Detail & Related papers (2021-05-11T03:40:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.