Deep Manifold Transformation for Protein Representation Learning
- URL: http://arxiv.org/abs/2402.09416v1
- Date: Fri, 12 Jan 2024 18:38:14 GMT
- Title: Deep Manifold Transformation for Protein Representation Learning
- Authors: Bozhen Hu, Zelin Zang, Cheng Tan, Stan Z. Li
- Abstract summary: We propose a new deep manifold transformation approach for universal protein representation learning (DMTPRL).
It employs manifold learning strategies to improve the quality and adaptability of the learned embeddings.
Our proposed DMTPRL method outperforms state-of-the-art baselines on diverse downstream tasks across popular datasets.
- Score: 42.43017670985785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Protein representation learning is critical in various tasks in biology, such
as drug design and protein structure or function prediction, which has
primarily benefited from protein language models and graph neural networks.
These models can capture intrinsic patterns from protein sequences and
structures through masking and task-related losses. However, the learned
protein representations are usually not well optimized, leading to performance
degradation due to limited data, difficulty adapting to new tasks, etc. To
address this, we propose a new \underline{d}eep \underline{m}anifold
\underline{t}ransformation approach for universal \underline{p}rotein
\underline{r}epresentation \underline{l}earning (DMTPRL). It employs manifold
learning strategies to improve the quality and adaptability of the learned
embeddings. Specifically, we apply a novel manifold learning loss during
training based on the graph inter-node similarity. Our proposed DMTPRL method
outperforms state-of-the-art baselines on diverse downstream tasks across
popular datasets. This validates our approach for learning universal and robust
protein representations. We promise to release the code after acceptance.
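The abstract describes a manifold learning loss computed from graph inter-node similarity, but does not give its exact form. As a minimal sketch of one plausible instantiation (all names are hypothetical): row-normalised Gaussian-kernel similarities are computed between all node pairs in both the input feature space and the learned embedding space, and a KL divergence between the two similarity distributions serves as the training penalty.

```python
import numpy as np

def pairwise_similarity(x, sigma=1.0):
    # Gaussian-kernel similarity between all node pairs,
    # row-normalised into a per-node distribution.
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    s = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(s, 0.0)  # ignore self-similarity
    return s / s.sum(axis=1, keepdims=True)

def manifold_loss(node_feats, embeddings, sigma=1.0):
    # Penalise divergence between inter-node similarities in the
    # input (e.g. structure) space and the embedding space.
    # NOTE: illustrative only -- not the paper's actual loss.
    p = pairwise_similarity(node_feats, sigma)
    q = pairwise_similarity(embeddings, sigma)
    eps = 1e-12
    return float((p * np.log((p + eps) / (q + eps))).sum())
```

The loss is zero when the embedding preserves the input similarity structure exactly and grows as the two similarity distributions diverge; in practice such a term would be added to the task loss during training.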
Related papers
- NaNa and MiGu: Semantic Data Augmentation Techniques to Enhance Protein Classification in Graph Neural Networks [60.48306899271866]
We propose novel semantic data augmentation methods to incorporate backbone chemical and side-chain biophysical information into protein classification tasks.
Specifically, we leverage molecular biophysical, secondary structure, chemical bond, and ionic features of proteins to facilitate classification tasks.
arXiv Detail & Related papers (2024-03-21T13:27:57Z) - Target-aware Variational Auto-encoders for Ligand Generation with
Multimodal Protein Representation Learning [2.01243755755303]
We introduce TargetVAE, a target-aware variational auto-encoder that generates ligands with high binding affinities to arbitrary protein targets.
This is the first effort to unify different representations of proteins into a single model, which we name the Protein Multimodal Network (PMN).
arXiv Detail & Related papers (2023-08-02T12:08:17Z) - A Systematic Study of Joint Representation Learning on Protein Sequences
and Structures [38.94729758958265]
Learning effective protein representations is critical in a variety of tasks in biology such as predicting protein functions.
Recent sequence representation learning methods based on Protein Language Models (PLMs) excel in sequence-based tasks, but their direct adaptation to tasks involving protein structures remains a challenge.
Our study undertakes a comprehensive exploration of joint protein representation learning by integrating a state-of-the-art PLM with distinct structure encoders.
arXiv Detail & Related papers (2023-03-11T01:24:10Z) - Boosting Convolutional Neural Networks' Protein Binding Site Prediction
Capacity Using SE(3)-invariant transformers, Transfer Learning and
Homology-based Augmentation [1.160208922584163]
Identifying small binding sites in target proteins, at either pocket or residue resolution, is critical in real drug discovery scenarios.
Here we present a new computational method for binding site prediction that is relevant to real world applications.
arXiv Detail & Related papers (2023-02-20T05:02:40Z) - Reprogramming Pretrained Language Models for Protein Sequence
Representation Learning [68.75392232599654]
We propose Representation Learning via Dictionary Learning (R2DL), an end-to-end representation learning framework.
R2DL reprograms a pretrained English language model to learn the embeddings of protein sequences.
Our model can attain better accuracy and significantly improve the data efficiency by up to $10^5$ times over the baselines set by pretrained and standard supervised methods.
arXiv Detail & Related papers (2023-01-05T15:55:18Z) - Slimmable Networks for Contrastive Self-supervised Learning [67.21528544724546]
Self-supervised learning has made significant progress in pre-training large models, but still struggles with small models.
We present a one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
arXiv Detail & Related papers (2022-09-30T15:15:05Z) - Structure-aware Protein Self-supervised Learning [50.04673179816619]
We propose a novel structure-aware protein self-supervised learning method to capture structural information of proteins.
In particular, a well-designed graph neural network (GNN) model is pretrained to preserve the protein structural information.
We identify the relation between the sequential information in the protein language model and the structural information in the specially designed GNN model via a novel pseudo bi-level optimization scheme.
arXiv Detail & Related papers (2022-04-06T02:18:41Z) - Multi-Scale Representation Learning on Proteins [78.31410227443102]
This paper introduces a multi-scale graph construction of a protein -- HoloProt.
The surface captures coarser details of the protein, while the sequence (as the primary component) and structure capture finer details.
Our graph encoder then learns a multi-scale representation by allowing each level to integrate the encoding from level(s) below with the graph at that level.
arXiv Detail & Related papers (2022-04-04T08:29:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.