PTransIPs: Identification of phosphorylation sites enhanced by protein PLM embeddings
- URL: http://arxiv.org/abs/2308.05115v3
- Date: Wed, 13 Mar 2024 05:02:32 GMT
- Title: PTransIPs: Identification of phosphorylation sites enhanced by protein PLM embeddings
- Authors: Ziyang Xu, Haitian Zhong, Bingrui He, Xueying Wang and Tianchi Lu
- Abstract summary: We develop PTransIPs, a new deep learning framework for the identification of phosphorylation sites.
PTransIPs outperforms existing state-of-the-art (SOTA) methods, achieving AUCs of 0.9232 and 0.9660.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Phosphorylation is pivotal in numerous fundamental cellular processes and
plays a significant role in the onset and progression of various diseases. The
accurate identification of these phosphorylation sites is crucial for
unraveling the molecular mechanisms within cells and during viral infections,
potentially leading to the discovery of novel therapeutic targets. In this
study, we develop PTransIPs, a new deep learning framework for the
identification of phosphorylation sites. Independent testing results
demonstrate that PTransIPs outperforms existing state-of-the-art (SOTA)
methods, achieving AUCs of 0.9232 and 0.9660 for the identification of
phosphorylated S/T and Y sites, respectively. PTransIPs makes three main
contributions. 1) PTransIPs is the first to apply protein pre-trained language model
(PLM) embeddings to this task. It utilizes ProtTrans and EMBER2 to extract
sequence and structure embeddings, respectively, as additional inputs into the
model, effectively addressing issues of dataset size and overfitting, thus
enhancing model performance; 2) PTransIPs is based on Transformer architecture,
optimized through the integration of convolutional neural networks and TIM loss
function, providing practical insights for model design and training; 3) The
encoding of amino acids in PTransIPs enables it to serve as a universal
framework for other peptide bioactivity tasks, with its excellent performance
shown in extended experiments of this paper. Our code, data and models are
publicly available at https://github.com/StatXzy7/PTransIPs.
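The core input design described above, concatenating per-residue PLM sequence embeddings (ProtTrans) and structure embeddings (EMBER2) with the amino acid encoding before feeding the model, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the embedding widths (1024 and 64) and the 5-residue window are assumptions chosen for the example.

```python
import numpy as np

# Standard 20 amino acids for a token-level one-hot encoding.
AA = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {a: i for i, a in enumerate(AA)}

def one_hot_encode(peptide: str) -> np.ndarray:
    """One-hot encode a peptide, shape (L, 20)."""
    x = np.zeros((len(peptide), len(AA)))
    for i, a in enumerate(peptide):
        x[i, AA_INDEX[a]] = 1.0
    return x

def fuse_inputs(peptide: str, seq_emb: np.ndarray, struct_emb: np.ndarray) -> np.ndarray:
    """Concatenate the token encoding with per-residue sequence embeddings
    (e.g. from ProtTrans) and structure embeddings (e.g. from EMBER2) along
    the feature axis, yielding one (L, 20 + d_seq + d_struct) input matrix
    for the downstream Transformer-CNN classifier."""
    assert len(peptide) == seq_emb.shape[0] == struct_emb.shape[0]
    return np.concatenate([one_hot_encode(peptide), seq_emb, struct_emb], axis=1)

# Toy example: random stand-ins for the two embedding streams.
rng = np.random.default_rng(0)
fused = fuse_inputs("ASTYK", rng.normal(size=(5, 1024)), rng.normal(size=(5, 64)))
print(fused.shape)  # (5, 1108)
```

The point of the fusion is that the pre-trained embeddings carry information learned from large protein corpora, which helps compensate for the limited size of phosphorylation-site datasets and reduces overfitting.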
Related papers
- A general language model for peptide identification [4.044600688588866]
PDeepPP is a deep learning framework that integrates pretrained protein language models with parallel transformer-CNN architectures.
The model's hybrid architecture demonstrates unique capabilities in capturing both local sequence motifs and global structural features.
It achieved a 218× acceleration over sequence-alignment-based methods while maintaining 99.5% specificity in critical glycosylation site detection.
arXiv Detail & Related papers (2025-02-21T17:31:22Z) - Multi-modal Representation Learning Enables Accurate Protein Function Prediction in Low-Data Setting [0.0]
HOPER (HOlistic ProtEin Representation) is a novel framework designed to enhance protein function prediction (PFP) in low-data settings.
Our results highlight the effectiveness of multimodal representation learning for overcoming data limitations in biological research.
arXiv Detail & Related papers (2024-11-22T20:13:55Z) - MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction [65.33218256339151]
Post-translational modifications (PTMs) profoundly expand the complexity and functionality of the proteome.
Existing computational approaches predominantly focus on protein sequences to predict PTM sites, driven by the recognition of sequence-dependent motifs.
We introduce the MeToken model, which tokenizes the micro-environment of each amino acid, integrating both sequence and structural information into unified discrete tokens.
arXiv Detail & Related papers (2024-11-04T07:14:28Z) - DPLM-2: A Multimodal Diffusion Protein Language Model [75.98083311705182]
We introduce DPLM-2, a multimodal protein foundation model that extends discrete diffusion protein language model (DPLM) to accommodate both sequences and structures.
DPLM-2 learns the joint distribution of sequence and structure, as well as their marginals and conditionals.
Empirical evaluation shows that DPLM-2 can simultaneously generate highly compatible amino acid sequences and their corresponding 3D structures.
arXiv Detail & Related papers (2024-10-17T17:20:24Z) - Multi-Peptide: Multimodality Leveraged Language-Graph Learning of Peptide Properties [5.812284760539713]
Multi-Peptide is an innovative approach that combines transformer-based language models with Graph Neural Networks (GNNs) to predict peptide properties.
Evaluations on hemolysis and nonfouling datasets demonstrate Multi-Peptide's robustness, achieving state-of-the-art 86.185% accuracy in hemolysis prediction.
This study highlights the potential of multimodal learning in bioinformatics, paving the way for accurate and reliable predictions in peptide-based research and applications.
arXiv Detail & Related papers (2024-07-02T20:13:47Z) - ASPS: Augmented Segment Anything Model for Polyp Segmentation [77.25557224490075]
The Segment Anything Model (SAM) has introduced unprecedented potential for polyp segmentation.
SAM's Transformer-based structure prioritizes global and low-frequency information.
CFA integrates a trainable CNN encoder branch with a frozen ViT encoder, enabling the integration of domain-specific knowledge.
arXiv Detail & Related papers (2024-06-30T14:55:32Z) - SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification.
The proposed framework has been validated through comprehensive experiments on two clinical datasets.
To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z) - xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein [76.18058946124111]
We propose a unified protein language model, xTrimoPGLM, to address protein understanding and generation tasks simultaneously.
xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories.
It can also generate de novo protein sequences following the principles of natural ones, and can perform programmable generation after supervised fine-tuning.
arXiv Detail & Related papers (2024-01-11T15:03:17Z) - Learning to design protein-protein interactions with enhanced generalization [14.983309106361899]
We construct PPIRef, the largest and non-redundant dataset of 3D protein-protein interactions.
We leverage the PPIRef dataset to pre-train PPIformer, a new SE(3)-equivariant model generalizing across diverse protein-binder variants.
We fine-tune PPIformer to predict effects of mutations on protein-protein interactions via a thermodynamically motivated adjustment of the pre-training loss function.
arXiv Detail & Related papers (2023-10-27T22:22:44Z) - PIGNet2: A Versatile Deep Learning-based Protein-Ligand Interaction Prediction Model for Binding Affinity Scoring and Virtual Screening [0.0]
Prediction of protein-ligand interactions (PLI) plays a crucial role in drug discovery.
The development of a versatile model capable of accurately scoring binding affinity and conducting efficient virtual screening remains a challenge.
Here, we propose a viable solution by introducing a novel data augmentation strategy combined with a physics-informed graph neural network.
arXiv Detail & Related papers (2023-07-03T14:46:49Z) - Multimodal Pre-Training Model for Sequence-based Prediction of Protein-Protein Interaction [7.022012579173686]
Pre-training a protein model to learn effective representation is critical for protein-protein interactions.
Most pre-training models for PPIs are sequence-based, naively applying language models from natural language processing to amino acid sequences.
We propose a multimodal protein pre-training model with three modalities: sequence, structure, and function.
arXiv Detail & Related papers (2021-12-09T10:21:52Z) - A Brief Review of Machine Learning Techniques for Protein Phosphorylation Sites Prediction [0.0]
Reversible Post-Translational Modifications (PTMs) have vital roles in extending the functional diversity of proteins.
PTMs have emerged as crucial molecular regulatory mechanisms used to regulate diverse cellular processes.
Dysregulation of this modification is implicated in multiple diseases, including neurological disorders and cancers.
arXiv Detail & Related papers (2021-08-10T22:23:30Z) - Confidence-guided Lesion Mask-based Simultaneous Synthesis of Anatomic and Molecular MR Images in Patients with Post-treatment Malignant Gliomas [65.64363834322333]
Confidence Guided SAMR (CG-SAMR) synthesizes multi-modal anatomic sequences from lesion information.
A confidence-guided module steers the synthesis using a confidence measure of the intermediate results.
Experiments on real clinical data demonstrate that the proposed model outperforms state-of-the-art synthesis methods.
arXiv Detail & Related papers (2020-08-06T20:20:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.