GPCR-BERT: Interpreting Sequential Design of G Protein Coupled Receptors Using Protein Language Models
- URL: http://arxiv.org/abs/2310.19915v1
- Date: Mon, 30 Oct 2023 18:28:50 GMT
- Title: GPCR-BERT: Interpreting Sequential Design of G Protein Coupled Receptors Using Protein Language Models
- Authors: Seongwon Kim, Parisa Mollaei, Akshay Antony, Rishikesh Magar, Amir Barati Farimani
- Abstract summary: We developed the GPCR-BERT model for understanding the sequential design of G Protein-Coupled Receptors (GPCRs).
GPCRs are the target of over one-third of FDA-approved pharmaceuticals.
We were able to shed light on several relationships between residues in the binding pocket and some of the conserved motifs.
- Score: 5.812284760539713
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With the rise of Transformers and Large Language Models (LLMs) in Chemistry
and Biology, new avenues for the design and understanding of therapeutics have
opened up to the scientific community. Protein sequences can be modeled as a
language and can take advantage of recent advances in LLMs, especially given
the abundance of available protein sequence datasets. In this paper, we
developed the GPCR-BERT model for understanding the sequential design of G
Protein-Coupled Receptors (GPCRs). GPCRs are the target of over one-third of
FDA-approved pharmaceuticals. However, there is a lack of comprehensive
understanding regarding the relationship between amino acid sequence, ligand
selectivity, and conformational motifs (such as NPxxY, CWxP, E/DRY). By
utilizing the pre-trained protein model (Prot-Bert) and fine-tuning with
prediction tasks of variations in the motifs, we were able to shed light on
several relationships between residues in the binding pocket and some of the
conserved motifs. To achieve this, we interpreted the attention weights and
hidden states of the model to extract the extent to which each amino acid
contributes to determining the identity of the masked residues. The
fine-tuned models demonstrated high accuracy in predicting the hidden residues
within the motifs. In addition, an analysis of the embeddings was performed over 3D
structures to elucidate the higher-order interactions within the conformations
of the receptors.
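As a minimal sketch of the masked-residue setup described in the abstract, the snippet below masks one "x" position of an NPxxY-like stretch and inspects ProtBert's prediction and attention via the public Rostlab/prot_bert checkpoint; the sequence fragment, mask position, and attention pooling are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch: masked-residue prediction over a conserved GPCR motif
# with ProtBert (Rostlab/prot_bert). Illustrative assumptions throughout.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
model = BertForMaskedLM.from_pretrained("Rostlab/prot_bert")
model.eval()

# Hypothetical receptor fragment containing an NPxxY-like stretch;
# ProtBert expects residues separated by spaces.
residues = list("DRYLAIVHAVFASLNPIIYCC")
masked = residues.copy()
masked[16] = tokenizer.mask_token  # hide one "x" residue of NPxxY

inputs = tokenizer(" ".join(masked), return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# Top prediction for the masked position.
mask_idx = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
pred_id = out.logits[0, mask_idx].argmax().item()
print("predicted residue:", tokenizer.convert_ids_to_tokens(pred_id))

# Attention paid by the masked position to every other token (last layer,
# averaged over heads) -- the kind of signal the paper interprets to rank
# residue contributions.
attn = out.attentions[-1][0].mean(dim=0)[mask_idx]
print("strongest contributing positions:", attn.topk(5).indices.tolist())
```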
Related papers
- PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs [80.08310253195144]
PRING is the first benchmark that evaluates protein-protein interaction prediction from a graph-level perspective. PRING curates a high-quality, multi-species PPI network dataset comprising 21,484 proteins and 186,818 interactions.
arXiv Detail & Related papers (2025-07-07T15:21:05Z)
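To make the pair-level vs. graph-level distinction concrete, here is a toy comparison of a predicted PPI network against a reference network, assuming NetworkX; the specific topology metrics are our assumptions, not PRING's actual evaluation protocol.

```python
# Toy contrast between pair-level and graph-level PPI evaluation.
import networkx as nx

true_ppi = nx.Graph([("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")])
pred_ppi = nx.Graph([("A", "B"), ("B", "C"), ("B", "D")])

# Pair-level view: precision/recall over individual edges.
true_edges = {frozenset(e) for e in true_ppi.edges()}
pred_edges = {frozenset(e) for e in pred_ppi.edges()}
tp = len(true_edges & pred_edges)
print(f"precision={tp / len(pred_edges):.2f}, recall={tp / len(true_edges):.2f}")

# Graph-level view: does the prediction reproduce network topology?
print("degree sequence (true):", sorted(d for _, d in true_ppi.degree()))
print("degree sequence (pred):", sorted(d for _, d in pred_ppi.degree()))
print("avg clustering (true vs pred):",
      nx.average_clustering(true_ppi), nx.average_clustering(pred_ppi))
```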
- Enhancing TCR-Peptide Interaction Prediction with Pretrained Language Models and Molecular Representations [0.39945675027960637]
We present LANTERN, a deep learning framework that combines large-scale protein language models with chemical representations of peptides. Our model demonstrates superior performance, particularly in zero-shot and few-shot learning scenarios. These results highlight the potential of LANTERN to advance TCR-pMHC binding prediction and support the development of personalized immunotherapies.
arXiv Detail & Related papers (2025-04-22T20:22:34Z)
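A minimal sketch of the fusion idea: concatenating a protein-language-model embedding of a TCR with a chemical fingerprint of the peptide and scoring the pair with a small MLP. The dimensions, the MLP head, and the random stand-in inputs are our assumptions, not LANTERN's actual architecture.

```python
# Sketch of LANTERN-style fusion of a protein LM embedding with a
# peptide fingerprint. Hypothetical dimensions and head design.
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Scores a (TCR embedding, peptide fingerprint) pair for binding."""
    def __init__(self, lm_dim=1280, fp_dim=2048, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(lm_dim + fp_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, tcr_emb, pep_fp):
        return self.mlp(torch.cat([tcr_emb, pep_fp], dim=-1)).squeeze(-1)

# Stand-ins for a real ESM-style embedding and a Morgan fingerprint.
tcr_emb = torch.randn(4, 1280)                   # batch of 4 TCR embeddings
pep_fp = torch.randint(0, 2, (4, 2048)).float()  # binary peptide fingerprints
logits = FusionHead()(tcr_emb, pep_fp)
print(logits.shape)  # torch.Size([4])
```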
- SE(3)-Equivariant Ternary Complex Prediction Towards Target Protein Degradation [28.648225112411637]
Targeted protein degradation (TPD) induced by small molecules has emerged as a rapidly evolving modality in drug discovery.
DeepTernary is a novel deep learning-based approach that directly predicts ternary structures in an end-to-end manner.
arXiv Detail & Related papers (2025-02-26T06:33:24Z)
- Life-Code: Central Dogma Modeling with Multi-Omics Sequence Unification [53.488387420073536]
Life-Code is a comprehensive framework that spans different biological functions.
Life-Code achieves state-of-the-art performance on various tasks across three omics.
arXiv Detail & Related papers (2025-02-11T06:53:59Z)
- GENERator: A Long-Context Generative Genomic Foundation Model [66.46537421135996]
We present GENERator, a generative genomic foundation model featuring a context length of 98k base pairs (bp) and 1.2B parameters.
Trained on an expansive dataset comprising 386B bp of DNA, the GENERator demonstrates state-of-the-art performance across both established and newly proposed benchmarks.
It also shows significant promise in sequence optimization, particularly through the prompt-responsive generation of enhancer sequences with specific activity profiles.
arXiv Detail & Related papers (2025-02-11T05:39:49Z)
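A hedged sketch of what prompt-responsive generation with a genomic causal LM looks like through the standard HuggingFace interface; the checkpoint name is a placeholder and the decoding settings are assumptions, not GENERator's published protocol.

```python
# Sketch of prompting a genomic causal LM to continue a DNA sequence.
# The checkpoint name below is a placeholder, not a real model ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "your-org/genomic-foundation-model"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Prompt with a putative enhancer prefix and sample a continuation.
prompt = "ATGCGTACGTTAGCCGGATAT"
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```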
- SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation [97.99658944212675]
We introduce a novel pre-training strategy for protein foundation models.
It emphasizes the interactions among amino acid residues to enhance the extraction of both short-range and long-range co-evolutionary features.
Trained on a large-scale protein sequence dataset, our model demonstrates superior generalization ability.
arXiv Detail & Related papers (2024-10-31T15:22:03Z)
- CoPRA: Bridging Cross-domain Pretrained Sequence Models with Complex Structures for Protein-RNA Binding Affinity Prediction [23.1499716310298]
We build the largest protein-RNA binding affinity dataset, PRA310, for performance evaluation.
We provide extensive analyses and verify that CoPRA can (1) accurately predict the protein-RNA binding affinity; (2) understand the binding affinity change caused by mutations; and (3) benefit from scaling data and model size.
arXiv Detail & Related papers (2024-08-21T09:48:22Z)
- Protein binding affinity prediction under multiple substitutions applying eGNNs on Residue and Atomic graphs combined with Language model information: eGRAL [1.840390797252648]
Deep learning is increasingly recognized as a powerful tool capable of bridging the gap between in-silico predictions and in-vitro observations.
We propose eGRAL, a novel graph neural network architecture designed for predicting binding affinity changes from amino acid substitutions in protein complexes.
eGRAL leverages residue, atomic and evolutionary scales, thanks to features extracted from protein large language models.
arXiv Detail & Related papers (2024-05-03T10:33:19Z)
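A minimal pure-PyTorch sketch of message passing over a residue contact graph whose node features could come from a protein language model; eGRAL's actual equivariant layers and atomic-scale features are considerably more involved, so treat this only as an illustration of the residue-graph idea.

```python
# Minimal residue-graph message passing with LM-derived node features,
# in the spirit of eGRAL. Toy graph and dimensions are assumptions.
import torch
import torch.nn as nn

class ResidueGNNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(2 * dim, dim)
        self.upd = nn.GRUCell(dim, dim)

    def forward(self, h, edges):
        # edges: (2, E) tensor of residue-index pairs (src -> dst)
        src, dst = edges
        m = torch.relu(self.msg(torch.cat([h[src], h[dst]], dim=-1)))
        agg = torch.zeros_like(h).index_add_(0, dst, m)  # sum messages per node
        return self.upd(agg, h)

n_res, dim = 8, 64
h = torch.randn(n_res, dim)                         # per-residue LM embeddings
edges = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]])  # toy contact graph
layer = ResidueGNNLayer(dim)
print(layer(h, edges).shape)  # torch.Size([8, 64])
```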
- ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction [54.132290875513405]
The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases.
Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions.
We propose ProLLM, a novel framework that, for the first time, employs an LLM tailored for PPI prediction.
arXiv Detail & Related papers (2024-03-30T05:32:42Z)
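A toy sketch of a chain-of-thought-style PPI prompt built from known pathway interactions; the template and the example proteins are our assumptions, not ProLLM's actual prompting scheme.

```python
# Toy "protein chain of thought" prompt for a PPI question.
# Template and example proteins are hypothetical.
def ppi_prompt(protein_a: str, protein_b: str, context: list[str]) -> str:
    steps = "\n".join(f"- {fact}" for fact in context)
    return (
        f"Known interactions:\n{steps}\n"
        f"Reasoning step by step over the pathway, does {protein_a} "
        f"interact with {protein_b}? Answer yes or no."
    )

print(ppi_prompt("EGFR", "GRB2",
                 ["EGFR interacts with SHC1", "SHC1 interacts with GRB2"]))
```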
- xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein [76.18058946124111]
We propose a unified protein language model, xTrimoPGLM, to address protein understanding and generation tasks simultaneously.
xTrimoPGLM significantly outperforms other advanced baselines in 18 protein understanding benchmarks across four categories.
It can also generate de novo protein sequences following the principles of natural ones, and can perform programmable generation after supervised fine-tuning.
arXiv Detail & Related papers (2024-01-11T15:03:17Z)
- A Hierarchical Training Paradigm for Antibody Structure-sequence Co-design [54.30457372514873]
We propose a hierarchical training paradigm (HTP) for antibody sequence-structure co-design.
HTP consists of four levels of training stages, each corresponding to a specific protein modality.
Empirical experiments show that HTP sets the new state-of-the-art performance in the co-design problem.
arXiv Detail & Related papers (2023-10-30T02:39:15Z)
- Functional Geometry Guided Protein Sequence and Backbone Structure Co-Design [12.585697288315846]
We propose a model to jointly design protein sequence and structure based on automatically detected functional sites.
NAEPro is powered by an interleaving network of attention and equivariant layers, which can capture global correlation in a whole sequence.
Experimental results show that our model consistently achieves the highest amino acid recovery rate, TM-score, and the lowest RMSD among all competitors.
arXiv Detail & Related papers (2023-10-06T16:08:41Z)
- Unsupervised ensemble-based phenotyping helps enhance the discoverability of genes related to heart morphology [57.25098075813054]
We propose a new framework for gene discovery entitled Unsupervised Phenotype Ensembles.
It builds a redundant yet highly expressive representation by pooling a set of phenotypes learned in an unsupervised manner.
These phenotypes are then analyzed via genome-wide association studies (GWAS), retaining only highly confident and stable associations.
arXiv Detail & Related papers (2023-01-07T18:36:44Z)
- State-specific protein-ligand complex structure prediction with a multi-scale deep generative model [68.28309982199902]
We present NeuralPLexer, a computational approach that can directly predict protein-ligand complex structures.
Our study suggests that a data-driven approach can capture the structural cooperativity between proteins and small molecules, showing promise in accelerating the design of enzymes, drug molecules, and beyond.
arXiv Detail & Related papers (2022-09-30T01:46:38Z)
- Protein 3D structure-based neural networks highly improve the accuracy in compound-protein binding affinity prediction [7.059949221160259]
We develop Fast Evolutional Attention and Thoroughgoing-graph Neural Networks (FeatNN) to facilitate the application of protein 3D structure information for predicting compound-protein binding affinities (CPAs).
FeatNN considerably outperforms various state-of-the-art baselines in CPA prediction, elevating the Pearson correlation coefficient by about 35.7%.
arXiv Detail & Related papers (2022-03-30T00:44:15Z)
- Interpretable Structured Learning with Sparse Gated Sequence Encoder for Protein-Protein Interaction Prediction [2.9488233765621295]
Predicting protein-protein interactions (PPIs) by learning informative representations from amino acid sequences is a challenging yet important problem in biology.
We present a novel deep framework to model and predict PPIs from sequence alone.
Our model incorporates a bidirectional gated recurrent unit to learn sequence representations by leveraging contextualized and sequential information from sequences.
arXiv Detail & Related papers (2020-10-16T17:13:32Z)
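A minimal PyTorch sketch of a bidirectional GRU encoder that embeds two sequences and scores their interaction with a dot product; the sparse gating and structured components of the original model are omitted here, and all dimensions and example sequences are assumptions.

```python
# Minimal bidirectional GRU encoder for sequence-only PPI scoring.
# Sparse gating from the original paper is intentionally omitted.
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

class BiGRUEncoder(nn.Module):
    def __init__(self, emb=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(len(AMINO_ACIDS), emb)
        self.gru = nn.GRU(emb, hidden, batch_first=True, bidirectional=True)

    def forward(self, tokens):               # (B, L) residue indices
        out, _ = self.gru(self.embed(tokens))
        return out.mean(dim=1)               # (B, 2*hidden) sequence embedding

encoder = BiGRUEncoder()

def encode(seq: str) -> torch.Tensor:
    idx = torch.tensor([[AMINO_ACIDS.index(a) for a in seq]])
    return encoder(idx)

# Score a pair of (hypothetical) proteins by embedding dot product.
za, zb = encode("MKTAYIAKQR"), encode("GSHMDDDVAA")
print(torch.sigmoid((za * zb).sum()))  # interaction probability (untrained)
```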
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.