Assigning function to protein-protein interactions: a weakly supervised
BioBERT based approach using PubMed abstracts
- URL: http://arxiv.org/abs/2008.08727v3
- Date: Thu, 6 Jan 2022 20:04:07 GMT
- Authors: Aparna Elangovan, Melissa Davis and Karin Verspoor
- Abstract summary: Protein-protein interactions (PPI) are critical to the function of proteins in both normal and diseased cells.
Only a small percentage of PPIs captured in protein interaction databases have annotations of function available.
Here, we aim to label the function type of PPIs by extracting relationships described in PubMed abstracts.
- Score: 2.208694022993555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivation: Protein-protein interactions (PPI) are critical to the function
of proteins in both normal and diseased cells, and many critical protein
functions are mediated by interactions. Knowledge of the nature of these
interactions is important for the construction of networks to analyse
biological data. However, only a small percentage of PPIs captured in protein
interaction databases have annotations of function available, e.g. only 4% of
PPI are functionally annotated in the IntAct database. Here, we aim to label
the function type of PPIs by extracting relationships described in PubMed
abstracts.
Method: We create a weakly supervised dataset from the IntAct PPI database
containing interacting protein pairs with annotated function and associated
abstracts from the PubMed database. We apply a state-of-the-art deep learning
technique for biomedical natural language processing tasks, BioBERT, to build a
model - dubbed PPI-BioBERT - for identifying the function of PPIs. In order to
extract high quality PPI functions at large scale, we use an ensemble of
PPI-BioBERT models to improve uncertainty estimation and apply an interaction
type-specific threshold to counteract the effects of variations in the number
of training samples per interaction type.
Results: We scan 18 million PubMed abstracts to automatically identify 3253
new typed PPIs, including phosphorylation and acetylation interactions, with an
overall precision of 46% (87% for acetylation) based on a human-reviewed
sample. This work demonstrates that analysis of biomedical abstracts for PPI
function extraction is a feasible approach to substantially increasing the
number of interactions annotated with function captured in online databases.
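The ensemble and type-specific threshold step described in the Method paragraph can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the threshold values, and the use of the standard deviation across ensemble members as an agreement check are all assumptions made for illustration. Each ensemble member is assumed to emit a probability per interaction type for a candidate protein pair.

```python
from statistics import mean, pstdev

# Hypothetical per-type confidence thresholds (illustrative values only,
# not taken from the paper). A type with fewer training samples might
# warrant a different cut-off than a well-represented one.
THRESHOLDS = {"phosphorylation": 0.90, "acetylation": 0.80}
DEFAULT_THRESHOLD = 0.95  # assumed fallback for types without a tuned value


def ensemble_predict(member_probs):
    """Average per-type probabilities over the ensemble members.

    member_probs: list of dicts, one per model, mapping type -> probability.
    Returns (best_type, mean_probability, std_across_members).
    """
    types = member_probs[0].keys()
    avg = {t: mean(p[t] for p in member_probs) for t in types}
    best = max(avg, key=avg.get)
    spread = pstdev([p[best] for p in member_probs])
    return best, avg[best], spread


def accept(member_probs, thresholds=THRESHOLDS, max_spread=0.1):
    """Return the predicted type if it clears its own threshold, else None."""
    best, confidence, spread = ensemble_predict(member_probs)
    if confidence >= thresholds.get(best, DEFAULT_THRESHOLD) and spread <= max_spread:
        return best
    return None


# Three hypothetical ensemble members scoring three candidate pairs.
confident = [
    {"phosphorylation": 0.95, "acetylation": 0.05},
    {"phosphorylation": 0.92, "acetylation": 0.08},
    {"phosphorylation": 0.97, "acetylation": 0.03},
]
disputed = [
    {"phosphorylation": 0.99, "acetylation": 0.01},
    {"phosphorylation": 0.55, "acetylation": 0.45},
    {"phosphorylation": 0.60, "acetylation": 0.40},
]
acetyl = [
    {"phosphorylation": 0.15, "acetylation": 0.85},
    {"phosphorylation": 0.10, "acetylation": 0.90},
    {"phosphorylation": 0.18, "acetylation": 0.82},
]

print(accept(confident), accept(disputed), accept(acetyl))
```

In this sketch the third candidate is accepted with a mean confidence of about 0.86 only because acetylation has its own lower threshold; the same confidence would be rejected for phosphorylation. That is the intuition behind tuning the cut-off per interaction type.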
Related papers
- MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction [65.33218256339151]
Post-translational modifications (PTMs) profoundly expand the complexity and functionality of the proteome.
Existing computational approaches predominantly focus on protein sequences to predict PTM sites, driven by the recognition of sequence-dependent motifs.
We introduce the MeToken model, which tokenizes the micro-environment of each amino acid, integrating both sequence and structural information into unified discrete tokens.
arXiv Detail & Related papers (2024-11-04T07:14:28Z)
- ProLLM: Protein Chain-of-Thoughts Enhanced LLM for Protein-Protein Interaction Prediction [54.132290875513405]
The prediction of protein-protein interactions (PPIs) is crucial for understanding biological functions and diseases.
Previous machine learning approaches to PPI prediction mainly focus on direct physical interactions.
We propose a novel framework ProLLM that employs an LLM tailored for PPI for the first time.
arXiv Detail & Related papers (2024-03-30T05:32:42Z)
- Extracting Protein-Protein Interactions (PPIs) from Biomedical Literature using Attention-based Relational Context Information [5.456047952635665]
This work presents a unified, multi-source PPI corpus with vetted interaction definitions augmented by binary interaction type labels.
A Transformer-based deep learning method exploits entities' relational context information for relation representation to improve relation classification performance.
The model's performance is evaluated on four widely studied biomedical relation extraction datasets.
arXiv Detail & Related papers (2024-03-08T01:43:21Z)
- MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction Prediction via Microenvironment-Aware Protein Embedding [82.31506767274841]
Protein-Protein Interactions (PPIs) are fundamental in various biological processes and play a key role in life activities.
MAPE-PPI encodes microenvironments into chemically meaningful discrete codes via a sufficiently large microenvironment "vocabulary".
MAPE-PPI can scale to PPI prediction with millions of PPIs with superior trade-offs between effectiveness and computational efficiency.
arXiv Detail & Related papers (2024-02-22T09:04:41Z)
- Effective Protein-Protein Interaction Exploration with PPIretrieval [46.07027715907749]
We propose PPIretrieval, the first deep learning-based model for protein-protein interaction exploration.
PPIretrieval searches for potential PPIs in an embedding space, capturing rich geometric and chemical information of protein surfaces.
arXiv Detail & Related papers (2024-02-06T03:57:06Z)
- Learning to Denoise Biomedical Knowledge Graph for Robust Molecular Interaction Prediction [50.7901190642594]
We propose BioKDN (Biomedical Knowledge Graph Denoising Network) for robust molecular interaction prediction.
BioKDN refines the reliable structure of local subgraphs by denoising noisy links in a learnable manner.
It maintains consistent and robust semantics by smoothing relations around the target interaction.
arXiv Detail & Related papers (2023-12-09T07:08:00Z)
- Evaluation of GPT and BERT-based models on identifying protein-protein interactions in biomedical text [1.3923237289777164]
Pre-trained language models, such as generative pre-trained transformers (GPT) and bidirectional encoder representations from transformers (BERT), have shown promising results in natural language processing (NLP) tasks.
We evaluated the performance of PPI identification of multiple GPT and BERT models using three manually curated gold-standard corpora.
arXiv Detail & Related papers (2023-03-30T22:06:10Z)
- A Supervised Machine Learning Approach for Sequence Based Protein-protein Interaction (PPI) Prediction [4.916874464940376]
Computational protein-protein interaction (PPI) prediction techniques can contribute greatly to reducing time, cost and false-positive interactions.
We describe our submitted solution along with the results of the SeqPIP competition.
arXiv Detail & Related papers (2022-03-23T18:27:25Z)
- Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction [7.860159889216291]
Existing methods suffer from significant performance degradation when tested on unseen datasets.
We propose a graph neural network based method (GNN-PPI) for better inter-novel-protein interaction prediction.
arXiv Detail & Related papers (2021-05-14T08:42:55Z)
- HINT: Hierarchical Interaction Network for Trial Outcome Prediction Leveraging Web Data [56.53715632642495]
Clinical trials face uncertain outcomes due to issues with efficacy, safety, or problems with patient recruitment.
In this paper, we propose Hierarchical INteraction Network (HINT) for more general, clinical trial outcome predictions.
arXiv Detail & Related papers (2021-02-08T15:09:07Z)
- Biomedical Information Extraction for Disease Gene Prioritization [0.34998703934432673]
We introduce a biomedical information extraction pipeline that extracts biological relationships from text.
We apply it to tens of millions of PubMed abstracts to extract protein-protein interactions (PPIs) and augment these extractions to a biomedical knowledge graph.
We show that, despite already containing PPIs from an established structured source, augmenting our own IE-based extractions to the graph allows us to predict novel disease-gene associations with a 20% relative increase in hit@30.
arXiv Detail & Related papers (2020-11-10T15:38:42Z)