CU-UD: text-mining drug and chemical-protein interactions with ensembles
of BERT-based models
- URL: http://arxiv.org/abs/2112.03004v1
- Date: Thu, 11 Nov 2021 13:55:21 GMT
- Authors: Mehmet Efruz Karabulut, K. Vijay-Shanker, Yifan Peng
- Abstract summary: BioCreative VII track 1 DrugProt task aims to promote the development and evaluation of systems that can automatically detect relations between chemical compounds/drugs and genes/proteins in PubMed abstracts.
We describe our submission, which is an ensemble system, including multiple BERT-based language models.
Our system obtained 0.7708 in precision and 0.7770 in recall, for an F1 score of 0.7739, demonstrating the effectiveness of using ensembles of BERT-based language models for automatically detecting relations between chemicals and proteins.
- Score: 12.08949974675794
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Identifying the relations between chemicals and proteins is an important text
mining task. BioCreative VII track 1 DrugProt task aims to promote the
development and evaluation of systems that can automatically detect relations
between chemical compounds/drugs and genes/proteins in PubMed abstracts. In
this paper, we describe our submission, which is an ensemble system, including
multiple BERT-based language models. We combine the outputs of individual
models using majority voting and a multilayer perceptron. Our system obtained
0.7708 in precision and 0.7770 in recall, for an F1 score of 0.7739,
demonstrating the effectiveness of using ensembles of BERT-based language
models for automatically detecting relations between chemicals and proteins.
Our code is available at https://github.com/bionlplab/drugprot_bcvii.
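The abstract names majority voting as one way the individual model outputs are combined, and reports precision, recall, and F1. The following is a minimal sketch of both ideas, not the authors' implementation (their code is at the repository linked above). The function names, the `NONE` label, the example votes, and the tie-breaking rule (falling back to "no relation") are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical label meaning "no relation predicted" (an assumption, not from the paper).
NO_RELATION = "NONE"


def majority_vote(predictions, fallback=NO_RELATION):
    """Return the label predicted by the most models.

    Ties (and empty input) resolve to `fallback`; this tie rule is an
    assumption for the sketch, not something the abstract specifies.
    """
    if not predictions:
        return fallback
    counts = Counter(predictions).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return fallback
    return counts[0][0]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative votes from three models for two chemical-protein pairs.
votes_per_pair = [
    ["INHIBITOR", "INHIBITOR", "NONE"],
    ["ACTIVATOR", "INHIBITOR", "NONE"],
]
ensemble = [majority_vote(votes) for votes in votes_per_pair]
print(ensemble)  # ['INHIBITOR', 'NONE']

# The reported precision and recall are consistent with the reported F1.
print(round(f1_score(0.7708, 0.7770), 4))  # 0.7739
```

The second pair shows the tie case: with three different votes, the sketch abstains rather than picking a label arbitrarily. The abstract also mentions a multilayer perceptron combiner, which this sketch does not cover.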
Related papers
- BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction [65.93303145891628]
BatGPT-Chem is a large language model with 15 billion parameters, tailored for enhanced retrosynthesis prediction.
Our model captures a broad spectrum of chemical knowledge, enabling precise prediction of reaction conditions.
This development empowers chemists to adeptly address novel compounds, potentially expediting the innovation cycle in drug manufacturing and materials science.
arXiv Detail & Related papers (2024-08-19T05:17:40Z)
- MOTIVE: A Drug-Target Interaction Graph For Inductive Link Prediction [0.29998889086656577]
This paper introduces MOTIVE, a Morphological cOmpound Target Interaction Graph dataset comprising Cell Painting features for 11,000 genes and 3,600 compounds.
We provide random, cold-source (new drugs) and cold-target (new genes) data splits to enable rigorous evaluation under realistic use cases.
Our benchmark results show that graph neural networks that use Cell Painting features consistently outperform those that learn from graph structure alone.
arXiv Detail & Related papers (2024-06-12T21:18:14Z)
- Learning to Denoise Biomedical Knowledge Graph for Robust Molecular Interaction Prediction [50.7901190642594]
We propose BioKDN (Biomedical Knowledge Graph Denoising Network) for robust molecular interaction prediction.
BioKDN refines the reliable structure of local subgraphs by denoising noisy links in a learnable manner.
It maintains consistent and robust semantics by smoothing relations around the target interaction.
arXiv Detail & Related papers (2023-12-09T07:08:00Z)
- BioREx: Improving Biomedical Relation Extraction by Leveraging Heterogeneous Datasets [7.7587371896752595]
Biomedical relation extraction (RE) is a central task in biomedical natural language processing (NLP) research.
We present a novel framework for systematically addressing the data heterogeneity of individual datasets and combining them into a large dataset.
Our evaluation shows that BioREx achieves significantly higher performance than the benchmark system trained on the individual dataset.
arXiv Detail & Related papers (2023-06-19T22:48:18Z)
- Drug Synergistic Combinations Predictions via Large-Scale Pre-Training and Graph Structure Learning [82.93806087715507]
Drug combination therapy is a well-established strategy for disease treatment with better effectiveness and less safety degradation.
Deep learning models have emerged as an efficient way to discover synergistic combinations.
Our framework achieves state-of-the-art results in comparison with other deep learning-based methods.
arXiv Detail & Related papers (2023-01-14T15:07:43Z)
- BioRED: A Comprehensive Biomedical Relation Extraction Dataset [6.915371362219944]
We present BioRED, a first-of-its-kind biomedical RE corpus with multiple entity types and relation pairs.
We label each relation as describing either a novel finding or previously known background knowledge, enabling automated algorithms to differentiate between novel and background information.
Our results show that while existing approaches can reach high performance on the NER task, there is much room for improvement for the RE task.
arXiv Detail & Related papers (2022-04-08T19:23:49Z)
- DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations [90.27736364704108]
We present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery.
DrugOOD comes with an open-source Python package that fully automates benchmarking processes.
We focus on one of the most crucial problems in AIDD: drug target binding affinity prediction.
arXiv Detail & Related papers (2022-01-24T12:32:48Z)
- Text Mining Drug/Chemical-Protein Interactions using an Ensemble of BERT and T5 Based Models [3.7462395049372894]
In Track-1 of the BioCreative VII Challenge, participants are asked to identify interactions between drugs/chemicals and proteins.
We attempt both a BERT-based sentence classification approach and a more novel text-to-text approach using a T5 model.
arXiv Detail & Related papers (2021-11-30T18:14:06Z)
- Does constituency analysis enhance domain-specific pre-trained BERT models for relation extraction? [0.0]
The DrugProt track at BioCreative VII provides a manually-annotated corpus for the development and evaluation of relation extraction systems.
We describe the ensemble system that we used for our submission, which combines predictions of fine-tuned bioBERT, sciBERT and const-bioBERT models by majority voting.
arXiv Detail & Related papers (2021-11-25T10:27:10Z)
- Pre-training Co-evolutionary Protein Representation via A Pairwise Masked Language Model [93.9943278892735]
A key problem in protein sequence representation learning is to capture the co-evolutionary information reflected by the inter-residue co-variation in the sequences.
We propose a novel method to capture this information directly by pre-training via a dedicated language model, i.e., Pairwise Masked Language Model (PMLM).
Our results show that the proposed method can effectively capture the inter-residue correlations and improves the performance of contact prediction by up to 9% compared to the baseline.
arXiv Detail & Related papers (2021-10-29T04:01:32Z)
- Neural networks for Anatomical Therapeutic Chemical (ATC) [83.73971067918333]
We propose combining multiple multi-label classifiers trained on distinct sets of features, including sets extracted from a Bidirectional Long Short-Term Memory Network (BiLSTM).
Experiments demonstrate the power of this approach, which is shown to outperform the best methods reported in the literature.
arXiv Detail & Related papers (2021-01-22T19:49:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.