BioRED: A Comprehensive Biomedical Relation Extraction Dataset
- URL: http://arxiv.org/abs/2204.04263v1
- Date: Fri, 8 Apr 2022 19:23:49 GMT
- Title: BioRED: A Comprehensive Biomedical Relation Extraction Dataset
- Authors: Ling Luo, Po-Ting Lai, Chih-Hsuan Wei, Cecilia N Arighi, Zhiyong Lu
- Abstract summary: We present BioRED, a first-of-its-kind biomedical RE corpus with multiple entity types and relation pairs.
We label each relation as describing either a novel finding or previously known background knowledge, enabling automated algorithms to differentiate between novel and background information.
Our results show that while existing approaches can reach high performance on the NER task, there is much room for improvement for the RE task.
- Score: 6.915371362219944
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated relation extraction (RE) from biomedical literature is critical for
many downstream text mining applications in both research and real-world
settings. However, most existing benchmarking datasets for biomedical RE only
focus on relations of a single type (e.g., protein-protein interactions) at the
sentence level, greatly limiting the development of RE systems in biomedicine.
In this work, we first review commonly used named entity recognition (NER) and
RE datasets. Then we present BioRED, a first-of-its-kind biomedical RE corpus
with multiple entity types (e.g., gene/protein, disease, chemical) and relation
pairs (e.g., gene-disease; chemical-chemical), on a set of 600 PubMed articles.
Further, we label each relation as describing either a novel finding or
previously known background knowledge, enabling automated algorithms to
differentiate between novel and background information. We assess the utility
of BioRED by benchmarking several existing state-of-the-art methods, including
BERT-based models, on the NER and RE tasks. Our results show that while
existing approaches can reach high performance on the NER task (F-score of
89.3%), there is much room for improvement for the RE task, especially when
extracting novel relations (F-score of 47.7%). Our experiments also demonstrate
that such a comprehensive dataset can successfully facilitate the development
of more accurate, efficient, and robust RE systems for biomedicine.
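To make the gap between the reported scores concrete, the following is a minimal sketch of how a set-based F-score over document-level relations could be computed, with and without the novelty label. It is not BioRED's official evaluation script: the `Relation` fields, the identifiers, the relation type name, and the "Novel"/"Background" label strings are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Set


class Relation(NamedTuple):
    """One document-level relation (hypothetical schema, not the BioRED file format)."""
    doc_id: str
    entity_a: str
    entity_b: str
    rel_type: str
    novelty: str  # hypothetical labels: "Novel" or "Background"


def f_score(gold: Set[tuple], pred: Set[tuple]) -> float:
    """Harmonic mean of precision and recall over exact-match tuples."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def without_novelty(relations: Set[Relation]) -> Set[tuple]:
    """Drop the novelty field so only the entity pair and relation type are compared."""
    return {r[:4] for r in relations}


# Toy data with made-up identifiers.
GOLD = {
    Relation("doc1", "gene:G1", "disease:D1", "Association", "Novel"),
    Relation("doc1", "chemical:C1", "disease:D1", "Association", "Background"),
    Relation("doc2", "chemical:C1", "chemical:C2", "Association", "Novel"),
}
PRED = {
    Relation("doc1", "gene:G1", "disease:D1", "Association", "Novel"),
    # Correct pair and type, but the novelty label is wrong.
    Relation("doc1", "chemical:C1", "disease:D1", "Association", "Novel"),
    # Relation that is not in the gold set at all.
    Relation("doc2", "gene:G2", "disease:D2", "Association", "Novel"),
}

relation_only_f = f_score(without_novelty(GOLD), without_novelty(PRED))
with_novelty_f = f_score(GOLD, PRED)
print(f"relation only: {relation_only_f:.3f}")
print(f"relation + novelty: {with_novelty_f:.3f}")
```

In this toy setup, a prediction that finds the right entity pair and relation type but mislabels its novelty counts as correct in the first score and as an error in the second, which is one way a system can score lower once novelty must also be predicted.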
Related papers
- Augmenting Biomedical Named Entity Recognition with General-domain Resources [47.24727904076347]
Training a neural network-based biomedical named entity recognition (BioNER) model usually requires extensive and costly human annotations.
We propose GERBERA, a simple-yet-effective method that utilizes a general-domain NER dataset for training.
We systematically evaluated GERBERA on five datasets of eight entity types, collectively consisting of 81,410 instances.
Date: 2024-06-15T15:28:02Z
- BioBERT-based Deep Learning and Merged ChemProt-DrugProt for Enhanced Biomedical Relation Extraction [2.524192238862961]
Our approach integrates the ChemProt and DrugProt datasets using a novel merging strategy.
The study highlights the potential of automated information extraction in biomedical research and clinical practice.
Date: 2024-05-28T21:34:01Z
- Diversifying Knowledge Enhancement of Biomedical Language Models using Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
Date: 2023-12-21T14:26:57Z
- BioREx: Improving Biomedical Relation Extraction by Leveraging Heterogeneous Datasets [7.7587371896752595]
Biomedical relation extraction (RE) is a central task in biomedical natural language processing (NLP) research.
We present a novel framework for systematically addressing the data heterogeneity of individual datasets and combining them into a large dataset.
Our evaluation shows that BioREx achieves significantly higher performance than the benchmark system trained on the individual dataset.
Date: 2023-06-19T22:48:18Z
- BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [68.39821375903591]
Generalist AI holds the potential to address limitations due to its versatility in interpreting different data types.
Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model.
Date: 2023-05-26T17:14:43Z
- End-to-End Models for Chemical-Protein Interaction Extraction: Better Tokenization and Span-Based Pipeline Strategies [1.782718930156674]
We employ a span-based pipeline approach to produce a new state-of-the-art E2ERE performance on the ChemProt dataset.
Our results indicate that a straightforward fine-grained tokenization scheme helps span-based approaches excel in E2ERE.
Date: 2023-04-03T20:20:22Z
- AIONER: All-in-one scheme-based biomedical named entity recognition using deep learning [7.427654811697884]
We present AIONER, a general-purpose BioNER tool based on cutting-edge deep learning and our AIO schema.
AIONER is effective, robust, and compares favorably to other state-of-the-art approaches such as multi-task learning.
Date: 2022-11-30T12:35:00Z
- DrugOOD: Out-of-Distribution (OOD) Dataset Curator and Benchmark for AI-aided Drug Discovery -- A Focus on Affinity Prediction Problems with Noise Annotations [90.27736364704108]
We present DrugOOD, a systematic OOD dataset curator and benchmark for AI-aided drug discovery.
DrugOOD comes with an open-source Python package that fully automates benchmarking processes.
We focus on one of the most crucial problems in AIDD: drug target binding affinity prediction.
Date: 2022-01-24T12:32:48Z
- BioIE: Biomedical Information Extraction with Multi-head Attention Enhanced Graph Convolutional Network [9.227487525657901]
We propose Biomedical Information Extraction, a hybrid neural network to extract relations from biomedical text and unstructured medical reports.
We evaluate our model on two major biomedical relationship extraction tasks, chemical-disease relation and chemical-protein interaction, and a cross-hospital pan-cancer pathology report corpus.
Date: 2021-10-26T13:19:28Z
- Discovering Drug-Target Interaction Knowledge from Biomedical Literature [107.98712673387031]
The interaction between drugs and targets (DTI) in the human body plays a crucial role in biomedical science and applications.
As millions of papers come out every year in the biomedical domain, automatically discovering DTI knowledge from literature becomes an urgent demand in the industry.
We explore the first end-to-end solution for this task by using generative approaches.
We regard the DTI triplets as a sequence and use a Transformer-based model to directly generate them without using the detailed annotations of entities and relations.
Date: 2021-09-27T17:00:14Z
- Neural networks for Anatomical Therapeutic Chemical (ATC) [83.73971067918333]
We propose combining multiple multi-label classifiers trained on distinct sets of features, including sets extracted from a Bidirectional Long Short-Term Memory Network (BiLSTM).
Experiments demonstrate the power of this approach, which is shown to outperform the best methods reported in the literature.
Date: 2021-01-22T19:49:47Z