BERN2: an advanced neural biomedical named entity recognition and
normalization tool
- URL: http://arxiv.org/abs/2201.02080v2
- Date: Mon, 10 Jan 2022 03:30:06 GMT
- Title: BERN2: an advanced neural biomedical named entity recognition and
normalization tool
- Authors: Mujeen Sung, Minbyul Jeong, Yonghwa Choi, Donghyeon Kim, Jinhyuk Lee
and Jaewoo Kang
- Abstract summary: BERN2 is a tool that improves on the previous neural network-based NER tool, BERN.
We hope that our tool can help annotate large-scale biomedical texts more accurately.
- Score: 21.644896894525704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In biomedical natural language processing, named entity recognition (NER) and
named entity normalization (NEN) are key tasks that enable the automatic
extraction of biomedical entities (e.g., diseases and chemicals) from the
ever-growing biomedical literature. In this paper, we present BERN2 (Advanced
Biomedical Entity Recognition and Normalization), a tool that improves the
previous neural network-based NER tool (Kim et al., 2019) by employing a
multi-task NER model and neural network-based NEN models to achieve much faster
and more accurate inference. We hope that our tool can help annotate
large-scale biomedical texts more accurately for various tasks such as
biomedical knowledge graph construction.
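To make the tool's interface concrete, below is a minimal sketch of querying a BERN2-style annotation service over HTTP. The endpoint URL and the response field names are assumptions based on the project's public demo and may differ from the current deployment.

```python
# Minimal sketch of calling a BERN2-style annotation service.
# The endpoint and response schema are assumptions; consult the BERN2
# repository for the current interface.
import requests

def annotate(text: str) -> dict:
    # POST plain text; the service returns recognized entities together
    # with normalized identifiers (e.g., MeSH IDs for diseases/chemicals).
    resp = requests.post(
        "http://bern2.korea.ac.kr/plain",  # assumed public demo endpoint
        json={"text": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    result = annotate("Autophagy maintains tumour growth through circulating arginine.")
    # Field names below ("annotations", "mention", "obj", "id") are assumed.
    for entity in result.get("annotations", []):
        print(entity.get("mention"), entity.get("obj"), entity.get("id"))
```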
Related papers
- Multi-level biomedical NER through multi-granularity embeddings and
enhanced labeling [3.8599767910528917]
This paper proposes a hybrid approach that integrates the strengths of multiple models: BERT provides contextualized word embeddings, a pre-trained multi-channel CNN captures character-level information, and a BiLSTM + CRF performs sequence labelling, modelling dependencies between the words in the text.
We evaluate our model on the benchmark i2b2/2010 dataset, achieving an F1-score of 90.11.
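A minimal sketch of this hybrid stack follows, fusing character-level CNN features with contextual word embeddings ahead of a BiLSTM tagger; the dimensions are illustrative assumptions, and a per-token linear head stands in for the CRF to keep the sketch short.

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Builds one character-level feature vector per word."""
    def __init__(self, n_chars=100, char_dim=30, out_dim=50, kernel=3):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, out_dim, kernel, padding=1)

    def forward(self, char_ids):              # (batch, seq, max_word_len)
        b, s, w = char_ids.shape
        x = self.embed(char_ids.view(b * s, w)).transpose(1, 2)
        x = torch.relu(self.conv(x)).max(dim=2).values  # max-pool over chars
        return x.view(b, s, -1)

class HybridTagger(nn.Module):
    def __init__(self, word_dim=768, char_out=50, hidden=256, n_tags=9):
        super().__init__()
        self.char_cnn = CharCNN(out_dim=char_out)
        self.lstm = nn.LSTM(word_dim + char_out, hidden,
                            batch_first=True, bidirectional=True)
        # The paper uses a CRF for joint decoding; a linear head keeps this short.
        self.emit = nn.Linear(2 * hidden, n_tags)

    def forward(self, word_embs, char_ids):   # word_embs from e.g. BERT
        feats = torch.cat([word_embs, self.char_cnn(char_ids)], dim=-1)
        out, _ = self.lstm(feats)
        return self.emit(out)                 # per-token tag scores

scores = HybridTagger()(torch.randn(2, 16, 768),
                        torch.randint(0, 100, (2, 16, 12)))
print(scores.shape)                           # torch.Size([2, 16, 9])
```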
arXiv Detail & Related papers (2023-12-24T21:45:36Z)
- Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
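As an illustration, a minimal bottleneck adapter of the kind typically used for such knowledge injection is sketched below; the hidden and bottleneck sizes are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""
    def __init__(self, hidden=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, h):
        # The residual connection preserves the frozen PLM's representations;
        # only the small adapter weights are trained, keeping compute low.
        return h + self.up(self.act(self.down(h)))

h = torch.randn(2, 16, 768)   # hidden states from one PLM layer (stand-in)
print(Adapter()(h).shape)     # torch.Size([2, 16, 768])
```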
arXiv Detail & Related papers (2023-12-21T14:26:57Z)
- Biomedical Language Models are Robust to Sub-optimal Tokenization [30.175714262031253]
Most modern biomedical language models (LMs) are pre-trained using standard domain-specific tokenizers.
We find that pre-training a biomedical LM using a more accurate biomedical tokenizer does not improve the entity representation quality of a language model.
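A quick way to see the fragmentation at issue: a general-domain WordPiece vocabulary splits biomedical terms into many sub-word pieces. The checkpoint name below is a common public model, used here as an assumption.

```python
from transformers import AutoTokenizer

# A general-domain vocabulary fragments biomedical terminology heavily.
general = AutoTokenizer.from_pretrained("bert-base-uncased")
print(general.tokenize("thrombocytopenia"))  # several sub-word pieces
# The paper's finding: fixing this fragmentation with a more accurate
# biomedical tokenizer does not by itself improve entity representations.
```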
arXiv Detail & Related papers (2023-06-30T13:35:24Z)
- BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [68.39821375903591]
Generalist AI holds the potential to address the limitations of task-specific models due to its versatility in interpreting different data types.
Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model.
arXiv Detail & Related papers (2023-05-26T17:14:43Z)
- AIONER: All-in-one scheme-based biomedical named entity recognition
using deep learning [7.427654811697884]
We present AIONER, a general-purpose BioNER tool based on cutting-edge deep learning and our all-in-one (AIO) scheme.
AIONER is effective, robust, and compares favorably to other state-of-the-art approaches such as multi-task learning.
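A minimal sketch of the "all-in-one" idea: a control tag in the input tells a single tagger which entity type to annotate, so heterogeneous corpora can be pooled. The marker tokens here are hypothetical, not AIONER's exact vocabulary.

```python
def to_aio_example(text: str, entity_type: str) -> str:
    # Wrap the sentence in an entity-type marker the model learns to obey;
    # one model can then serve many entity types and datasets.
    return f"<{entity_type}> {text} </{entity_type}>"

print(to_aio_example("Imatinib inhibits BCR-ABL.", "Chemical"))
print(to_aio_example("Imatinib inhibits BCR-ABL.", "Gene"))
```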
arXiv Detail & Related papers (2022-11-30T12:35:00Z)
- BioGPT: Generative Pre-trained Transformer for Biomedical Text
Generation and Mining [140.61707108174247]
We propose BioGPT, a domain-specific generative Transformer language model pre-trained on large scale biomedical literature.
We get 44.98%, 38.42% and 40.76% F1 score on BC5CDR, KD-DTI and DDI end-to-end relation extraction tasks respectively, and 78.2% accuracy on PubMedQA.
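For reference, a minimal sketch of prompting BioGPT for generation, assuming the public microsoft/biogpt checkpoint and its classes in the transformers library:

```python
from transformers import BioGptForCausalLM, BioGptTokenizer

tokenizer = BioGptTokenizer.from_pretrained("microsoft/biogpt")
model = BioGptForCausalLM.from_pretrained("microsoft/biogpt")

# Greedy continuation of a biomedical prompt.
inputs = tokenizer("COVID-19 is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```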
arXiv Detail & Related papers (2022-10-19T07:17:39Z)
- Distantly supervised end-to-end medical entity extraction from
electronic health records with human-level quality [77.34726150561087]
We propose a novel method for medical EE from electronic health records (EHRs), framed as a single-step multi-label classification task.
Our model is trained end-to-end in a distantly supervised manner, using targets automatically extracted from a medical knowledge base.
Our work demonstrates that medical entity extraction can be done end-to-end without human supervision and with human-level quality, given a sufficiently large amount of unlabeled EHRs and a medical knowledge base.
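A minimal sketch of this single-step formulation: one forward pass scores every concept in a knowledge-base vocabulary for presence in the record, trained with distantly supervised multi-label targets. The encoder and sizes are stand-ins, not the paper's architecture.

```python
import torch
import torch.nn as nn

n_concepts = 5000                       # size of the medical KB vocabulary
encoder_dim = 768                       # pooled EHR encoding size (stand-in)

head = nn.Linear(encoder_dim, n_concepts)
doc_vec = torch.randn(4, encoder_dim)   # encodings of 4 EHR notes (stand-in)
logits = head(doc_vec)                  # one score per KB concept per note

# Distant supervision: targets come from KB matches, not human annotators.
targets = torch.zeros(4, n_concepts)
targets[0, [12, 873]] = 1.0             # concepts matched in record 0
loss = nn.BCEWithLogitsLoss()(logits, targets)
print(loss.item())
```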
arXiv Detail & Related papers (2022-01-25T17:04:46Z)
- Recognising Biomedical Names: Challenges and Solutions [9.51284672475743]
We propose a transition-based NER model which can recognise discontinuous mentions.
We also develop a cost-effective approach that nominates the suitable pre-training data.
Our contributions have clear practical implications, especially as new biomedical applications emerge.
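A toy sketch of the transition-based idea: a buffer of tokens and a stack, with actions that can assemble a discontinuous span into one mention. The action set is a simplified illustration, not the paper's exact transition system.

```python
def run_transitions(tokens, actions):
    buffer, stack, mentions = list(tokens), [], []
    for act in actions:
        if act == "SHIFT":        # move the next token onto the stack
            stack.append(buffer.pop(0))
        elif act == "SKIP":       # discard a token between span parts
            buffer.pop(0)
        elif act == "COMPLETE":   # emit the (possibly gapped) mention
            mentions.append(" ".join(stack))
            stack = []
    return mentions

tokens = ["severe", "joint", "and", "muscle", "pain"]
# Recover the discontinuous mention "joint ... pain":
print(run_transitions(tokens,
                      ["SKIP", "SHIFT", "SKIP", "SKIP", "SHIFT", "COMPLETE"]))
# ['joint pain']
```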
arXiv Detail & Related papers (2021-06-23T08:20:13Z)
- BioALBERT: A Simple and Effective Pre-trained Language Model for
Biomedical Named Entity Recognition [9.05154470433578]
Existing BioNER approaches often neglect the challenges of biomedical text and directly adopt the state-of-the-art (SOTA) models.
We propose biomedical ALBERT, an effective domain-specific language model trained on large-scale biomedical corpora.
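A minimal sketch of BioNER as token classification on an ALBERT-style encoder; "albert-base-v2" is a general-domain stand-in, whereas the paper's model is pre-trained on biomedical corpora.

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
model = AutoModelForTokenClassification.from_pretrained(
    "albert-base-v2", num_labels=3)   # e.g., B/I/O tags for one entity type

inputs = tokenizer("Imatinib inhibits BCR-ABL.", return_tensors="pt")
logits = model(**inputs).logits       # (1, seq_len, 3) per-token tag scores
print(logits.shape)
```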
arXiv Detail & Related papers (2020-09-19T12:58:47Z)
- Machine Learning in Nano-Scale Biomedical Engineering [77.75587007080894]
We review the existing research regarding the use of machine learning in nano-scale biomedical engineering.
The main challenges that can be formulated as ML problems are classified into three main categories.
For each of the presented methodologies, special emphasis is given to its principles, applications, and limitations.
arXiv Detail & Related papers (2020-08-05T15:45:54Z)
- Domain-Specific Language Model Pretraining for Biomedical Natural
Language Processing [73.37262264915739]
We show that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains.
Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks.
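As a usage note, a from-scratch biomedical checkpoint can be probed with a fill-mask query; the Hub id below is the commonly cited PubMedBERT release and is an assumption about its currently hosted name.

```python
from transformers import pipeline

# Fill-mask probe of a domain-specific checkpoint (model id assumed).
fill = pipeline(
    "fill-mask",
    model="microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext",
)
for cand in fill("Aspirin inhibits [MASK] aggregation."):
    print(cand["token_str"], round(cand["score"], 3))
```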
arXiv Detail & Related papers (2020-07-31T00:04:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.