Biomedical Named Entity Recognition at Scale
- URL: http://arxiv.org/abs/2011.06315v1
- Date: Thu, 12 Nov 2020 11:10:17 GMT
- Title: Biomedical Named Entity Recognition at Scale
- Authors: Veysel Kocaman and David Talby
- Abstract summary: We present a single trainable NER model that obtains new state-of-the-art results on seven public biomedical benchmarks.
This model is freely available within a production-grade code base as part of the open-source Spark NLP library.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Named entity recognition (NER) is a widely applicable natural language
processing task and a building block of question answering, topic modeling,
information retrieval, etc. In the medical domain, NER plays a crucial role by
extracting meaningful chunks from clinical notes and reports, which are then
fed to downstream tasks like assertion status detection, entity resolution,
relation extraction, and de-identification. Reimplementing a Bi-LSTM-CNN-Char
deep learning architecture on top of Apache Spark, we present a single
trainable NER model that obtains new state-of-the-art results on seven public
biomedical benchmarks without using heavy contextual embeddings like BERT. This
includes improving BC4CHEMD to 93.72% (4.1% gain), Species800 to 80.91% (4.6%
gain), and JNLPBA to 81.29% (5.2% gain). In addition, this model is freely
available within a production-grade code base as part of the open-source Spark
NLP library; can scale up for training and inference in any Spark cluster; has
GPU support and libraries for popular programming languages such as Python, R,
Scala and Java; and can be extended to support other human languages with no
code changes.
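The "meaningful chunks" fed to downstream tasks are typically obtained by decoding the model's per-token BIO tags into entity spans. A minimal, library-free sketch of that decoding step (the tag set and example tokens are illustrative, not from the paper):

```python
def bio_to_chunks(tokens, tags):
    """Collapse parallel lists of tokens and BIO tags
    (e.g. "B-DRUG", "I-DRUG", "O") into (entity_type, text) chunks."""
    chunks, current = [], None  # current = (entity_type, [tokens])
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                chunks.append((current[0], " ".join(current[1])))
            current = (tag[2:], [token])
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(token)
        else:  # "O" or an inconsistent I- tag closes any open chunk
            if current:
                chunks.append((current[0], " ".join(current[1])))
            current = None
    if current:
        chunks.append((current[0], " ".join(current[1])))
    return chunks

# Illustrative clinical-style example (hypothetical tag set):
tokens = ["Patient", "given", "aspirin", "81", "mg", "daily"]
tags   = ["O", "O", "B-DRUG", "B-DOSE", "I-DOSE", "O"]
print(bio_to_chunks(tokens, tags))  # → [('DRUG', 'aspirin'), ('DOSE', '81 mg')]
```

The resulting chunks are what downstream components such as entity resolution or de-identification consume.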
Related papers
- Low-Resource Named Entity Recognition with Cross-Lingual, Character-Level Neural Conditional Random Fields [68.17213992395041]
Low-resource named entity recognition is still an open problem in NLP.
We present a transfer learning scheme, whereby we train character-level neural CRFs to predict named entities jointly for both high-resource and low-resource languages.
arXiv Detail & Related papers (2024-04-14T23:44:49Z)
- On Significance of Subword tokenization for Low Resource and Efficient Named Entity Recognition: A case study in Marathi [1.6383036433216434]
We focus on NER for low-resource language and present our case study in the context of the Indian language Marathi.
We propose a hybrid approach for efficient NER by integrating a BERT-based subword tokenizer into vanilla CNN/LSTM models.
We show that this simple approach of replacing a traditional word-based tokenizer with a BERT-tokenizer brings the accuracy of vanilla single-layer models closer to that of deep pre-trained models like BERT.
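The core idea — replacing whole-word tokens with subword pieces so that rare words decompose into known units — can be sketched with a greedy longest-match-first (WordPiece-style) tokenizer; the vocabulary below is a toy stand-in, not an actual BERT vocabulary:

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first subword split, WordPiece style.
    Continuation pieces carry a '##' prefix, as in BERT vocabularies."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # mark non-initial pieces
            if piece in vocab:
                match = piece
                break
            end -= 1  # shrink the candidate from the right
        if match is None:
            return [unk]  # no prefix matched; the whole word is unknown
        pieces.append(match)
        start = end
    return pieces

# Toy vocabulary: 'unhappiness' has no whole-word entry, so it decomposes.
vocab = {"un", "##happi", "##ness", "happy"}
print(wordpiece_tokenize("unhappiness", vocab))  # → ['un', '##happi', '##ness']
print(wordpiece_tokenize("happy", vocab))        # → ['happy']
```

Because unseen words fall back to known pieces rather than a single unknown token, even a shallow CNN/LSTM tagger sees informative inputs for rare vocabulary.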
arXiv Detail & Related papers (2023-12-03T06:53:53Z)
- IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named Entity Recognition using Knowledge Bases [53.054598423181844]
We present a novel NER cascade approach comprising three steps.
We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities.
Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting.
arXiv Detail & Related papers (2023-04-20T20:30:34Z)
- Machine and Deep Learning Methods with Manual and Automatic Labelling for News Classification in Bangla Language [0.36832029288386137]
This paper introduces several machine and deep learning methods with manual and automatic labelling for news classification in the Bangla language.
We implement several machine learning (ML) and deep learning (DL) algorithms. The ML algorithms are Logistic Regression (LR), Stochastic Gradient Descent (SGD), Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbour (KNN).
We develop automatic labelling methods using Latent Dirichlet Allocation (LDA) and investigate the performance of single-label and multi-label article classification methods.
arXiv Detail & Related papers (2022-10-19T21:53:49Z)
- CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation [113.99145386490639]
Cross-lingual NER can transfer knowledge between languages via aligned cross-lingual representations or machine translation results.
We propose a Cross-lingual Entity Projection framework (CROP) to enable zero-shot cross-lingual NER.
We adopt a multilingual labeled sequence translation model to project the tagged sequence back to the target language and label the target raw sentence.
arXiv Detail & Related papers (2022-10-13T13:32:36Z)
- An Open-Source Dataset and A Multi-Task Model for Malay Named Entity Recognition [3.511753382329252]
We build a Malay NER dataset (MYNER) comprising 28,991 sentences (over 384 thousand tokens).
An auxiliary task, boundary detection, is introduced to improve NER training in both explicit and implicit ways.
arXiv Detail & Related papers (2021-09-03T03:29:25Z)
- Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition [54.92161571089808]
Cross-lingual NER transfers knowledge from a rich-resource language to low-resource languages.
Existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages.
We develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning.
arXiv Detail & Related papers (2021-06-01T05:46:22Z)
- Chest x-ray automated triage: a semiologic approach designed for clinical implementation, exploiting different types of labels through a combination of four Deep Learning architectures [83.48996461770017]
This work presents a Deep Learning method based on the late fusion of different convolutional architectures.
We built four training datasets combining images from public chest x-ray datasets and our institutional archive.
We trained four different Deep Learning architectures and combined their outputs with a late fusion strategy, obtaining a unified tool.
arXiv Detail & Related papers (2020-12-23T14:38:35Z)
- Bootstrapping Named Entity Recognition in E-Commerce with Positive Unlabeled Learning [13.790883865748004]
We present a bootstrapped positive-unlabeled learning algorithm that integrates domain-specific linguistic features to quickly and efficiently expand the seed dictionary.
The model achieves an average F1 score of 72.02% on a novel dataset of product descriptions, an improvement of 3.63% over a baseline BiLSTM classifier.
arXiv Detail & Related papers (2020-05-22T09:35:30Z)
- Soft Gazetteers for Low-Resource Named Entity Recognition [78.00856159473393]
We propose a method of "soft gazetteers" that incorporates ubiquitously available information from English knowledge bases into neural named entity recognition models.
Our experiments on four low-resource languages show an average improvement of 4 points in F1 score.
arXiv Detail & Related papers (2020-05-04T21:58:02Z)
- Learning Cross-Context Entity Representations from Text [9.981223356176496]
We investigate the use of a fill-in-the-blank task to learn context independent representations of entities from text contexts.
We show that large scale training of neural models allows us to learn high quality entity representations.
Our global entity representations encode fine-grained type categories, such as Scottish footballers, and can answer trivia questions.
arXiv Detail & Related papers (2020-01-11T15:30:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.