BioNerFlair: biomedical named entity recognition using flair embedding and sequence tagger
- URL: http://arxiv.org/abs/2011.01504v1
- Date: Tue, 3 Nov 2020 06:46:45 GMT
- Authors: Harsh Patel
- Abstract summary: We introduce BioNerFlair, a method to train models for biomedical named entity recognition.
With almost the same generic architecture widely used for named entity recognition, BioNerFlair outperforms previous state-of-the-art models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivation: The proliferation of biomedical research articles has
made the task of information retrieval more important than ever. Scientists
and researchers have difficulty finding articles that contain information
relevant to them. Proper extraction of biomedical entities such as Disease,
Drug/chem, Species, and Gene/protein can considerably improve the filtering of
articles, resulting in better extraction of relevant information. Performance
on BioNER benchmarks has progressively improved because of advances in
transformer-based models such as BERT, XLNet, and OpenAI GPT-2. These models
give excellent results; however, they are computationally expensive, and we
can achieve better scores on domain-specific tasks using contextual
string-based embeddings with an LSTM-CRF based sequence tagger. Results: We
introduce BioNerFlair, a method to train models for biomedical named entity
recognition using Flair plus GloVe embeddings and a bidirectional LSTM-CRF
sequence tagger. With almost the same generic architecture widely used for
named entity recognition, BioNerFlair outperforms previous state-of-the-art
models. I performed experiments on 8 benchmark datasets for biomedical named
entity recognition. Compared to current state-of-the-art models, BioNerFlair
achieves an F1-score of 90.17 (previous best 84.72) on the BioCreative II
gene mention (BC2GM) corpus, 94.03 (previous best 92.36) on the BioCreative
IV chemical and drug (BC4CHEMD) corpus, 88.73 (previous best 78.58) on the
JNLPBA corpus, 91.1 (previous best 89.71) on the NCBI disease corpus, and
85.48 (previous best 78.98) on the Species-800 corpus, while near-best
results were observed on the BC5CDR-chem, BC5CDR-disease, and LINNAEUS
corpora.
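The abstract describes a BiLSTM-CRF tagger: the BiLSTM produces per-token emission scores for each label, and the CRF layer, using a learned transition matrix, decodes the single best label sequence via Viterbi. A minimal sketch of that decoding step in pure Python, with hypothetical scores, labels, and tokens (in the actual model both emissions and transitions are learned from data):

```python
# Minimal Viterbi decoder for a linear-chain CRF, illustrating how the CRF
# layer on top of a BiLSTM picks the globally best label sequence rather
# than labelling each token independently. All numbers are hypothetical.

def viterbi_decode(emissions, transitions, labels):
    """emissions: one {label: score} dict per token;
    transitions: {(prev_label, cur_label): score};
    returns the highest-scoring label sequence."""
    # Initialise with the first token's emission scores.
    best_score = {lab: emissions[0][lab] for lab in labels}
    backptrs = []
    for emit in emissions[1:]:
        scores, ptrs = {}, {}
        for cur in labels:
            # Best previous label for each current label.
            prev = max(labels, key=lambda p: best_score[p] + transitions[(p, cur)])
            scores[cur] = best_score[prev] + transitions[(prev, cur)] + emit[cur]
            ptrs[cur] = prev
        best_score = scores
        backptrs.append(ptrs)
    # Backtrack from the best final label.
    path = [max(labels, key=lambda lab: best_score[lab])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    path.reverse()
    return path

labels = ["O", "B-Gene", "I-Gene"]
# Penalise the illegal O -> I-Gene transition; others are neutral.
transitions = {(p, c): (-5.0 if (p == "O" and c == "I-Gene") else 0.0)
               for p in labels for c in labels}
emissions = [
    {"O": 0.1, "B-Gene": 2.0, "I-Gene": 1.5},  # e.g. "BRCA1"
    {"O": 0.2, "B-Gene": 0.1, "I-Gene": 1.8},  # e.g. "gene"
    {"O": 3.0, "B-Gene": 0.0, "I-Gene": 0.0},  # e.g. "mutations"
]
print(viterbi_decode(emissions, transitions, labels))
# -> ['B-Gene', 'I-Gene', 'O']
```

The transition penalty is what lets the CRF rule out label sequences that are locally plausible but structurally invalid (such as an I-Gene tag with no preceding B-Gene), which is why an LSTM-CRF tagger typically beats a per-token softmax on NER benchmarks.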
Related papers
- Augmenting Biomedical Named Entity Recognition with General-domain Resources [47.24727904076347]
Training a neural network-based biomedical named entity recognition (BioNER) model usually requires extensive and costly human annotations.
We propose GERBERA, a simple-yet-effective method that utilizes a general-domain NER dataset for training.
We systematically evaluated GERBERA on five datasets of eight entity types, collectively consisting of 81,410 instances.
arXiv Detail & Related papers (2024-06-15T15:28:02Z)
- BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers [48.21255861863282]
BMRetriever is a series of dense retrievers for enhancing biomedical retrieval.
BMRetriever exhibits strong parameter efficiency, with the 410M variant outperforming baselines up to 11.7 times larger.
arXiv Detail & Related papers (2024-04-29T05:40:08Z)
- BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text [82.7001841679981]
BioMedLM is a 2.7 billion parameter GPT-style autoregressive model trained exclusively on PubMed abstracts and full articles.
When fine-tuned, BioMedLM can produce strong multiple-choice biomedical question-answering results competitive with larger models.
BioMedLM can also be fine-tuned to produce useful answers to patient questions on medical topics.
arXiv Detail & Related papers (2024-03-27T10:18:21Z)
- Improving Biomedical Entity Linking with Retrieval-enhanced Learning [53.24726622142558]
$k$NN-BioEL provides a BioEL model with the ability to reference similar instances from the entire training corpus as clues for prediction.
We show that $k$NN-BioEL outperforms state-of-the-art baselines on several datasets.
arXiv Detail & Related papers (2023-12-15T14:04:23Z)
- BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [68.39821375903591]
Generalist AI holds the potential to address limitations due to its versatility in interpreting different data types.
Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model.
arXiv Detail & Related papers (2023-05-26T17:14:43Z)
- From Zero to Hero: Harnessing Transformers for Biomedical Named Entity Recognition in Zero- and Few-shot Contexts [0.0]
This paper proposes a method for zero- and few-shot NER in the biomedical domain.
We have achieved average F1 scores of 35.44% for zero-shot NER, 50.10% for one-shot NER, 69.94% for 10-shot NER, and 79.51% for 100-shot NER on 9 diverse evaluated biomedical entities.
arXiv Detail & Related papers (2023-05-05T12:14:22Z)
- Evaluation of GPT and BERT-based models on identifying protein-protein interactions in biomedical text [1.3923237289777164]
Pre-trained language models, such as generative pre-trained transformers (GPT) and bidirectional encoder representations from transformers (BERT), have shown promising results in natural language processing (NLP) tasks.
We evaluated the performance of PPI identification of multiple GPT and BERT models using three manually curated gold-standard corpora.
arXiv Detail & Related papers (2023-03-30T22:06:10Z)
- BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining [140.61707108174247]
We propose BioGPT, a domain-specific generative Transformer language model pre-trained on large scale biomedical literature.
We get 44.98%, 38.42% and 40.76% F1 score on BC5CDR, KD-DTI and DDI end-to-end relation extraction tasks respectively, and 78.2% accuracy on PubMedQA.
arXiv Detail & Related papers (2022-10-19T07:17:39Z)
- BioRED: A Comprehensive Biomedical Relation Extraction Dataset [6.915371362219944]
We present BioRED, a first-of-its-kind biomedical RE corpus with multiple entity types and relation pairs.
We label each relation as describing either a novel finding or previously known background knowledge, enabling automated algorithms to differentiate between novel and background information.
Our results show that while existing approaches can reach high performance on the NER task, there is much room for improvement for the RE task.
arXiv Detail & Related papers (2022-04-08T19:23:49Z)
- Benchmarking for Biomedical Natural Language Processing Tasks with a Domain Specific ALBERT [9.8215089151757]
We present BioALBERT, a domain-specific adaptation of A Lite Bidirectional Encoder Representations from Transformers (ALBERT).
It is trained on biomedical (PubMed, PubMed Central) and clinical corpora and fine-tuned for 6 different tasks across 20 benchmark datasets.
It represents a new state of the art in 17 out of 20 benchmark datasets.
arXiv Detail & Related papers (2021-07-09T11:47:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.