Chemical Identification and Indexing in PubMed Articles via BERT and
Text-to-Text Approaches
- URL: http://arxiv.org/abs/2111.15622v1
- Date: Tue, 30 Nov 2021 18:21:06 GMT
- Title: Chemical Identification and Indexing in PubMed Articles via BERT and
Text-to-Text Approaches
- Authors: Virginia Adams, Hoo-Chang Shin, Carol Anderson, Bo Liu, Anas Abidin
- Abstract summary: The Biocreative VII Track-2 challenge consists of named entity recognition, entity-linking (or entity-normalization), and topic indexing tasks.
We achieve our best performance with BERT-based BioMegatron models.
In addition to conventional NER methods, we attempt both named entity recognition and entity linking with a novel text-to-text or "prompt" based method.
- Score: 3.7462395049372894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Biocreative VII Track-2 challenge consists of named entity recognition,
entity-linking (or entity-normalization), and topic indexing tasks -- with
entities and topics limited to chemicals for this challenge. Named entity
recognition is a well-established problem and we achieve our best performance
with BERT-based BioMegatron models. We extend our BERT-based approach to the
entity linking task. After the second stage of pretraining BioBERT with a
metric-learning loss strategy called self-alignment pretraining (SAP), we link
entities based on the cosine similarity between their SAP-BioBERT word
embeddings. Despite the success of our named entity recognition experiments, we
find the chemical indexing task generally more challenging.
In addition to conventional NER methods, we attempt both named entity
recognition and entity linking with a novel text-to-text or "prompt" based
method that uses generative language models such as T5 and GPT. We achieve
encouraging results with this new approach.
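The entity-linking step described above reduces to nearest-neighbor search in embedding space: embed the mention, embed the candidate concepts, and pick the candidate with the highest cosine similarity. A minimal sketch, using toy vectors as stand-ins for SAP-BioBERT embeddings and illustrative MeSH-style concept IDs:

```python
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def link(mention_vec, concepts):
    # Link a mention to the concept whose embedding is most similar.
    # concepts: {concept_id: embedding vector}
    return max(concepts, key=lambda cid: cosine(mention_vec, concepts[cid]))

# Toy embeddings (stand-ins for SAP-BioBERT outputs); IDs are illustrative.
concepts = {
    "MESH:D001241": [0.9, 0.1, 0.0],  # e.g. "aspirin"
    "MESH:D005947": [0.1, 0.8, 0.3],  # e.g. "glucose"
}
print(link([0.85, 0.2, 0.05], concepts))  # closest to the first concept
```

In practice the concept embeddings would be precomputed for the whole vocabulary and searched with an approximate nearest-neighbor index rather than a linear scan.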
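The text-to-text formulation can be pictured as string templating plus output parsing: the input sentence is wrapped in a task prompt, and the generated string is decoded back into entity spans. The template and the semicolon separator below are hypothetical assumptions, not the exact prompts the paper used with T5 or GPT:

```python
def make_ner_prompt(sentence):
    # Hypothetical prompt template for a generative model such as T5.
    return f"list all chemicals in the text: {sentence}"

def parse_generation(generated):
    # Assume the model emits entities as a semicolon-separated list.
    return [e.strip() for e in generated.split(";") if e.strip()]

prompt = make_ner_prompt("Aspirin inhibits COX-1 in platelets.")
print(parse_generation("aspirin; COX-1"))  # ['aspirin', 'COX-1']
```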
Related papers
- Disambiguation of Company names via Deep Recurrent Networks [101.90357454833845]
We propose a Siamese LSTM Network approach to extract -- via supervised learning -- an embedding of company name strings.
We analyse how an Active Learning approach to prioritise the samples to be labelled leads to a more efficient overall learning pipeline.
arXiv Detail & Related papers (2023-03-07T15:07:57Z)
- Nested Named Entity Recognition from Medical Texts: An Adaptive Shared Network Architecture with Attentive CRF [53.55504611255664]
We propose a novel method, referred to as ASAC, to address the challenges posed by nested entities.
The proposed method contains two key modules: the adaptive shared (AS) part and the attentive conditional random field (ACRF) module.
Our model could learn better entity representations by capturing the implicit distinctions and relationships between different categories of entities.
arXiv Detail & Related papers (2022-11-09T09:23:56Z)
- Hierarchical Transformer Model for Scientific Named Entity Recognition [0.20646127669654832]
We present a simple and effective approach for Named Entity Recognition.
The main idea of our approach is to encode the input subword sequence with a pre-trained transformer such as BERT.
We evaluate our approach on three benchmark datasets for scientific NER.
arXiv Detail & Related papers (2022-03-28T12:59:06Z)
- WCL-BBCD: A Contrastive Learning and Knowledge Graph Approach to Named Entity Recognition [15.446770390648874]
We propose a novel named entity recognition model WCL-BBCD (Word Contrastive Learning with BERT-BiLSTM-CRF-DBpedia)
The model first trains on sentence pairs from the text, calculates the similarity between words in the sentence pairs by cosine similarity, and fine-tunes the BERT model used for the named entity recognition task based on that similarity.
Finally, the recognition results are corrected using prior knowledge such as knowledge graphs, so as to alleviate the low recognition rate caused by word abbreviations.
arXiv Detail & Related papers (2022-03-14T08:29:58Z)
- Knowledge-Rich Self-Supervised Entity Linking [58.838404666183656]
Knowledge-RIch Self-Supervision (KRISSBERT) is a universal entity linker for four million UMLS entities.
Our approach subsumes zero-shot and few-shot methods, and can easily incorporate entity descriptions and gold mention labels if available.
Without using any labeled information, our method produces KRISSBERT, a universal entity linker for four million UMLS entities.
arXiv Detail & Related papers (2021-12-15T05:05:12Z)
- Improving Tagging Consistency and Entity Coverage for Chemical Identification in Full-text Articles [17.24298646089662]
This paper is a technical report on our system submitted to the chemical identification task of the BioCreative VII Track 2 challenge.
We aim to improve tagging consistency and entity coverage using various methods.
In the official evaluation of the challenge, our system was ranked 1st in NER by significantly outperforming the baseline model.
arXiv Detail & Related papers (2021-11-20T13:13:58Z)
- Discovering Drug-Target Interaction Knowledge from Biomedical Literature [107.98712673387031]
The Interaction between Drugs and Targets (DTI) in the human body plays a crucial role in biomedical science and applications.
As millions of papers come out every year in the biomedical domain, automatically discovering DTI knowledge from literature becomes an urgent demand in the industry.
We explore the first end-to-end solution for this task by using generative approaches.
We regard the DTI triplets as a sequence and use a Transformer-based model to directly generate them without using the detailed annotations of entities and relations.
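Treating DTI triplets as a generated sequence implies a linearization scheme and a parser to recover structured triplets from the model's output string. The `|` and `;` delimiters below are illustrative assumptions, not the paper's actual format:

```python
def parse_dti_sequence(seq):
    # Decode "drug | interaction | target" chunks separated by ";".
    # Malformed chunks (wrong number of fields) are skipped.
    triplets = []
    for chunk in seq.split(";"):
        parts = [p.strip() for p in chunk.split("|")]
        if len(parts) == 3:
            triplets.append(tuple(parts))
    return triplets

print(parse_dti_sequence("aspirin | inhibitor | COX-1; metformin | agonist | AMPK"))
```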
arXiv Detail & Related papers (2021-09-27T17:00:14Z)
- Fast and Effective Biomedical Entity Linking Using a Dual Encoder [48.86736921025866]
We propose a BERT-based dual encoder model that resolves multiple mentions in a document in one shot.
We show that our proposed model is multiple times faster than existing BERT-based models while being competitive in accuracy for biomedical entity linking.
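The one-shot resolution amounts to scoring every mention in a document against every candidate entity in a single pass, because the dual encoder embeds mentions and entities independently. A sketch with toy vectors; dot-product scoring is an assumption here, and the paper's exact scoring function may differ:

```python
def dot(a, b):
    # Plain dot product between two vectors.
    return sum(x * y for x, y in zip(a, b))

def resolve_all(mention_vecs, entity_vecs, entity_ids):
    # Link every mention in the document to its top-scoring entity at once.
    return [
        entity_ids[max(range(len(entity_vecs)), key=lambda j: dot(m, entity_vecs[j]))]
        for m in mention_vecs
    ]

# Toy mention/entity embeddings with hypothetical entity IDs.
entity_ids = ["E1", "E2"]
entity_vecs = [[1.0, 0.0], [0.0, 1.0]]
mentions = [[0.9, 0.2], [0.1, 0.8]]
print(resolve_all(mentions, entity_vecs, entity_ids))  # ['E1', 'E2']
```

Because entity embeddings do not depend on the mention, they can be computed once and reused, which is what makes this family of models fast at inference time.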
arXiv Detail & Related papers (2021-03-08T19:32:28Z)
- A hybrid deep-learning approach for complex biochemical named entity recognition [9.657827522380712]
Named entity recognition (NER) of chemicals and drugs is a critical domain of information extraction in biochemical research.
Here, we propose a hybrid deep learning approach to improve the recognition accuracy of NER.
arXiv Detail & Related papers (2020-12-20T01:30:07Z)
- UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus [73.86656026386038]
We introduce UmlsBERT, a contextual embedding model that integrates domain knowledge during the pre-training process.
By applying these two strategies, UmlsBERT can encode clinical domain knowledge into word embeddings and outperform existing domain-specific models.
arXiv Detail & Related papers (2020-10-20T15:56:31Z)
- NLNDE: Enhancing Neural Sequence Taggers with Attention and Noisy Channel for Robust Pharmacological Entity Detection [11.98821166621488]
We describe the system with which we participated in the first subtrack of the PharmaCoNER competition of the BioNLP Open Shared Tasks 2019.
Our system achieves promising results, especially by combining the different techniques, and reaches up to 88.6% F1 in the competition.
arXiv Detail & Related papers (2020-07-02T11:17:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.