Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization
- URL: http://arxiv.org/abs/2305.13066v2
- Date: Fri, 13 Oct 2023 11:19:19 GMT
- Title: Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization
- Authors: Zihao Fu, Yixuan Su, Zaiqiao Meng, Nigel Collier
- Abstract summary: We propose a novel Synonym Generalization (SynGen) framework that recognizes the biomedical concepts contained in the input text using span-based predictions.
We extensively evaluate our approach on a wide range of benchmarks and the results verify that SynGen outperforms previous dictionary-based models by notable margins.
- Score: 51.89486520806639
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Biomedical named entity recognition is one of the core tasks in biomedical
natural language processing (BioNLP). To tackle this task, numerous
supervised and distantly supervised approaches have been proposed. Despite their
remarkable success, these approaches inescapably demand laborious human effort.
To alleviate the need for human effort, dictionary-based approaches have been
proposed to extract named entities simply based on a given dictionary. However,
one downside of existing dictionary-based approaches is that they struggle to
identify concept synonyms that are not listed in the given dictionary, which we
refer to as the synonym generalization problem. In this
study, we propose a novel Synonym Generalization (SynGen) framework that
recognizes the biomedical concepts contained in the input text using span-based
predictions. In particular, SynGen introduces two regularization terms, namely,
(1) a synonym distance regularizer and (2) a noise perturbation regularizer,
to minimize the synonym generalization error. To demonstrate the effectiveness
of our approach, we provide a theoretical analysis of the bound on the synonym
generalization error. We extensively evaluate our approach on a wide range of
benchmarks and the results verify that SynGen outperforms previous
dictionary-based models by notable margins. Lastly, we provide a detailed
analysis to further reveal the merits and inner workings of our approach.
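To make the synonym generalization problem concrete, here is a minimal Python sketch of a plain dictionary-based tagger that enumerates token spans and labels only exact dictionary matches. It is not SynGen and not the authors' code: the function names `enumerate_spans` and `dictionary_ner`, the whitespace tokenizer, the `max_len` span limit, and the toy dictionary are all illustrative assumptions.

```python
# Illustrative sketch only (not the SynGen implementation): a naive
# dictionary-based tagger that labels spans whose surface form is an exact
# dictionary entry. Names and the toy dictionary are hypothetical.
from typing import Dict, List, Tuple


def enumerate_spans(tokens: List[str], max_len: int = 4) -> List[Tuple[int, int]]:
    """Return all (start, end) token spans, end exclusive, of length <= max_len."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            spans.append((start, end))
    return spans


def dictionary_ner(text: str, dictionary: Dict[str, str], max_len: int = 4) -> List[Tuple[str, str]]:
    """Label every span whose lowercased surface form is listed in the dictionary."""
    tokens = text.lower().split()  # assumption: simple whitespace tokenization
    matches = []
    for start, end in enumerate_spans(tokens, max_len):
        surface = " ".join(tokens[start:end])
        if surface in dictionary:  # exact match only: unlisted synonyms are missed
            matches.append((surface, dictionary[surface]))
    return matches


# Hypothetical toy dictionary mapping surface forms to concept types.
toy_dictionary = {
    "myocardial infarction": "Disease",
    "aspirin": "Chemical",
}

if __name__ == "__main__":
    print(dictionary_ner("Aspirin reduces the risk of myocardial infarction", toy_dictionary))
    # → [('aspirin', 'Chemical'), ('myocardial infarction', 'Disease')]
    print(dictionary_ner("Aspirin reduces the risk of heart attack", toy_dictionary))
    # → [('aspirin', 'Chemical')]
```

The second call misses "heart attack", a synonym of myocardial infarction that the dictionary does not list; this is the kind of gap that SynGen's span-based predictions and regularizers are designed to close.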
Related papers
- Bi-Encoders based Species Normalization -- Pairwise Sentence Learning to Rank [0.0]
We present a novel deep learning approach for named entity normalization, treating it as a pair-wise learning to rank problem.
We conduct experiments on species entity types and evaluate our method against state-of-the-art techniques.
(2023-10-22T17:30:16Z)
- Unsupervised Syntactically Controlled Paraphrase Generation with Abstract Meaning Representations [59.10748929158525]
Abstract Meaning Representations (AMR) can greatly improve the performance of unsupervised syntactically controlled paraphrase generation.
Our proposed model, AMR-enhanced Paraphrase Generator (AMRPG), encodes the AMR graph and the constituency parse of the input sentence into two disentangled semantic and syntactic embeddings.
Experiments show that AMRPG generates more accurate syntactically controlled paraphrases, both quantitatively and qualitatively, compared to the existing unsupervised approaches.
(2022-11-02T04:58:38Z)
- Keywords and Instances: A Hierarchical Contrastive Learning Framework Unifying Hybrid Granularities for Text Generation [59.01297461453444]
We propose a hierarchical contrastive learning mechanism that unifies semantic meaning at hybrid granularities in the input text.
Experiments demonstrate that our model outperforms competitive baselines on paraphrasing, dialogue generation, and storytelling tasks.
(2022-05-26T13:26:03Z)
- Generative Biomedical Entity Linking via Knowledge Base-Guided Pre-training and Synonyms-Aware Fine-tuning [0.8154691566915505]
We propose a generative approach to model biomedical entity linking (EL).
We propose KB-guided pre-training by constructing synthetic samples with synonyms and definitions from the KB.
We also propose synonyms-aware fine-tuning to select concept names for training, and propose decoder prompt and multi-synonyms constrained prefix tree for inference.
(2022-04-11T14:50:51Z)
- Semantic Search for Large Scale Clinical Ontologies [63.71950996116403]
We present a deep learning approach to build a search system for large clinical vocabularies.
We propose a Triplet-BERT model and a method that generates training data based on semantic training data.
The model is evaluated on five real benchmark data sets, and the results show that our approach achieves strong results on both free-text-to-concept and concept-to-concept search.
(2022-01-01T05:15:42Z)
- Semantic-Preserving Adversarial Text Attacks [85.32186121859321]
We propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.
Our method achieves the highest attack success rates and semantics rates while changing the fewest words compared with existing methods.
(2021-08-23T09:05:18Z)
- End-to-end Biomedical Entity Linking with Span-based Dictionary Matching [5.273138059454523]
Disease name recognition and normalization is a fundamental process in biomedical text mining.
This study introduces a novel end-to-end approach that combines span representations with dictionary-matching features.
Our model handles unseen concepts by referring to a dictionary while maintaining the performance of neural network-based models.
(2021-04-21T12:24:12Z)
- BBAEG: Towards BERT-based Biomedical Adversarial Example Generation for Text Classification [1.14219428942199]
We propose BBAEG (Biomedical BERT-based Adversarial Example Generation), a black-box attack algorithm for biomedical text classification.
We demonstrate that BBAEG performs stronger attacks with better language fluency and semantic coherence than prior work.
(2021-04-05T05:32:56Z)
- PhenoTagger: A Hybrid Method for Phenotype Concept Recognition using Human Phenotype Ontology [6.165755812152143]
PhenoTagger is a hybrid method that combines dictionary-based and machine learning-based methods to recognize concepts in unstructured text.
Our method is validated with two HPO corpora, and the results show that PhenoTagger compares favorably to previous methods.
(2020-09-17T18:00:43Z)
- Detecting and Understanding Generalization Barriers for Neural Machine Translation [53.23463279153577]
This paper attempts to identify and understand generalization barrier words within an unseen input sentence.
We propose a principled definition of generalization barrier words and a modified version that is computationally tractable.
We then conduct extensive analyses of the detected generalization barrier words on both the Zh→En and En→Zh NIST benchmarks.
(2020-04-05T12:33:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.