A multi-perspective combined recall and rank framework for Chinese
procedure terminology normalization
- URL: http://arxiv.org/abs/2101.09101v1
- Date: Fri, 22 Jan 2021 13:37:10 GMT
- Title: A multi-perspective combined recall and rank framework for Chinese
procedure terminology normalization
- Authors: Ming Liang and Kui Xue and Tong Ruan
- Abstract summary: In this paper, we focus on Chinese procedure terminology normalization.
Terminologies are expressed in various ways, and one medical mention may be linked to multiple terminologies.
We propose a combined recall-and-rank framework to address these problems.
- Score: 11.371582109211815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical terminology normalization aims to map a clinical mention to
terminologies from a knowledge base, and it plays an important role in
analyzing Electronic Health Records (EHRs) and in many downstream tasks. In
this paper, we focus on Chinese procedure terminology normalization.
Terminologies are expressed in various ways, and one medical mention may be
linked to multiple terminologies. Previous studies explore methods such as
multi-class classification or learning to rank (LTR) that sort the
terminologies by literal and semantic information. However, this information
is inadequate for finding the right terminologies, particularly in
multi-implication cases. In this work, we propose a combined recall-and-rank
framework to address these problems. The framework is composed of a multi-task
candidate generator (MTCG), a keywords attentive ranker (KAR) and a fusion
block (FB). MTCG predicts the number of implications of a mention and recalls
candidates by semantic similarity. KAR is based on BERT with a keywords
attentive mechanism that focuses on keywords such as procedure sites and
procedure types. FB merges the similarities from MTCG and KAR to sort the
terminologies from different perspectives. Detailed experimental analysis
shows that our proposed framework yields remarkable improvements in both
performance and efficiency.
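To make the three-block design concrete, below is a minimal, self-contained
sketch of the recall-then-rank-then-fuse idea. It is not the authors'
implementation: `encode` is a toy bag-of-characters stand-in for a BERT-style
encoder, the terminology entries and keyword list are invented examples, and
the fusion is a plain weighted average.

```python
import numpy as np

# Hypothetical sketch of the recall-and-rank pipeline from the abstract.
# `encode` stands in for a sentence encoder (e.g. a BERT pooled output);
# here it is a toy bag-of-characters embedding so the script runs as-is.

def encode(text: str, dim: int = 64) -> np.ndarray:
    v = np.zeros(dim)
    for ch in text:
        v[hash(ch) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def recall_candidates(mention, terminology, k=3):
    """Recall step (MTCG-like): top-k terminologies by cosine similarity."""
    m = encode(mention)
    scored = [(t, float(m @ encode(t))) for t in terminology]
    return sorted(scored, key=lambda x: -x[1])[:k]

def keyword_score(mention, term, keywords):
    """Rank step (KAR-like): reward overlap on keywords such as procedure
    sites and procedure types (a crude stand-in for keyword attention)."""
    return sum(1.0 for kw in keywords if kw in mention and kw in term)

def fuse(recall_sim, rank_sim, alpha=0.5):
    """Fusion block (FB-like): merge the two similarity views."""
    return alpha * recall_sim + (1 - alpha) * rank_sim

terminology = ["胃切除术", "胃镜检查", "肝切除术"]  # invented examples
keywords = ["胃", "肝", "切除"]                     # sites and types
mention = "胃部分切除"
candidates = recall_candidates(mention, terminology)
ranked = sorted(((t, fuse(s, keyword_score(mention, t, keywords)))
                 for t, s in candidates), key=lambda x: -x[1])
print(ranked)
```

The point of the fusion step is that the semantic recall signal and the
keyword signal can disagree, and combining them sorts candidates from both
perspectives, as the abstract describes.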
Related papers
- A General and Flexible Multi-concept Parsing Framework for Multilingual Semantic Matching [60.51839859852572]
We propose to resolve text into multiple concepts for multilingual semantic matching, freeing the model from reliance on NER models.
We conduct comprehensive experiments on the English datasets QQP and MRPC and the Chinese dataset Medical-SM.
arXiv Detail & Related papers (2024-03-05T13:55:16Z)
- MapperGPT: Large Language Models for Linking and Mapping Entities [1.5340902251924438]
We present MapperGPT, an approach that uses Large Language Models to review and refine mappings as a post-processing step.
We show that when used in combination with high-recall methods, MapperGPT can provide a substantial improvement in accuracy.
arXiv Detail & Related papers (2023-10-05T16:43:04Z)
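As a loose illustration of the MapperGPT recipe summarized above (high-recall
candidate generation followed by LLM review as post-processing), here is a
hedged sketch; `ask_llm` and `high_recall_candidates` are hypothetical
placeholders, not MapperGPT's actual interface.

```python
# Sketch only: pair a high-recall matcher with an LLM accept/reject review.

def ask_llm(prompt: str) -> str:
    # Stand-in for a call to a large language model; always answers "yes"
    # here, whereas a real model would reject spurious candidates.
    return "yes"

def high_recall_candidates(entity, ontology):
    # Stand-in lexical matcher: anything sharing a token is a candidate.
    tokens = set(entity.lower().split())
    return [c for c in ontology if tokens & set(c.lower().split())]

def refine_mappings(entity, ontology):
    kept = []
    for cand in high_recall_candidates(entity, ontology):
        prompt = f"Do '{entity}' and '{cand}' mean the same concept? yes/no"
        if ask_llm(prompt).strip().lower().startswith("yes"):
            kept.append(cand)
    return kept

# A real LLM reviewer would keep "kidney failure" and drop "heart failure".
print(refine_mappings("renal failure", ["kidney failure", "heart failure"]))
```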
- Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization [51.89486520806639]
We propose a novel Synonym Generalization (SynGen) framework that recognizes the biomedical concepts contained in the input text using span-based predictions.
We extensively evaluate our approach on a wide range of benchmarks and the results verify that SynGen outperforms previous dictionary-based models by notable margins.
arXiv Detail & Related papers (2023-05-22T14:36:32Z)
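The span-based prediction idea in the SynGen entry above can be illustrated
with a toy dictionary matcher. This is only a sketch: SynGen scores spans with
a neural model and generalizes beyond listed synonyms, which a lookup cannot
do, and the `synonyms` dictionary is invented.

```python
# Toy span enumeration: every span up to max_len words is checked against a
# synonym dictionary mapping surface forms to canonical concepts.

synonyms = {"heart attack": "myocardial infarction",
            "mi": "myocardial infarction"}

def find_concepts(text, max_len=3):
    words = text.lower().split()
    hits = []
    for i in range(len(words)):
        for j in range(i + 1, min(i + 1 + max_len, len(words) + 1)):
            span = " ".join(words[i:j])
            if span in synonyms:
                hits.append((span, synonyms[span]))
    return hits

print(find_concepts("patient had a heart attack yesterday"))
```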
- Token Classification for Disambiguating Medical Abbreviations [0.0]
Abbreviations are unavoidable yet critical parts of medical text.
Lack of a standardized mapping system makes disambiguating abbreviations a difficult and time-consuming task.
arXiv Detail & Related papers (2022-10-05T18:06:49Z)
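To illustrate the token-classification framing of abbreviation disambiguation
described above, here is a minimal sketch in which a lookup table stands in
for a trained tagger; the `expansions` mapping is invented.

```python
# Each token receives a label; abbreviation tokens get their expansion,
# everything else gets the outside label "O".

expansions = {"ra": "rheumatoid_arthritis", "pt": "patient"}

def tag_tokens(tokens):
    return [(tok, expansions.get(tok.lower(), "O")) for tok in tokens]

print(tag_tokens("PT presents with RA flare".split()))
```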
- Automatic Biomedical Term Clustering by Learning Fine-grained Term Representations [0.8154691566915505]
State-of-the-art term embeddings leverage pretrained language models to encode terms and use synonyms and relation knowledge from knowledge graphs to guide contrastive learning.
However, these embeddings are not sensitive to minor textual differences, which leads to failures in biomedical term clustering.
To alleviate this problem, we adjust the sampling strategy in pretraining term embeddings by providing dynamic hard positive and negative samples.
We name our proposed method CODER++; it has been applied to cluster biomedical concepts in the newly released biomedical knowledge graph BIOS.
arXiv Detail & Related papers (2022-04-01T12:30:58Z)
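A minimal sketch of the dynamic hard-negative idea behind the CODER++ entry
above (not its actual training code): at each step, the nearest non-synonym in
embedding space serves as the hard negative, forcing the model to attend to
minor textual differences. The embeddings and synonym sets below are toy data.

```python
import numpy as np

rng = np.random.default_rng(0)
terms = ["type 1 diabetes", "type 2 diabetes", "diabetes mellitus type 1"]
synonyms = {0: {2}, 1: set(), 2: {0}}   # gold synonym sets by term index
emb = rng.normal(size=(3, 8))           # stand-in term embeddings
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

def hard_negative(i: int) -> int:
    """Nearest neighbour that is not a synonym of term i."""
    sims = emb @ emb[i]
    for j in np.argsort(-sims):
        if j != i and j not in synonyms[i]:
            return int(j)

for i, t in enumerate(terms):
    print(t, "-> hard negative:", terms[hard_negative(i)])
```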
- Semantic Search for Large Scale Clinical Ontologies [63.71950996116403]
We present a deep learning approach to building a search system for large clinical vocabularies.
We propose a Triplet-BERT model and a method that generates training data directly from the ontologies.
The model is evaluated on five real benchmark data sets, and the results show that our approach performs well on both free-text-to-concept and concept-to-concept search over large vocabularies.
arXiv Detail & Related papers (2022-01-01T05:15:42Z)
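The triplet objective behind a Triplet-BERT-style search model can be
sketched in a few lines. The vectors below are random stand-ins for encodings
of a query, a matching concept, and a non-matching concept.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Pull the anchor toward the positive, push it from the negative."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(1)
anchor, positive, negative = rng.normal(size=(3, 8))
print("loss:", triplet_loss(anchor, positive, negative))
```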
- Clinical Named Entity Recognition using Contextualized Token Representations [49.036805795072645]
This paper introduces contextualized word embeddings to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair).
Experiments show that our models gain dramatic improvements over both static word embeddings and domain-generic language models.
arXiv Detail & Related papers (2021-06-23T18:12:58Z)
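A toy illustration of the motivation for contextualized models like C-ELMo
and C-Flair: the same surface word receives a different vector in different
contexts. The hash-seeded "encoder" below is purely illustrative, not a
language model.

```python
import numpy as np

def contextual_vector(tokens, i, dim=8):
    """Vector for tokens[i] that depends on its neighbours, not just itself."""
    window = " ".join(tokens[max(0, i - 1): i + 2])
    rng = np.random.default_rng(abs(hash(window)) % (2**32))
    return rng.normal(size=dim)

s1 = "cold compress applied".split()      # "cold" as a treatment
s2 = "patient caught a cold".split()      # "cold" as an illness
v1, v2 = contextual_vector(s1, 0), contextual_vector(s2, 3)
cos = float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
print("same word, different contexts, cosine:", round(cos, 3))
```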
- Does the Magic of BERT Apply to Medical Code Assignment? A Quantitative Study [2.871614744079523]
It is not clear if pretrained models are useful for medical code prediction without further architecture engineering.
We propose a hierarchical fine-tuning architecture to capture interactions between distant words and adopt label-wise attention to exploit label information.
Contrary to current trends, we demonstrate that a carefully trained classical CNN outperforms attention-based models on a MIMIC-III subset with frequent codes.
arXiv Detail & Related papers (2021-03-11T07:23:45Z)
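The label-wise attention mentioned in the entry above can be sketched with
plain numpy: each code gets its own attention query over the token
representations, yielding one document vector per code. All shapes and data
here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
T, H, L = 6, 8, 3                  # tokens, hidden size, number of codes
tokens = rng.normal(size=(T, H))   # contextual token representations
label_q = rng.normal(size=(L, H))  # one learned query per code

scores = label_q @ tokens.T                          # (L, T) logits
attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
label_docs = attn @ tokens                           # (L, H) per-code vectors
logits = (label_docs * label_q).sum(axis=1)          # one score per code
print(logits)
```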
- UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus [73.86656026386038]
We introduce UmlsBERT, a contextual embedding model that integrates domain knowledge during the pre-training process.
With this knowledge integration, UmlsBERT can encode clinical domain knowledge into word embeddings and outperform existing domain-specific models.
arXiv Detail & Related papers (2020-10-20T15:56:31Z)
- IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval [105.77562776008459]
Existing methods leverage the attention mechanism to explore image-text correspondence in a fine-grained manner.
However, it is difficult for existing methods to optimally capture such sophisticated correspondences.
We propose an Iterative Matching with Recurrent Attention Memory (IMRAM) method, in which correspondences are captured with multiple steps of alignments.
arXiv Detail & Related papers (2020-03-08T12:24:41Z)
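A loose sketch of the iterative-matching idea (inspired by, but not
reproducing, IMRAM): cross-attention between word and region features is
applied over several steps, with a memory refining the query each time. The
additive memory update below is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
regions = rng.normal(size=(4, 8))   # image region features (toy data)
words = rng.normal(size=(5, 8))     # caption word features (toy data)

def cross_attend(query, keys):
    """Attention-weighted summary of `keys` with respect to `query`."""
    scores = keys @ query
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ keys

memory = words.mean(axis=0)         # initial query from the text side
for step in range(3):               # multiple alignment steps
    attended = cross_attend(memory, regions)
    memory = memory + attended      # naive memory update (assumption)
    sim = float(attended @ words.mean(axis=0))
    print(f"step {step}: alignment score {sim:.3f}")
```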
This list is automatically generated from the titles and abstracts of the papers on this site.