Enriching Consumer Health Vocabulary Using Enhanced GloVe Word Embedding
- URL: http://arxiv.org/abs/2004.00150v2
- Date: Mon, 13 Apr 2020 18:02:10 GMT
- Title: Enriching Consumer Health Vocabulary Using Enhanced GloVe Word Embedding
- Authors: Mohammed Ibrahim, Susan Gauch, Omar Salman, Mohammed Alqahatani
- Abstract summary: Open-Access and Collaborative Consumer Health Vocabulary (OAC CHV) is a collection of medical terms written in plain English.
The National Library of Medicine has integrated and mapped the CHV terms to its Unified Medical Language System (UMLS).
We present an enhanced word embedding technique that generates new CHV terms from consumer-generated text.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-Access and Collaborative Consumer Health Vocabulary (OAC CHV, or CHV for
short) is a collection of medical terms written in plain English. It provides
a list of simple, easy, and clear terms that laymen prefer to use rather than
the equivalent professional medical terms. The National Library of Medicine (NLM)
has integrated and mapped the CHV terms to its Unified Medical Language
System (UMLS). These CHV terms map to 56,000 professional concepts in the
UMLS. We found that about 48% of these laymen's terms are still jargon and
match the professional terms in the UMLS. In this paper, we present an
enhanced word embedding technique that generates new CHV terms from
consumer-generated text. We downloaded our corpus from a healthcare social
media site and evaluated our new method, which applies iterative feedback to
word embeddings, using ground truth built from the existing CHV terms. Our
feedback algorithm outperformed unmodified GloVe, and new CHV terms were detected.
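
The abstract describes the method only at a high level: iterative feedback to GloVe word embeddings, evaluated against ground truth from the existing CHV. Below is a minimal sketch of that general idea, not the authors' exact algorithm; the function names, similarity threshold, and toy vectors are illustrative assumptions. It seeds a UMLS concept with its known CHV terms, ranks the remaining vocabulary by cosine similarity to the seed centroid, and feeds accepted candidates back into the seed set before re-ranking.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def expand_concept(seed_terms, embeddings, vocab, rounds=3, top_k=10, min_sim=0.9):
    """Rank candidate terms by similarity to the centroid of the accepted
    terms, then feed newly accepted candidates back in and re-rank."""
    accepted = set(seed_terms)
    for _ in range(rounds):
        centroid = np.mean([embeddings[t] for t in accepted], axis=0)
        scored = [(t, cosine(embeddings[t], centroid))
                  for t in vocab if t not in accepted]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        newly_accepted = [t for t, s in scored[:top_k] if s >= min_sim]
        if not newly_accepted:
            break
        accepted.update(newly_accepted)
    return accepted - set(seed_terms)

# Toy usage with made-up 3-d vectors; in practice the vectors would come
# from GloVe trained on the consumer-generated health corpus.
emb = {
    "hypertension":        np.array([0.9, 0.1, 0.0]),
    "high blood pressure": np.array([0.8, 0.2, 0.1]),
    "bp":                  np.array([0.7, 0.3, 0.1]),
    "headache":            np.array([0.1, 0.9, 0.2]),
}
print(sorted(expand_concept({"hypertension"}, emb, emb.keys())))
```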
Related papers
- Evaluation of LLMs in Medical Text Summarization: The Role of Vocabulary Adaptation in High OOV Settings [26.442558912559658]
Large Language Models (LLMs) recently achieved great success in medical text summarization by simply using in-context learning.
We show that LLMs suffer a significant performance drop on data points with a high concentration of out-of-vocabulary words or with high novelty.
Vocabulary adaptation is an intuitive solution to this vocabulary mismatch issue.
arXiv Detail & Related papers (2025-05-27T14:23:03Z)
- Extracting domain-specific terms using contextual word embeddings [2.7941582470640784]
This paper proposes a novel machine learning approach to terminology extraction.
It combines features from traditional term extraction systems with novel contextual features derived from contextual word embeddings.
Our approach provides significant improvements in terms of F1 score over the previous state-of-the-art.
arXiv Detail & Related papers (2025-02-24T16:06:35Z)
- Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization [51.89486520806639]
We propose a novel Synonym Generalization (SynGen) framework that recognizes the biomedical concepts contained in the input text using span-based predictions.
We extensively evaluate our approach on a wide range of benchmarks and the results verify that SynGen outperforms previous dictionary-based models by notable margins.
arXiv Detail & Related papers (2023-05-22T14:36:32Z)
- MedJEx: A Medical Jargon Extraction Model with Wiki's Hyperlink Span and Contextualized Masked Language Model Score [6.208127495081593]
We present a novel and publicly available dataset with expert-annotated medical jargon terms from 18K+ EHR note sentences.
We then introduce a novel medical jargon extraction ($MedJEx$) model which has been shown to outperform existing state-of-the-art NLP models.
arXiv Detail & Related papers (2022-10-12T02:27:32Z)
- Constructing Cross-lingual Consumer Health Vocabulary with Word-Embedding from Comparable User Generated Content [2.4316589174722485]
The open-access and collaborative consumer health vocabulary (OAC CHV) is the controlled vocabulary for bridging the gap between consumer and professional medical language.
This research proposes a cross-lingual automatic term recognition framework for extending the English CHV into a cross-lingual one.
arXiv Detail & Related papers (2022-06-23T10:46:39Z)
- Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution [124.99894592871385]
We present a large-scale comparative study of lexical substitution methods employing both older and the most recent language models.
We show that already competitive results achieved by SOTA LMs/MLMs can be further substantially improved if information about the target word is injected properly.
arXiv Detail & Related papers (2022-06-07T16:16:19Z)
- Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method based on a language model trained on medical forum data generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
- Clinical Named Entity Recognition using Contextualized Token Representations [49.036805795072645]
This paper introduces the technique of contextualized word embedding to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair).
Explicit experiments show that our models gain dramatic improvements compared to both static word embeddings and domain-generic language models.
arXiv Detail & Related papers (2021-06-23T18:12:58Z)
- An Automated Method to Enrich Consumer Health Vocabularies Using GloVe Word Embeddings and An Auxiliary Lexical Resource [0.0]
A layman may have difficulty communicating with a professional due to not understanding specialized terms common to the domain.
Several professional vocabularies have been created to map laymen medical terms to professional medical terms and vice versa.
We present an automatic method to enrich laymen's vocabularies that can be applied to vocabularies in any domain.
arXiv Detail & Related papers (2021-05-18T20:16:45Z)
- UmlsBERT: Clinical Domain Knowledge Augmentation of Contextual Embeddings Using the Unified Medical Language System Metathesaurus [73.86656026386038]
We introduce UmlsBERT, a contextual embedding model that integrates domain knowledge during the pre-training process.
By applying these two strategies, UmlsBERT can encode clinical domain knowledge into word embeddings and outperform existing domain-specific models.
arXiv Detail & Related papers (2020-10-20T15:56:31Z)
- Can Embeddings Adequately Represent Medical Terminology? New Large-Scale Medical Term Similarity Datasets Have the Answer! [13.885093944392464]
A large number of embeddings trained on medical data have emerged, but it remains unclear how well they represent medical terminology.
We present multiple automatically created large-scale medical term similarity datasets.
We evaluate state-of-the-art word and contextual embeddings on our new datasets, comparing multiple vector similarity metrics and word vector aggregation techniques.
arXiv Detail & Related papers (2020-03-24T19:18:34Z)
- Learning Contextualized Document Representations for Healthcare Answer Retrieval [68.02029435111193]
Contextual Discourse Vectors (CDV) is a distributed document representation for efficient answer retrieval from long documents.
Our model leverages a dual encoder architecture with hierarchical LSTM layers and multi-task training to encode the position of clinical entities and aspects alongside the document discourse.
We show that our generalized model significantly outperforms several state-of-the-art baselines for healthcare passage ranking.
arXiv Detail & Related papers (2020-02-03T15:47:19Z)
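
As a rough illustration of the dual-encoder architecture mentioned in the last entry above (a hierarchical LSTM document encoder paired with a query encoder), a PyTorch sketch might look like the following. It omits the multi-task training and the entity/aspect position encoding described in that abstract, and every dimension and layer choice here is an assumption rather than the authors' configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoder(nn.Module):
    """Minimal dual-encoder sketch: a hierarchical LSTM document encoder
    (words -> sentences -> document) and a flat LSTM query encoder,
    scored by cosine similarity."""

    def __init__(self, vocab_size, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.word_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)   # within a sentence
        self.sent_lstm = nn.LSTM(hidden, hidden, batch_first=True)    # across sentences
        self.query_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)

    def encode_document(self, doc_ids):
        # doc_ids: (n_sentences, n_words) tensor of token ids
        _, (h_word, _) = self.word_lstm(self.emb(doc_ids))   # h_word: (1, n_sentences, hidden)
        sent_vecs = h_word[-1].unsqueeze(0)                  # (1, n_sentences, hidden)
        _, (h_doc, _) = self.sent_lstm(sent_vecs)            # h_doc: (1, 1, hidden)
        return h_doc[-1]                                     # (1, hidden)

    def encode_query(self, query_ids):
        # query_ids: (n_words,) tensor of token ids
        _, (h_q, _) = self.query_lstm(self.emb(query_ids).unsqueeze(0))
        return h_q[-1]                                       # (1, hidden)

    def score(self, query_ids, doc_ids):
        return F.cosine_similarity(self.encode_query(query_ids),
                                   self.encode_document(doc_ids))

# Toy usage: a 3-token query against a 2-sentence, 4-token-per-sentence document.
model = DualEncoder(vocab_size=1000)
query = torch.randint(0, 1000, (3,))
doc = torch.randint(0, 1000, (2, 4))
print(model.score(query, doc))
```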