An Automated Method to Enrich Consumer Health Vocabularies Using GloVe
Word Embeddings and An Auxiliary Lexical Resource
- URL: http://arxiv.org/abs/2105.08812v1
- Date: Tue, 18 May 2021 20:16:45 GMT
- Authors: Mohammed Ibrahim, Susan Gauch, Omar Salman, Mohammed Alqahatani
- Abstract summary: A layman may have difficulty communicating with a professional due to not understanding specialized terms common to the domain.
Several professional vocabularies have been created to map laymen medical terms to professional medical terms and vice versa.
We present an automatic method to enrich laymen's vocabularies; it can be applied to vocabularies in any domain.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Background: Clear language makes communication easier between any two
parties. A layman may have difficulty communicating with a professional due to
not understanding the specialized terms common to the domain. In healthcare, it
is rare to find a layman knowledgeable in medical terminology, which can lead to
poor understanding of their condition and/or treatment. To bridge this gap,
several professional vocabularies and ontologies have been created to map
laymen medical terms to professional medical terms and vice versa.
Objective: Many existing vocabularies are built manually or semi-automatically,
requiring large investments of time and human effort; consequently, these
vocabularies grow slowly. In this paper, we present an automatic method to
enrich laymen's vocabularies that can be applied to vocabularies in any domain.
Methods: Our entirely automatic approach uses machine learning, specifically
Global Vectors for Word Embeddings (GloVe), on a corpus collected from a social
media healthcare platform to extend and enhance consumer health vocabularies
(CHV). Our approach further improves the CHV by incorporating synonyms and
hyponyms from the WordNet ontology. The basic GloVe and our novel algorithms
incorporating WordNet were evaluated using two laymen datasets from the
National Library of Medicine (NLM), Open-Access Consumer Health Vocabulary (OAC
CHV) and MedlinePlus Healthcare Vocabulary.
Results: The results show that GloVe was able to find new laymen terms with
an F-score of 48.44%. Furthermore, our enhanced GloVe approach outperformed
basic GloVe with an average F-score of 61%, a relative improvement of 25%.
The improvement of the enhanced GloVe was also statistically significant on
both ground-truth datasets (P<.001).
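The enrichment approach described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy 3-dimensional vectors (real GloVe vectors are typically 50-300 dimensional, trained on the healthcare forum corpus) and the small synonym map standing in for WordNet are invented for demonstration.

```python
import math

# Toy word vectors standing in for GloVe embeddings trained on a
# consumer healthcare corpus (invented values for illustration).
embeddings = {
    "hypertension":        [0.9, 0.1, 0.3],
    "high blood pressure": [0.85, 0.15, 0.35],
    "bp":                  [0.8, 0.2, 0.4],
    "headache":            [0.1, 0.9, 0.2],
}

# Stand-in for WordNet synonyms/hyponyms (the paper uses the real ontology).
wordnet_synonyms = {
    "hypertension": ["high blood pressure"],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def enrich(professional_term, k=2, threshold=0.8):
    """Return candidate layman terms for a professional term: the k nearest
    embedding neighbours above a similarity threshold, expanded with entries
    from the auxiliary lexical resource."""
    seed = embeddings[professional_term]
    neighbours = sorted(
        ((cosine(seed, vec), term) for term, vec in embeddings.items()
         if term != professional_term),
        reverse=True)
    candidates = {term for score, term in neighbours[:k] if score >= threshold}
    # The paper's novel step: grow the candidate set with WordNet
    # synonyms/hyponyms of the professional term.
    candidates.update(wordnet_synonyms.get(professional_term, []))
    return candidates

print(enrich("hypertension"))  # candidate layman terms for "hypertension"
```

In the paper, candidates produced this way are compared against the OAC CHV and MedlinePlus ground-truth vocabularies to compute the reported F-scores.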
Related papers
- Evaluation of LLMs in Medical Text Summarization: The Role of Vocabulary Adaptation in High OOV Settings [26.442558912559658]
Large Language Models (LLMs) recently achieved great success in medical text summarization by simply using in-context learning. We show that LLMs exhibit a significant performance drop on data points with a high concentration of out-of-vocabulary words or with high novelty. Vocabulary adaptation is an intuitive solution to this vocabulary mismatch issue.
arXiv Detail & Related papers (2025-05-27T14:23:03Z)
- Refinement of an Epilepsy Dictionary through Human Annotation of Health-related posts on Instagram [5.410785987233275]
We used a dictionary built from biomedical terminology to tag more than 8 million Instagram posts by users who have mentioned an epilepsy-relevant drug at least once.
A random sample of 1,771 posts with 2,947 term matches was evaluated by human annotators to identify false positives.
OpenAI's GPT series models were compared against human annotation.
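The dictionary-based tagging step described above can be sketched as a simple whole-word matcher. This is a hedged illustration only: the term list below is a tiny hypothetical stand-in for the study's biomedical terminology, and the real pipeline processed millions of posts.

```python
import re

# Hypothetical mini-dictionary of epilepsy-relevant terms;
# the actual study used a large biomedical terminology.
epilepsy_terms = {"seizure", "keppra", "lamotrigine", "aura"}

def tag_post(text):
    """Return dictionary terms matched in a post (case-insensitive,
    whole-word matches, in order of appearance)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t in epilepsy_terms]

matches = tag_post("Started Keppra last month; one aura but no seizure since.")
print(matches)
```

Human annotation (and, in the paper, GPT-based annotation) then filters matches like these for false positives, e.g. drug names used in unrelated senses.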
arXiv Detail & Related papers (2024-05-14T17:27:59Z)
- Integrating LLM, EEG, and Eye-Tracking Biomarker Analysis for Word-Level Neural State Classification in Semantic Inference Reading Comprehension [4.390968520425543]
This study aims to provide insights into individuals' neural states during a semantic relation reading-comprehension task.
We propose jointly analyzing LLMs, eye-gaze, and electroencephalographic (EEG) data to study how the brain processes words with varying degrees of relevance to a keyword during reading.
arXiv Detail & Related papers (2023-09-27T15:12:08Z)
- Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization [51.89486520806639]
We propose a novel Synonym Generalization (SynGen) framework that recognizes the biomedical concepts contained in the input text using span-based predictions.
We extensively evaluate our approach on a wide range of benchmarks and the results verify that SynGen outperforms previous dictionary-based models by notable margins.
arXiv Detail & Related papers (2023-05-22T14:36:32Z)
- MedJEx: A Medical Jargon Extraction Model with Wiki's Hyperlink Span and Contextualized Masked Language Model Score [6.208127495081593]
We present a novel and publicly available dataset with expert-annotated medical jargon terms from 18K+ EHR note sentences.
We then introduce a novel medical jargon extraction ($MedJEx$) model which has been shown to outperform existing state-of-the-art NLP models.
arXiv Detail & Related papers (2022-10-12T02:27:32Z)
- Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution [124.99894592871385]
We present a large-scale comparative study of lexical substitution methods employing both older and the most recent language models.
We show that already competitive results achieved by SOTA LMs/MLMs can be further substantially improved if information about the target word is injected properly.
arXiv Detail & Related papers (2022-06-07T16:16:19Z)
- Short-Term Word-Learning in a Dynamically Changing Environment [63.025297637716534]
We show how to supplement an end-to-end ASR system with a word/phrase memory and a mechanism to access this memory to recognize the words and phrases correctly.
We demonstrate significant improvements in the detection rate of new words with only a minor increase in false alarms.
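A word/phrase memory of this kind is commonly applied as a score bias over decoder hypotheses. The sketch below is a toy stand-in, not the paper's mechanism: the memory contents, hypothesis scores, and boost value are all invented for illustration.

```python
# Toy sketch of biasing ASR hypotheses with a word/phrase memory:
# hypotheses containing memory entries get a score boost, so newly
# registered words can win over acoustically similar alternatives.
memory = {"covid", "zoom"}
BOOST = 2.0  # hypothetical bias weight

def rescore(hypotheses):
    """hypotheses: list of (text, score) pairs from the decoder.
    Returns the best hypothesis after boosting memory-word matches."""
    rescored = []
    for text, score in hypotheses:
        bonus = BOOST * sum(1 for w in text.split() if w in memory)
        rescored.append((text, score + bonus))
    return max(rescored, key=lambda p: p[1])[0]

best = rescore([("corvette meeting at noon", 5.0),
                ("covid meeting at noon", 4.5)])
print(best)  # the memory word "covid" lifts the second hypothesis
```

Keeping the boost small limits the false-alarm increase the abstract mentions: memory words are preferred only when the decoder already considers them plausible.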
arXiv Detail & Related papers (2022-03-29T10:05:39Z)
- Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method based on a language model trained on medical forum data generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
- Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification [78.120927891455]
State-of-the-art brain-to-text systems have achieved great success in decoding language directly from brain signals using neural networks.
In this paper, we extend the problem to open vocabulary Electroencephalography(EEG)-To-Text Sequence-To-Sequence decoding and zero-shot sentence sentiment classification on natural reading tasks.
Our model achieves a 40.1% BLEU-1 score on EEG-To-Text decoding and a 55.6% F1 score on zero-shot EEG-based ternary sentiment classification, which significantly outperforms supervised baselines.
arXiv Detail & Related papers (2021-12-05T21:57:22Z)
- Clinical Named Entity Recognition using Contextualized Token Representations [49.036805795072645]
This paper introduces the technique of contextualized word embedding to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair).
Explicit experiments show that our models gain dramatic improvements compared to both static word embeddings and domain-generic language models.
arXiv Detail & Related papers (2021-06-23T18:12:58Z)
- Integration of Domain Knowledge using Medical Knowledge Graph Deep Learning for Cancer Phenotyping [6.077023952306772]
We propose a method to integrate external knowledge from medical terminology into the context captured by word embeddings.
We evaluate the proposed approach using a Multitask Convolutional Neural Network (MT-CNN) to extract six cancer characteristics from a dataset of 900K cancer pathology reports.
arXiv Detail & Related papers (2021-01-05T03:59:43Z)
- Enriching Consumer Health Vocabulary Using Enhanced GloVe Word Embedding [0.0]
Open-Access and Collaborative Consumer Health Vocabulary (OAC CHV) is a collection of medical terms written in plain English.
The National Library of Medicine has integrated and mapped the CHV terms to their Unified Medical Language System (UMLS).
We present an enhanced word embedding technique that generates new CHV terms from a consumer-generated text.
arXiv Detail & Related papers (2020-03-31T22:50:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.