Refinement of an Epilepsy Dictionary through Human Annotation of Health-related posts on Instagram
- URL: http://arxiv.org/abs/2405.08784v1
- Date: Tue, 14 May 2024 17:27:59 GMT
- Title: Refinement of an Epilepsy Dictionary through Human Annotation of Health-related posts on Instagram
- Authors: Aehong Min, Xuan Wang, Rion Brattig Correia, Jordan Rozum, Wendy R. Miller, Luis M. Rocha
- Abstract summary: We used a dictionary built from biomedical terminology to tag more than 8 million Instagram posts by users who have mentioned an epilepsy-relevant drug at least once.
A random sample of 1,771 posts with 2,947 term matches was evaluated by human annotators to identify false positives.
OpenAI's GPT series models were compared against human annotation.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We used a dictionary built from biomedical terminology extracted from various sources, such as DrugBank, MedDRA, MedlinePlus, and TCMGeneDIT, to tag more than 8 million Instagram posts by users who had mentioned an epilepsy-relevant drug at least once between 2010 and early 2016. A random sample of 1,771 posts with 2,947 term matches was evaluated by human annotators to identify false positives. OpenAI's GPT series models were compared against human annotation. Frequent terms with a high false-positive rate were removed from the dictionary. Analysis of the estimated false-positive rates of the annotated terms revealed 8 ambiguous terms (plus synonyms) used in Instagram posts, which were removed from the original dictionary. To study the effect of removing those terms, we constructed knowledge networks using the refined and the original dictionaries and performed an eigenvector-centrality analysis on both networks. We show that the refined dictionary leads to a significantly different ranking of important terms, as measured by their eigenvector centrality in the knowledge networks. Furthermore, the most important terms obtained after refinement are of greater medical relevance. In addition, we show that OpenAI's GPT series models fare worse than human annotators at this task.
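The eigenvector-centrality comparison described in the abstract can be sketched as follows. The toy co-mention graph, its edge weights, and the ambiguous term "hot" are invented for illustration and are not data from the paper; the paper's actual networks are built from millions of tagged posts.

```python
# Sketch: compare term rankings by eigenvector centrality before and after
# removing an ambiguous dictionary term from a co-mention network.
# All graph data below is hypothetical.

def eigenvector_centrality(adj, iters=200):
    """Power iteration on a symmetric weighted adjacency dict {node: {nbr: w}}.
    A small diagonal shift (A + I) keeps the iteration stable even on
    bipartite graphs, where plain power iteration can oscillate."""
    nodes = list(adj)
    x = {n: 1.0 for n in nodes}
    for _ in range(iters):
        y = {n: x[n] + sum(w * x[m] for m, w in adj[n].items()) for n in nodes}
        norm = max(sum(v * v for v in y.values()) ** 0.5, 1e-12)
        x = {n: v / norm for n, v in y.items()}
    return x

def remove_term(adj, term):
    """Refined network: drop a term and all edges incident to it."""
    return {n: {m: w for m, w in nbrs.items() if m != term}
            for n, nbrs in adj.items() if n != term}

# Toy co-mention counts; "hot" stands in for an ambiguous term with many
# false-positive matches.
graph = {
    "seizure":     {"lamotrigine": 5, "hot": 4, "aura": 2},
    "lamotrigine": {"seizure": 5, "rash": 3, "hot": 1},
    "rash":        {"lamotrigine": 3, "hot": 2},
    "aura":        {"seizure": 2},
    "hot":         {"seizure": 4, "lamotrigine": 1, "rash": 2},
}

original = eigenvector_centrality(graph)
refined = eigenvector_centrality(remove_term(graph, "hot"))

rank = lambda c: sorted(c, key=c.get, reverse=True)
print("original ranking:", rank(original))
print("refined ranking: ", rank(refined))
```

On this toy graph the top-ranked term changes once the ambiguous term is removed, which is the kind of rank shift the paper measures on its real networks.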
Related papers
- Incorporating Dictionaries into a Neural Network Architecture to Extract COVID-19 Medical Concepts From Social Media
We investigate the potential benefit of incorporating dictionary information into a neural network architecture for natural language processing.
In particular, we make use of this architecture to extract several concepts related to COVID-19 from an on-line medical forum.
Our results show that incorporating small domain dictionaries into deep learning models can improve concept extraction tasks.
arXiv Detail & Related papers (2023-09-05T12:47:44Z) - Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization
We propose a novel Synonym Generalization (SynGen) framework that recognizes the biomedical concepts contained in the input text using span-based predictions.
We extensively evaluate our approach on a wide range of benchmarks and the results verify that SynGen outperforms previous dictionary-based models by notable margins.
arXiv Detail & Related papers (2023-05-22T14:36:32Z) - DICTDIS: Dictionary Constrained Disambiguation for Improved NMT
We present DictDis, a lexically constrained NMT system that disambiguates between multiple candidate translations derived from dictionaries.
We demonstrate the utility of DictDis via extensive experiments on English-Hindi and English-German sentences in a variety of domains, including regulatory, finance, and engineering.
arXiv Detail & Related papers (2022-10-13T13:04:16Z) - Always Keep your Target in Mind: Studying Semantics and Improving Performance of Neural Lexical Substitution
We present a large-scale comparative study of lexical substitution methods employing both older and the most recent language models.
We show that the already competitive results achieved by SOTA LMs/MLMs can be further substantially improved if information about the target word is injected properly.
arXiv Detail & Related papers (2022-06-07T16:16:19Z) - Semantic-Preserving Adversarial Text Attacks [85.32186121859321]
We propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.
Our method achieves the highest attack success and semantic-preservation rates while changing the smallest number of words compared with existing methods.
arXiv Detail & Related papers (2021-08-23T09:05:18Z) - Clinical Named Entity Recognition using Contextualized Token
Representations [49.036805795072645]
This paper introduces the technique of contextualized word embedding to better capture the semantic meaning of each word based on its context.
We pre-train two deep contextualized language models, Clinical Embeddings from Language Model (C-ELMo) and Clinical Contextual String Embeddings (C-Flair).
Experiments show that our models gain dramatic improvements compared to both static word embeddings and domain-generic language models.
arXiv Detail & Related papers (2021-06-23T18:12:58Z) - An Automated Method to Enrich Consumer Health Vocabularies Using GloVe
Word Embeddings and An Auxiliary Lexical Resource [0.0]
A layman may have difficulty communicating with a professional due to not understanding specialized terms common to the domain.
Several professional vocabularies have been created to map laymen medical terms to professional medical terms and vice versa.
We present an automatic method for enriching laymen's vocabularies that can be applied to vocabularies in any domain.
arXiv Detail & Related papers (2021-05-18T20:16:45Z) - BBAEG: Towards BERT-based Biomedical Adversarial Example Generation for
Text Classification [1.14219428942199]
We propose BBAEG (Biomedical BERT-based Adversarial Example Generation), a black-box attack algorithm for biomedical text classification.
We demonstrate that BBAEG performs stronger attacks with better language fluency and semantic coherence compared to prior work.
arXiv Detail & Related papers (2021-04-05T05:32:56Z) - Speakers Fill Lexical Semantic Gaps with Context [65.08205006886591]
We operationalise the lexical ambiguity of a word as the entropy of meanings it can take.
We find significant correlations between our estimate of ambiguity and the number of synonyms a word has in WordNet.
This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
arXiv Detail & Related papers (2020-10-05T17:19:10Z) - Can Embeddings Adequately Represent Medical Terminology? New Large-Scale Medical Term Similarity Datasets Have the Answer!
A large number of embeddings trained on medical data have emerged, but it remains unclear how well they represent medical terminology.
We present multiple automatically created large-scale medical term similarity datasets.
We evaluate state-of-the-art word and contextual embeddings on our new datasets, comparing multiple vector similarity metrics and word vector aggregation techniques.
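One common vector similarity metric used in such evaluations, cosine similarity, can be sketched as follows; the 3-dimensional "embeddings" are invented toy values, not vectors from any of the evaluated models.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy embeddings: two medically related terms and one unrelated term.
seizure = [0.9, 0.1, 0.3]
convulsion = [0.8, 0.2, 0.4]
guitar = [0.1, 0.9, 0.0]

# A good medical embedding should score the related pair higher.
print(cosine(seizure, convulsion) > cosine(seizure, guitar))  # → True
```

Term-similarity benchmarks like the ones described check whether such metric scores agree with human judgments of medical term relatedness.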
arXiv Detail & Related papers (2020-03-24T19:18:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.