HealthE: Classifying Entities in Online Textual Health Advice
- URL: http://arxiv.org/abs/2210.03246v1
- Date: Thu, 6 Oct 2022 23:18:24 GMT
- Title: HealthE: Classifying Entities in Online Textual Health Advice
- Authors: Joseph Gatto, Parker Seegmiller, Garrett Johnston, Sarah M. Preum
- Abstract summary: We release a new annotated dataset, HealthE, consisting of 6,756 health advice.
HealthE has a more granular label space compared to existing medical NER corpora.
We introduce a new health entity classification model, EP S-BERT, which leverages textual context patterns in the classification of entity classes.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The processing of entities in natural language is essential to many medical
NLP systems. Unfortunately, existing datasets vastly under-represent the
entities required to model public health relevant texts such as health advice
often found on sites like WebMD. People rely on such information for personal
health management and clinically relevant decision making. In this work, we
release a new annotated dataset, HealthE, consisting of 6,756 health advice.
HealthE has a more granular label space compared to existing medical NER
corpora and contains annotation for diverse health phrases. Additionally, we
introduce a new health entity classification model, EP S-BERT, which leverages
textual context patterns in the classification of entity classes. EP S-BERT
provides a 4-point increase in F1 score over the nearest baseline and a
34-point increase in F1 when compared to off-the-shelf medical NER tools
trained to extract disease and medication mentions from clinical texts. All
code and data are publicly available on Github.
Related papers
- Towards Evaluating and Building Versatile Large Language Models for Medicine [57.49547766838095]
We present MedS-Bench, a benchmark designed to evaluate the performance of large language models (LLMs) in clinical contexts.
MedS-Bench spans 11 high-level clinical tasks, including clinical report summarization, treatment recommendations, diagnosis, named entity recognition, and medical concept explanation.
MedS-Ins comprises 58 medically oriented language corpora, totaling 13.5 million samples across 122 tasks.
arXiv Detail & Related papers (2024-08-22T17:01:34Z) - FEDMEKI: A Benchmark for Scaling Medical Foundation Models via Federated Knowledge Injection [83.54960238236548]
FEDMEKI not only preserves data privacy but also enhances the capability of medical foundation models.
FEDMEKI allows medical foundation models to learn from a broader spectrum of medical knowledge without direct data exposure.
arXiv Detail & Related papers (2024-08-17T15:18:56Z) - SNOBERT: A Benchmark for clinical notes entity linking in the SNOMED CT clinical terminology [43.89160296332471]
We propose a method for linking text spans in clinical notes to specific concepts in the SNOMED CT using BERT-based models.
The method consists of two stages: candidate selection and candidate matching. The models were trained on one of the largest publicly available dataset of labeled clinical notes.
arXiv Detail & Related papers (2024-05-25T08:00:44Z) - Diversifying Knowledge Enhancement of Biomedical Language Models using
Adapter Modules and Knowledge Graphs [54.223394825528665]
We develop an approach that uses lightweight adapter modules to inject structured biomedical knowledge into pre-trained language models.
We use two large KGs, the biomedical knowledge system UMLS and the novel biochemical OntoChem, with two prominent biomedical PLMs, PubMedBERT and BioLinkBERT.
We show that our methodology leads to performance improvements in several instances while keeping requirements in computing power low.
arXiv Detail & Related papers (2023-12-21T14:26:57Z) - An Analysis on Large Language Models in Healthcare: A Case Study of
BioBERT [0.0]
This paper conducts a comprehensive investigation into applying large language models, particularly on BioBERT, in healthcare.
The analysis outlines a systematic methodology for fine-tuning BioBERT to meet the unique needs of the healthcare domain.
The paper thoroughly examines ethical considerations, particularly patient privacy and data security.
arXiv Detail & Related papers (2023-10-11T08:16:35Z) - Toward Improving Health Literacy in Patient Education Materials with
Neural Machine Translation Models [4.402142901902557]
People with low health literacy usually have trouble understanding health information, following post-visit instructions, and using prescriptions.
We propose to leverage natural language processing techniques to improve health literacy in patient education materials by automatically translating illiterate languages in a given sentence.
We trained and tested the state-of-the-art neural machine translation (NMT) models on a silver standard training dataset and a gold standard testing dataset.
arXiv Detail & Related papers (2022-09-14T15:30:54Z) - New Arabic Medical Dataset for Diseases Classification [55.41644538483948]
We introduce a new Arab medical dataset, which includes two thousand medical documents collected from several Arabic medical websites.
The dataset was built for the task of classifying texts and includes 10 classes (Blood, Bone, Cardiovascular, Ear, Endocrine, Eye, Gastrointestinal, Immune, Liver and Nephrological)
Experiments on the dataset were performed by fine-tuning three pre-trained models: BERT from Google, Arabert that based on BERT with large Arabic corpus, and AraBioNER that based on Arabert with Arabic medical corpus.
arXiv Detail & Related papers (2021-06-29T10:42:53Z) - Word-level Text Highlighting of Medical Texts forTelehealth Services [0.0]
This paper aims to show how different text highlighting techniques can capture relevant medical context.
Three different word-level text highlighting methodologies are implemented and evaluated.
The results of our experiments show that the neural network approach is successful in highlighting medically-relevant terms.
arXiv Detail & Related papers (2021-05-21T15:13:54Z) - MedDG: An Entity-Centric Medical Consultation Dataset for Entity-Aware
Medical Dialogue Generation [86.38736781043109]
We build and release a large-scale high-quality Medical Dialogue dataset related to 12 types of common Gastrointestinal diseases named MedDG.
We propose two kinds of medical dialogue tasks based on MedDG dataset. One is the next entity prediction and the other is the doctor response generation.
Experimental results show that the pre-train language models and other baselines struggle on both tasks with poor performance in our dataset.
arXiv Detail & Related papers (2020-10-15T03:34:33Z) - COMETA: A Corpus for Medical Entity Linking in the Social Media [27.13349965075764]
We introduce a new corpus called COMETA, consisting of 20k English biomedical entity mentions from Reddit expert-annotated with links to SNOMED CT.
Our corpus satisfies a combination of desirable properties, from scale and coverage to diversity and quality.
We shed light on the ability of these systems to perform complex inference on entities and concepts under 2 challenging evaluation scenarios.
arXiv Detail & Related papers (2020-10-07T09:16:45Z) - Assessing the Severity of Health States based on Social Media Posts [62.52087340582502]
We propose a multiview learning framework that models both the textual content as well as contextual-information to assess the severity of the user's health state.
The diverse NLU views demonstrate its effectiveness on both the tasks and as well as on the individual disease to assess a user's health.
arXiv Detail & Related papers (2020-09-21T03:45:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.