Medical Concept Normalization in a Low-Resource Setting
- URL: http://arxiv.org/abs/2409.14579v1
- Date: Fri, 6 Sep 2024 10:19:32 GMT
- Title: Medical Concept Normalization in a Low-Resource Setting
- Authors: Tim Patzelt
- Abstract summary: I investigate the challenges of medical concept normalization in a low-resource setting.
A dataset consisting of posts from a German medical online forum is annotated with concepts from the Unified Medical Language System.
Experiments demonstrate that multilingual Transformer-based models are able to outperform string similarity methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the field of biomedical natural language processing, medical concept normalization is a crucial task for accurately mapping mentions of concepts to a large knowledge base. However, this task becomes even more challenging in low-resource settings, where limited data and resources are available. In this thesis, I explore the challenges of medical concept normalization in a low-resource setting. Specifically, I investigate the shortcomings of current medical concept normalization methods applied to German lay texts. Since there is no suitable dataset available, a dataset consisting of posts from a German medical online forum is annotated with concepts from the Unified Medical Language System. The experiments demonstrate that multilingual Transformer-based models are able to outperform string similarity methods. The use of contextual information to improve the normalization of lay mentions is also examined, but led to inferior results. Based on the results of the best performing model, I present a systematic error analysis and lay out potential improvements to mitigate frequent errors.
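The string-similarity baselines that the Transformer models outperform can be illustrated with a minimal sketch: each lay mention is mapped to the concept whose canonical name is most similar as a raw string. The miniature concept inventory below is purely illustrative (a real system would search the full UMLS Metathesaurus), and the thesis's actual baselines and similarity measures are not specified here.

```python
from difflib import SequenceMatcher

# Illustrative toy inventory of UMLS-style concepts (CUI -> preferred name).
# A real normalizer would query the full UMLS Metathesaurus instead.
CONCEPTS = {
    "C0020538": "hypertension",
    "C0011849": "diabetes mellitus",
    "C0018681": "headache",
}

def normalize(mention: str) -> str:
    """Map a lay mention to the CUI whose name is most string-similar.

    This is a pure string-similarity baseline; it has no notion of
    semantics, which is why embedding-based models can outperform it.
    """
    mention = mention.lower()
    return max(
        CONCEPTS,
        key=lambda cui: SequenceMatcher(None, mention, CONCEPTS[cui]).ratio(),
    )

print(normalize("Headaches"))  # -> C0018681
```

Such a baseline fails on lay paraphrases ("high blood pressure" shares few characters with "hypertension"), which motivates the embedding-based multilingual Transformer approach the abstract describes.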
Related papers
- A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis [48.84443450990355]
Deep networks have achieved broad success in analyzing natural images, but when applied to medical scans, they often fail in unexpected situations.
We investigate this challenge and focus on model sensitivity to domain shifts, such as data sampled from different hospitals or data confounded by demographic variables such as sex or race, in the context of chest X-rays and skin lesion images.
Taking inspiration from medical training, we propose giving deep networks a prior grounded in explicit medical knowledge communicated in natural language.
arXiv Detail & Related papers (2024-05-23T17:55:02Z) - Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z) - Semantic Textual Similarity Assessment in Chest X-ray Reports Using a Domain-Specific Cosine-Based Metric [1.7802147489386628]
We introduce a novel approach designed specifically for assessing the semantic similarity between generated medical reports and the ground truth.
Our approach is validated, demonstrating its efficiency in assessing domain-specific semantic similarity within medical contexts.
arXiv Detail & Related papers (2024-02-19T07:48:25Z) - Combining Contrastive Learning and Knowledge Graph Embeddings to develop medical word embeddings for the Italian language [0.0]
This paper aims to improve the embeddings available for the underserved niche of the Italian medical domain.
The main objective is to improve the accuracy of semantic similarity between medical terms.
Since the Italian language lacks medical texts and controlled vocabularies, we have developed a specific solution.
arXiv Detail & Related papers (2022-11-09T17:12:28Z) - RuMedBench: A Russian Medical Language Understanding Benchmark [58.99199480170909]
The paper describes the open Russian medical language understanding benchmark covering several task types.
We prepare the unified format labeling, data split, and evaluation metrics for new tasks.
A single-number metric expresses a model's ability to cope with the benchmark.
arXiv Detail & Related papers (2022-01-17T16:23:33Z) - Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word-level simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method based on a language model trained on medical forum data generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z) - CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, and single-sentence/sentence-pair classification.
We report empirical results with 11 current pre-trained Chinese models, and the experiments show that state-of-the-art neural models perform far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z) - Paragraph-level Simplification of Medical Texts [35.650619024498425]
Manual simplification does not scale to the rapidly growing body of biomedical literature.
We introduce a new corpus of parallel texts in English comprising technical and lay summaries of all published evidence pertaining to different clinical topics.
We propose a new metric based on likelihood scores from a masked language model pretrained on scientific texts.
arXiv Detail & Related papers (2021-04-12T18:56:05Z) - Unifying Relational Sentence Generation and Retrieval for Medical Image Report Composition [142.42920413017163]
Current methods often generate the most common sentences owing to dataset bias, regardless of the individual case.
We propose a novel framework that unifies template retrieval and sentence generation to handle both common and rare abnormalities.
arXiv Detail & Related papers (2021-01-09T04:33:27Z) - MeDAL: Medical Abbreviation Disambiguation Dataset for Natural Language Understanding Pretraining [5.807159674193696]
We present MeDAL, a large medical text dataset curated for abbreviation disambiguation.
We pre-trained several models of common architectures on this dataset and empirically showed that such pre-training leads to improved performance and convergence speed when fine-tuning on downstream medical tasks.
arXiv Detail & Related papers (2020-12-27T17:17:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.