Natural Language Processing for Electronic Health Records in Scandinavian Languages: Norwegian, Swedish, and Danish
- URL: http://arxiv.org/abs/2503.18539v1
- Date: Mon, 24 Mar 2025 10:47:32 GMT
- Title: Natural Language Processing for Electronic Health Records in Scandinavian Languages: Norwegian, Swedish, and Danish
- Authors: Ashenafi Zebene Woldaregay, Jørgen Aarmo Lund, Phuong Dinh Ngo, Mariyam Tayefi, Joel Burman, Stine Hansen, Martin Hylleholt Sillesen, Hercules Dalianis, Robert Jenssen, Lindsetmo Rolf Ole, Karl Øyvind Mikalsen,
- Abstract summary: The study aims to perform a systematic review to assess and analyze the state-of-the-art NLP methods for the mainland Scandinavian clinical text.<n>Out of the 113 articles, 18% focus on Norwegian clinical text, 64% (n=72) on Swedish, 10% (n=11) on Danish, and 8% (n=9) focus on more than one language.
- Score: 7.7320970512851614
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Background: Clinical natural language processing (NLP) refers to the use of computational methods for extracting, processing, and analyzing unstructured clinical text data, and holds a huge potential to transform healthcare in various clinical tasks. Objective: The study aims to perform a systematic review to comprehensively assess and analyze the state-of-the-art NLP methods for the mainland Scandinavian clinical text. Method: A literature search was conducted in various online databases including PubMed, ScienceDirect, Google Scholar, ACM digital library, and IEEE Xplore between December 2022 and February 2024. Further, relevant references to the included articles were also used to solidify our search. The final pool includes articles that conducted clinical NLP in the mainland Scandinavian languages and were published in English between 2010 and 2024. Results: Out of the 113 articles, 18% (n=21) focus on Norwegian clinical text, 64% (n=72) on Swedish, 10% (n=11) on Danish, and 8% (n=9) focus on more than one language. Generally, the review identified positive developments across the region despite some observable gaps and disparities between the languages. There are substantial disparities in the level of adoption of transformer-based models. In essential tasks such as de-identification, there is significantly less research activity focusing on Norwegian and Danish compared to Swedish text. Further, the review identified a low level of sharing resources such as data, experimentation code, pre-trained models, and rate of adaptation and transfer learning in the region. Conclusion: The review presented a comprehensive assessment of the state-of-the-art Clinical NLP for electronic health records (EHR) text in mainland Scandinavian languages and, highlighted the potential barriers and challenges that hinder the rapid advancement of the field in the region.
Related papers
- De-identification of clinical free text using natural language
processing: A systematic review of current approaches [48.343430343213896]
Natural language processing has repeatedly demonstrated its feasibility in automating the de-identification process.
Our study aims to provide systematic evidence on how the de-identification of clinical free text has evolved in the last thirteen years.
arXiv Detail & Related papers (2023-11-28T13:20:41Z) - Natural Language Processing in Electronic Health Records in Relation to
Healthcare Decision-making: A Systematic Review [2.555168694997103]
Natural Language Processing is widely used to extract clinical insights from Electronic Health Records.
Lack of annotated data, automated tools, and other challenges hinder the full utilisation of NLP for EHRs.
Various Machine Learning (ML), Deep Learning (DL) and NLP techniques are studied and compared to understand the limitations and opportunities in this space comprehensively.
arXiv Detail & Related papers (2023-06-22T12:10:41Z) - Application of Transformers based methods in Electronic Medical Records:
A Systematic Literature Review [77.34726150561087]
This work presents a systematic literature review of state-of-the-art advances using transformer-based methods on electronic medical records (EMRs) in different NLP tasks.
arXiv Detail & Related papers (2023-04-05T22:19:42Z) - Development and validation of a natural language processing algorithm to
pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z) - Topic Modelling of Swedish Newspaper Articles about Coronavirus: a Case
Study using Latent Dirichlet Allocation Method [8.405827390095064]
Topic Modelling (TM) is from the research branches of natural language understanding (NLU) and natural language processing (NLP)
In this study, we apply popular Latent Dirichlet Allocation (LDA) methods to model the topic changes in Swedish newspaper articles about Coronavirus.
We describe the corpus we created including 6515 articles, methods applied, and statistics on topic changes over approximately 1 year and two months period of time from 17th January 2020 to 13th March 2021.
arXiv Detail & Related papers (2023-01-08T12:33:58Z) - Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of
Code-Mixed Clinical Texts [56.72488923420374]
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named recognition (NER) and apply it to solve a low-resource and real-world challenge of code-mixed (Spanish-Catalan) clinical notes de-identification in the stroke.
arXiv Detail & Related papers (2022-04-10T21:46:52Z) - A Scoping Review of Publicly Available Language Tasks in Clinical
Natural Language Processing [7.966218734325912]
We searched six databases, including biomedical research and computer science literature database.
A total of 35 papers with 47 clinical NLP tasks met inclusion criteria between 2007 and 2021.
arXiv Detail & Related papers (2021-12-07T22:49:58Z) - CBLUE: A Chinese Biomedical Language Understanding Evaluation Benchmark [51.38557174322772]
We present the first Chinese Biomedical Language Understanding Evaluation benchmark.
It is a collection of natural language understanding tasks including named entity recognition, information extraction, clinical diagnosis normalization, single-sentence/sentence-pair classification.
We report empirical results with the current 11 pre-trained Chinese models, and experimental results show that state-of-the-art neural models perform by far worse than the human ceiling.
arXiv Detail & Related papers (2021-06-15T12:25:30Z) - A Systematic Review of Natural Language Processing Applied to Radiology
Reports [3.600747505433814]
This study systematically assesses recent literature in NLP applied to radiology reports.
Our analysis is based on 21 variables including radiology characteristics, NLP methodology, performance, study, and clinical application characteristics.
arXiv Detail & Related papers (2021-02-18T18:54:41Z) - Benchmarking Automated Clinical Language Simplification: Dataset,
Algorithm, and Evaluation [48.87254340298189]
We construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches.
We propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-04T06:09:02Z) - SemClinBr -- a multi institutional and multi specialty semantically
annotated corpus for Portuguese clinical NLP tasks [0.7311642662742726]
SemClinBr is a corpus that has 1,000 clinical notes, labeled with 65,117 entities and 11,263 relations.
This work is the SemClinBr, a corpus that has 1,000 clinical notes, labeled with 65,117 entities and 11,263 relations.
arXiv Detail & Related papers (2020-01-27T20:39:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.