De-identification of clinical free text using natural language processing: A systematic review of current approaches
- URL: http://arxiv.org/abs/2312.03736v1
- Date: Tue, 28 Nov 2023 13:20:41 GMT
- Title: De-identification of clinical free text using natural language processing: A systematic review of current approaches
- Authors: Aleksandar Kovačević, Bojana Bašaragin, Nikola Milošević, Goran Nenadić
- Abstract summary: Natural language processing has repeatedly demonstrated its feasibility in automating the de-identification process.
Our study aims to provide systematic evidence on how the de-identification of clinical free text has evolved in the last thirteen years.
- Score: 48.343430343213896
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Background: Electronic health records (EHRs) are a valuable resource for
data-driven medical research. However, the presence of protected health
information (PHI) makes EHRs unsuitable to be shared for research purposes.
De-identification, i.e., the process of removing PHI, is a critical step in
making EHR data accessible. Natural language processing has repeatedly
demonstrated its feasibility in automating the de-identification process.
Objectives: Our study aims to provide systematic evidence on how the
de-identification of clinical free text has evolved in the last thirteen years,
and to report on the performances and limitations of the current
state-of-the-art systems. In addition, we aim to identify challenges and
potential research opportunities in this field. Methods: A systematic search in
PubMed, Web of Science, and DBLP was conducted for studies published between
January 2010 and February 2023. Titles and abstracts were examined to identify
the relevant studies. Selected studies were then analysed in depth, and
information was collected on de-identification methodologies, data sources, and
measured performance. Results: A total of 2125 publications were identified for
title and abstract screening, of which 69 were found to be relevant. Machine
learning (37 studies) and hybrid (26 studies) approaches are predominant, while
six studies relied only on rules. The majority of the approaches were trained and
evaluated on public corpora. The 2014 i2b2/UTHealth corpus is the most
frequently used (36 studies), followed by the 2006 i2b2 (18 studies) and 2016
CEGS N-GRID (10 studies) corpora.
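The abstract distinguishes rule-based, machine learning, and hybrid de-identification systems and notes that their performance is measured on annotated corpora. As a rough illustration of the rule-only end of that spectrum, the Python sketch below detects a few PHI types with regular expressions, replaces them with category placeholders, and scores the output with exact-match (strict) span-level precision, recall, and F1. Everything in it is a hypothetical assumption, not taken from the review or from any surveyed system: the `PATTERNS` rule set, the `Span`, `find_phi`, `redact`, and `strict_prf` names, and the toy note. Real rule-based systems rely on much larger pattern and dictionary resources, and evaluation schemes differ in detail between corpora.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    """A character-offset PHI span: text[start:end] carries category `label`."""
    start: int
    end: int
    label: str


# Hypothetical toy rule set (an assumption, not from the review). Real
# rule-based de-identifiers use far larger pattern and dictionary resources.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN:?\s*\d{6,10}\b"),
}


def find_phi(text: str) -> list[Span]:
    """Return every regex match as a Span, sorted by start offset."""
    spans = [
        Span(m.start(), m.end(), label)
        for label, pattern in PATTERNS.items()
        for m in pattern.finditer(text)
    ]
    return sorted(spans)


def redact(text: str, spans: list[Span]) -> str:
    """Replace each span with a [LABEL] placeholder.

    Spans are processed right to left so earlier offsets stay valid.
    Assumes spans do not overlap; overlap resolution is omitted here.
    """
    for span in sorted(spans, reverse=True):
        text = text[:span.start] + f"[{span.label}]" + text[span.end:]
    return text


def strict_prf(predicted: list[Span], gold: list[Span]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F1, counting a prediction as
    correct only if its offsets and label exactly match a gold span."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    note = "Dr. Adams saw the patient on 03/14/2021. Call 555-123-4567."
    predicted = find_phi(note)
    print(redact(note, predicted))
    # prints: Dr. Adams saw the patient on [DATE]. Call [PHONE].

    # Hypothetical hand-annotated reference: the name is PHI the rules miss.
    gold = [Span(4, 9, "NAME"), Span(29, 39, "DATE"), Span(46, 58, "PHONE")]
    p, r, f1 = strict_prf(predicted, gold)
    print(f"P={p:.2f} R={r:.2f} F1={f1:.2f}")
    # prints: P=1.00 R=0.67 F1=0.80
```

The missed name (recall 0.67) illustrates a common limitation of pattern-only approaches: formatted identifiers such as dates and phone numbers suit rules well, whereas names are hard to enumerate with patterns.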
Related papers
- A Survey on Data Selection for Language Models [148.300726396877]
Data selection methods aim to determine which data points to include in a training dataset.
Deep learning is mostly driven by empirical evidence and experimentation on large-scale data is expensive.
Few organizations have the resources for extensive data selection research.
Date: 2024-02-26T18:54:35Z
- A survey of recent methods for addressing AI fairness and bias in biomedicine [48.46929081146017]
Artificial intelligence systems may perpetuate social inequities or demonstrate biases, such as those based on race or gender.
We surveyed recent publications on different debiasing methods in the fields of biomedical natural language processing (NLP) or computer vision (CV).
We searched PubMed, the ACM Digital Library, and IEEE Xplore for relevant articles published between January 2018 and December 2023, using multiple keyword combinations.
We reviewed other potential methods from the general domain that could be applied to biomedicine to address bias and improve fairness.
Date: 2024-02-13T06:38:46Z
- A Review of Deep Learning Methods for Photoplethysmography Data [10.27280499967643]
Photoplethysmography (PPG) is a promising technique due to its portability, user-friendly operation, and non-invasive capabilities.
Recent advancements in deep learning have demonstrated remarkable outcomes by leveraging PPG signals for tasks related to personal health management.
Date: 2024-01-23T14:11:29Z
- Natural Language Processing in Electronic Health Records in Relation to Healthcare Decision-making: A Systematic Review [2.555168694997103]
Natural Language Processing is widely used to extract clinical insights from Electronic Health Records.
Lack of annotated data, automated tools, and other challenges hinder the full utilisation of NLP for EHRs.
Various Machine Learning (ML), Deep Learning (DL), and NLP techniques are studied and compared to comprehensively understand the limitations and opportunities in this space.
Date: 2023-06-22T12:10:41Z
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We built a hybrid system that merges the outputs of a deep learning model with manual rules.
Date: 2023-03-23T17:17:46Z
- What is the State of the Art of Computer Vision-Assisted Cytology? A Systematic Literature Review [47.42354724922676]
We conducted a Systematic Literature Review to identify the state of the art of computer vision techniques currently applied to cytology.
The most used methods in the analyzed works are deep learning-based (70 papers), while fewer works employ classic computer vision only (101 papers).
We conclude that there is still a lack of high-quality datasets for many types of stains and that most of the works are not mature enough to be applied in a daily clinical diagnostic routine.
Date: 2021-05-24T13:50:45Z
- A Systematic Review of Natural Language Processing Applied to Radiology Reports [3.600747505433814]
This study systematically assesses recent literature in NLP applied to radiology reports.
Our analysis is based on 21 variables including radiology characteristics, NLP methodology, performance, study, and clinical application characteristics.
Date: 2021-02-18T18:54:41Z
- Artificial Intelligence, speech and language processing approaches to monitoring Alzheimer's Disease: a systematic review [5.635607414700482]
This paper summarises current findings on the use of artificial intelligence, speech and language processing to predict cognitive decline in Alzheimer's Disease.
We conducted a systematic review, registered in PROSPERO, of original research published between 2000 and 2019.
Date: 2020-10-12T21:43:04Z
- Digital personal health libraries: a systematic literature review [15.392783869176778]
This paper gives context on recent literature regarding the development of digital personal health libraries (PHL).
It provides insights into the potential application of consumer health informatics in diverse clinical specialties.
Date: 2020-06-21T01:11:38Z
- Opportunities and Challenges of Deep Learning Methods for Electrocardiogram Data: A Systematic Review [62.490310870300746]
The electrocardiogram (ECG) is one of the most commonly used diagnostic tools in medicine and healthcare.
Deep learning methods have achieved promising results on predictive healthcare tasks using ECG signals.
This paper presents a systematic review of deep learning methods for ECG data from both modeling and application perspectives.
Date: 2019-12-28T02:44:29Z