Performance of Automatic De-identification Across Different Note Types
- URL: http://arxiv.org/abs/2102.11032v1
- Date: Wed, 17 Feb 2021 00:55:40 GMT
- Title: Performance of Automatic De-identification Across Different Note Types
- Authors: Nicholas Dobbins, David Wayne, Kahyun Lee, Özlem Uzuner, Meliha Yetisgen
- Abstract summary: Concerns about patient privacy and confidentiality limit the use of clinical notes for research.
We present the performance of a state-of-the-art de-id system called NeuroNER on a diverse set of notes from the University of Washington.
- Score: 0.8399688944263842
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Free-text clinical notes detail all aspects of patient care and have great
potential to facilitate quality improvement and assurance initiatives as well
as advance clinical research. However, concerns about patient privacy and
confidentiality limit the use of clinical notes for research. As a result, the
information documented in these notes remains unavailable for most researchers.
De-identification (de-id), i.e., locating and removing personally identifying
protected health information (PHI), is one way of improving access to clinical
narratives. However, there are limited off-the-shelf de-identification systems
able to consistently detect PHI across different data sources and medical
specialties. In this abstract, we present the performance of a state-of-the-art
de-id system called NeuroNER on a diverse set of notes from the University of
Washington (UW) when the models are trained on data from an external
institution (Partners Healthcare) vs. from the same institution (UW). We
present results at the level of PHI and note types.
Related papers
- DIRI: Adversarial Patient Reidentification with Large Language Models for Evaluating Clinical Text Anonymization [13.038800602897354]
We develop an adversarial approach using a large language model to re-identify the patient corresponding to a redacted clinical note.
Although ClinicalBERT was the most effective, masking all identified PII, our tool still reidentified 9% of clinical notes.
arXiv Detail & Related papers (2024-10-22T14:06:31Z)
- Improving Clinical Note Generation from Complex Doctor-Patient Conversation [20.2157016701399]
We present three key contributions to the field of clinical note generation using large language models (LLMs).
First, we introduce CliniKnote, a dataset consisting of 1,200 complex doctor-patient conversations paired with their full clinical notes.
Second, we propose K-SOAP, which enhances traditional SOAP (Subjective, Objective, Assessment, and Plan) notes by adding a keyword section at the top, allowing for quick identification of essential information.
Third, we develop an automatic pipeline to generate K-SOAP notes from doctor-patient conversations and benchmark various modern LLMs using various
arXiv Detail & Related papers (2024-08-26T18:39:31Z)
- Conceptualizing Machine Learning for Dynamic Information Retrieval of Electronic Health Record Notes [6.1656026560972]
This work conceptualizes the use of EHR audit logs for machine learning as a source of supervision of note relevance in a specific clinical context.
We show that our methods can achieve an AUC of 0.963 for predicting which notes will be read in an individual note writing session.
arXiv Detail & Related papers (2023-08-09T21:04:19Z)
- Making the Most Out of the Limited Context Length: Predictive Power Varies with Clinical Note Type and Note Section [70.37720062263176]
We propose a framework to analyze the sections with high predictive power.
Using MIMIC-III, we show that: 1) predictive power distribution is different between nursing notes and discharge notes and 2) combining different types of notes could improve performance when the context length is large.
arXiv Detail & Related papers (2023-07-13T20:04:05Z)
- Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 [80.36535668574804]
We develop a novel GPT-4-enabled de-identification framework ("DeID-GPT").
Our developed DeID-GPT showed the highest accuracy and remarkable reliability in masking private information from the unstructured medical text.
This study is one of the earliest to utilize ChatGPT and GPT-4 for medical text data processing and de-identification.
arXiv Detail & Related papers (2023-03-20T11:34:37Z)
- An Easy-to-use and Robust Approach for the Differentially Private De-Identification of Clinical Textual Documents [0.0]
This paper shows how an efficient and differentially private de-identification approach can be achieved by strengthening the less robust de-identification.
The result is an approach for de-identifying clinical documents in French that is also generalizable to other languages.
arXiv Detail & Related papers (2022-11-02T14:25:09Z)
- User-Driven Research of Medical Note Generation Software [49.85146209418244]
We present three rounds of user studies carried out in the context of developing a medical note generation system.
We discuss the participating clinicians' impressions and views of how the system ought to be adapted to be of value to them.
We describe a three-week test run of the system in a live telehealth clinical practice.
arXiv Detail & Related papers (2022-05-05T10:18:06Z)
- Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation [56.25869366777579]
In recent years, machine learning models have rapidly become better at generating clinical consultation notes.
We present an extensive human evaluation study where 5 clinicians listen to 57 mock consultations, write their own notes, post-edit a number of automatically generated notes, and extract all the errors.
We find that a simple, character-based Levenshtein distance metric performs on par if not better than common model-based metrics like BertScore.
arXiv Detail & Related papers (2022-04-01T14:04:16Z)
- Classifying Cyber-Risky Clinical Notes by Employing Natural Language Processing [9.77063694539068]
Recently, some states within the United States of America require patients to have open access to their clinical notes.
This research investigates methods for identifying security/privacy risks within clinical notes.
arXiv Detail & Related papers (2022-03-24T00:36:59Z)
- Enriching Unsupervised User Embedding via Medical Concepts [51.17532619610099]
Unsupervised user embedding aims to encode patients into fixed-length vectors without human supervision.
Medical concepts extracted from the clinical notes contain rich connections between patients and their clinical categories.
We propose a concept-aware unsupervised user embedding that jointly leverages text documents and medical concepts from two clinical corpora.
arXiv Detail & Related papers (2022-03-20T18:54:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.