Classifying Cyber-Risky Clinical Notes by Employing Natural Language
Processing
- URL: http://arxiv.org/abs/2203.12781v1
- Date: Thu, 24 Mar 2022 00:36:59 GMT
- Title: Classifying Cyber-Risky Clinical Notes by Employing Natural Language
Processing
- Authors: Suzanna Schmeelk, Martins Samuel Dogo, Yifan Peng, Braja Gopal Patra
- Abstract summary: Recently, some states within the United States of America require patients to have open access to their clinical notes.
This research investigates methods for identifying security/privacy risks within clinical notes.
- Score: 9.77063694539068
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Clinical notes, which can be embedded into electronic medical records,
document patient care delivery and summarize interactions between healthcare
providers and patients. These clinical notes directly inform patient care and
can also indirectly inform research and quality/safety metrics, among other
indirect metrics. Recently, some states within the United States of America
require patients to have open access to their clinical notes to improve the
exchange of patient information for patient care. Thus, developing methods to
assess the cyber risks of clinical notes before sharing and exchanging data is
critical. While existing natural language processing techniques are geared to
de-identify clinical notes, to the best of our knowledge, few have focused on
classifying sensitive-information risk, which is a fundamental step toward
developing effective, widespread protection of patient health information. To
bridge this gap, this research investigates methods for identifying
security/privacy risks within clinical notes. The classification either can be
used upstream to identify areas within notes that likely contain sensitive
information or downstream to improve the identification of clinical notes that
have not been entirely de-identified. We develop several models using unigram
and word2vec features with different classifiers to categorize sentence risk.
Experiments on the i2b2 de-identification dataset show that the SVM classifier
using word2vec features obtained a maximum F1-score of 0.792. Future research
involves articulation and differentiation of risk in terms of different global
regulatory requirements.
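The sentence-level setup described in the abstract (unigram features, an SVM classifier, F1 evaluation) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy sentences and labels are invented (they are not from the i2b2 dataset), the function names are hypothetical, and the small pure-Python linear SVM (subgradient descent on the hinge loss, no bias term) stands in for whatever SVM implementation the authors used. The word2vec feature variant is not shown.

```python
from collections import Counter

# Invented toy data (NOT from i2b2). Label 1 = sentence carries
# identifying details ("risky"), label 0 = it does not.
TRAIN = [
    ("john smith was seen on 03/04/2019", 1),
    ("call mrs jones at 555-0100", 1),
    ("dr miller works at mercy hospital", 1),
    ("mr lee lives at 12 oak street", 1),
    ("blood pressure is within normal limits", 0),
    ("continue current medication and monitor symptoms", 0),
    ("no acute distress noted during examination", 0),
    ("the wound is healing well without infection", 0),
]


def unigrams(sentence):
    """Unigram (bag-of-words) counts from lowercase whitespace tokens."""
    return Counter(sentence.lower().split())


def score(weights, sentence):
    """Linear decision value w . x for one sentence."""
    return sum(weights.get(tok, 0.0) * cnt
               for tok, cnt in unigrams(sentence).items())


def train_linear_svm(data, epochs=50, lam=0.01, lr=0.1):
    """Hypothetical helper: subgradient descent on the L2-regularized
    hinge loss, i.e. a linear SVM without a bias term."""
    weights = {}
    for _ in range(epochs):
        for text, label in data:
            y = 1 if label == 1 else -1
            margin = y * score(weights, text)
            # Regularization step: shrink every weight slightly.
            for tok in weights:
                weights[tok] *= 1 - lr * lam
            # Hinge-loss step: update only when the margin is violated.
            if margin < 1:
                for tok, cnt in unigrams(text).items():
                    weights[tok] = weights.get(tok, 0.0) + lr * y * cnt
    return weights


def predict(weights, sentence):
    """Return 1 (risky) when the decision value is positive, else 0."""
    return 1 if score(weights, sentence) > 0 else 0


def f1_score(gold, pred):
    """F1 for the positive (risky) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


model = train_linear_svm(TRAIN)
gold = [label for _, label in TRAIN]
pred = [predict(model, text) for text, _ in TRAIN]
train_f1 = f1_score(gold, pred)  # training-set F1 on the toy data only
```

The toy score is computed on the training sentences themselves, so it says nothing about the 0.792 F1 reported in the abstract; a real evaluation would use held-out annotated sentences.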
Related papers
- DIRI: Adversarial Patient Reidentification with Large Language Models for Evaluating Clinical Text Anonymization [13.038800602897354]
We develop an adversarial approach using a large language model to re-identify the patient corresponding to a redacted clinical note.
Although ClinicalBERT was the most effective, masking all identified PII, our tool still reidentified 9% of clinical notes.
arXiv Detail & Related papers (2024-10-22T14:06:31Z)
- DeIDClinic: A Multi-Layered Framework for De-identification of Clinical Free-text Data [6.473402241020136]
This work enhances the MASK framework by integrating ClinicalBERT, a deep learning model specifically fine-tuned on clinical texts.
The system effectively identifies and either redacts or replaces sensitive identifiable entities within clinical documents.
A risk assessment feature has also been developed, which analyses the uniqueness of context within documents to classify them into risk levels.
arXiv Detail & Related papers (2024-10-02T15:16:02Z)
- Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area.
We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions.
We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z)
- Informing clinical assessment by contextualizing post-hoc explanations of risk prediction models in type-2 diabetes [50.8044927215346]
We consider a comorbidity risk prediction scenario and focus on contexts regarding the patient's clinical state.
We employ several state-of-the-art LLMs to present contexts around risk prediction model inferences and evaluate their acceptability.
Our paper is one of the first end-to-end analyses identifying the feasibility and benefits of contextual explanations in a real-world clinical use case.
arXiv Detail & Related papers (2023-02-11T18:07:11Z)
- A Multimodal Transformer: Fusing Clinical Notes with Structured EHR Data for Interpretable In-Hospital Mortality Prediction [8.625186194860696]
We provide a novel multimodal transformer to fuse clinical notes and structured EHR data for better prediction of in-hospital mortality.
To improve interpretability, we propose an integrated gradients (IG) method to select important words in clinical notes.
We also investigate the significance of domain adaptive pretraining and task adaptive fine-tuning on the Clinical BERT.
arXiv Detail & Related papers (2022-08-09T03:49:52Z)
- Cross-Lingual Knowledge Transfer for Clinical Phenotyping [55.92262310716537]
We investigate cross-lingual knowledge transfer strategies to execute this task for clinics that do not use the English language.
We evaluate these strategies for a Greek and a Spanish clinic leveraging clinical notes from different clinical domains.
Our results show that using multilingual data overall improves clinical phenotyping models and can compensate for data sparseness.
arXiv Detail & Related papers (2022-08-03T08:33:21Z)
- Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG).
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z)
- Making sense of violence risk predictions using clinical notes [0.988455728566886]
Violence risk assessment in psychiatric institutions enables interventions to avoid violence incidents.
Previous studies have attempted to assess violence risk in psychiatric patients using such notes, with acceptable performance.
arXiv Detail & Related papers (2022-04-29T10:00:07Z)
- Enriching Unsupervised User Embedding via Medical Concepts [51.17532619610099]
Unsupervised user embedding aims to encode patients into fixed-length vectors without human supervision.
Medical concepts extracted from the clinical notes contain rich connections between patients and their clinical categories.
We propose a concept-aware unsupervised user embedding that jointly leverages text documents and medical concepts from two clinical corpora.
arXiv Detail & Related papers (2022-03-20T18:54:05Z)
- Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
- Performance of Automatic De-identification Across Different Note Types [0.8399688944263842]
Concerns about patient privacy and confidentiality limit the use of clinical notes for research.
We present the performance of a state-of-the-art de-id system called NeuroNER on a diverse set of notes from the University of Washington.
arXiv Detail & Related papers (2021-02-17T00:55:40Z)