MASK: A flexible framework to facilitate de-identification of clinical
texts
- URL: http://arxiv.org/abs/2005.11687v2
- Date: Fri, 9 Oct 2020 20:09:00 GMT
- Title: MASK: A flexible framework to facilitate de-identification of clinical
texts
- Authors: Nikola Milosevic, Gangamma Kalappa, Hesam Dadafarin, Mahmoud Azimaee,
Goran Nenadic
- Abstract summary: We present MASK, a software package that is designed to perform the de-identification task.
The software is able to perform named entity recognition using some of the state-of-the-art techniques and then mask or redact recognized entities.
- Score: 2.3015324171336378
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Medical health records and clinical summaries contain a vast amount of
important information in textual form that can help advance research on
treatments, drugs and public health. However, most of this information
is not shared because it contains private information about patients, their
families, or the medical staff treating them. Regulations such as HIPAA in the US,
PHIPA in Canada and GDPR regulate the protection, processing and distribution
of this information. If this information is de-identified and personal
information is replaced or redacted, it can be distributed to the research
community. In this paper, we present MASK, a software package that is designed
to perform the de-identification task. The software is able to perform named
entity recognition using some of the state-of-the-art techniques and then mask
or redact recognized entities. The user is able to select a named entity
recognition algorithm (currently implemented are two versions of CRF-based
techniques and a BiLSTM-based neural network with pre-trained GloVe and ELMo
embeddings) and a masking algorithm (e.g. shift dates, replace names/locations,
totally redact entity).
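The two-stage flow described in the abstract (recognize entities, then apply a per-type masking strategy) can be illustrated with a minimal sketch. This is not MASK's actual code or API: the function names, the entity labels (`NAME`, `DATE`, `LOCATION`), the fixed 30-day shift, and the `find_span` helper that stands in for a trained NER model are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def find_span(text, substring, label):
    """Hypothetical stand-in for an NER model: locate a known substring."""
    start = text.index(substring)
    return (start, start + len(substring), label)


def shift_date(value, days=30, fmt="%Y-%m-%d"):
    """Shift an ISO-style date by a fixed offset; redact it if unparseable."""
    try:
        shifted = datetime.strptime(value, fmt) + timedelta(days=days)
    except ValueError:
        return "XXXX"
    return shifted.strftime(fmt)


def replace_name(value):
    """Replace a name with a generic placeholder."""
    return "[NAME]"


def redact(value):
    """Remove the entity text entirely."""
    return "XXXX"


# Assumed mapping from entity label to masking strategy; unknown labels are redacted.
STRATEGIES = {"DATE": shift_date, "NAME": replace_name}


def mask_text(text, spans, strategies=STRATEGIES):
    """Apply the chosen masking strategy to each (start, end, label) span."""
    # Work from the last span to the first so earlier offsets stay valid.
    for start, end, label in sorted(spans, reverse=True):
        masker = strategies.get(label, redact)
        text = text[:start] + masker(text[start:end]) + text[end:]
    return text


note = "John Smith was admitted on 2020-01-15 at Toronto General."
spans = [
    find_span(note, "John Smith", "NAME"),
    find_span(note, "2020-01-15", "DATE"),
    find_span(note, "Toronto General", "LOCATION"),
]
masked = mask_text(note, spans)
print(masked)
```

Keeping the label-to-strategy mapping in a plain dictionary mirrors the user-selectable masking choice the abstract mentions: swapping `replace_name` for `redact` changes the behavior for one entity type without touching the recognition step.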
Related papers
- EMBRE: Entity-aware Masking for Biomedical Relation Extraction [12.821610050561256]
We introduce the Entity-aware Masking for Biomedical Relation Extraction (EMBRE) method for relation extraction.
Specifically, we integrate entity knowledge into a deep neural network by pretraining the backbone model with an entity masking objective.
arXiv Detail & Related papers (2024-01-15T18:12:01Z)
- Development and validation of a natural language processing algorithm to
pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
- DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 [80.36535668574804]
We develop a novel GPT-4-enabled de-identification framework (DeID-GPT).
Our developed DeID-GPT showed the highest accuracy and remarkable reliability in masking private information from the unstructured medical text.
This study is one of the earliest to utilize ChatGPT and GPT-4 for medical text data processing and de-identification.
arXiv Detail & Related papers (2023-03-20T11:34:37Z)
- An Easy-to-use and Robust Approach for the Differentially Private
De-Identification of Clinical Textual Documents [0.0]
This paper shows how an efficient and differentially private de-identification approach can be achieved by strengthening the less robust de-identification.
The result is an approach for de-identifying clinical documents in French that is also generalizable to other languages.
arXiv Detail & Related papers (2022-11-02T14:25:09Z)
- EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations.
Test data coming from a subset of DISNET and automatic association extractions from texts have been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z)
- Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG).
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z)
- Towards more patient friendly clinical notes through language models and
ontologies [57.51898902864543]
We present a novel approach to automated medical text based on word simplification and language modelling.
We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians.
Our method based on a language model trained on medical forum data generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
- User-Centric Health Data Using Self-sovereign Identities [69.50862982117127]
This article presents the potential use of the issuers Self-Sovereign Identities (SSI) and Distributed Ledger Technologies (DLT) to improve the privacy and control of health data.
The paper lists the prominent use cases of decentralized identities in the health area, and discusses an effective blockchain-based architecture.
arXiv Detail & Related papers (2021-07-26T17:09:52Z)
- Benchmarking Modern Named Entity Recognition Techniques for Free-text
Health Record De-identification [6.026640792312181]
Federal law restricts the sharing of any EHR data that contains protected health information (PHI).
This project explores several deep learning-based named entity recognition (NER) methods to determine which method(s) perform better on the de-identification task.
We trained and tested our models on the i2b2 training dataset, and qualitatively assessed their performance using EHR data collected from a local hospital.
arXiv Detail & Related papers (2021-03-25T01:26:58Z)
- Comparing Rule-based, Feature-based and Deep Neural Methods for
De-identification of Dutch Medical Records [4.339510167603376]
We construct a varied dataset consisting of the medical records of 1260 patients by sampling data from 9 institutes and three domains of Dutch healthcare.
We test the generalizability of three de-identification methods across languages and domains.
arXiv Detail & Related papers (2020-01-16T09:42:29Z)