Related papers: MASK: A flexible framework to facilitate de-identification of clinical texts

URL: http://arxiv.org/abs/2005.11687v2
Date: Fri, 9 Oct 2020 20:09:00 GMT
Title: MASK: A flexible framework to facilitate de-identification of clinical texts
Authors: Nikola Milosevic, Gangamma Kalappa, Hesam Dadafarin, Mahmoud Azimaee, Goran Nenadic
Abstract summary: We present MASK, a software package that is designed to perform the de-identification task. The software is able to perform named entity recognition using some of the state-of-the-art techniques and then mask or redact recognized entities.
Score: 2.3015324171336378
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Medical health records and clinical summaries contain a vast amount of important information in textual form that can help advance research on treatments, drugs and public health. However, most of this information is not shared because it contains private information about patients, their families, or the medical staff treating them. Regulations such as HIPAA in the US, PHIPA in Canada and GDPR govern the protection, processing and distribution of this information. If this information is de-identified, with personal information replaced or redacted, it can be distributed to the research community. In this paper, we present MASK, a software package designed to perform the de-identification task. The software performs named entity recognition using some of the state-of-the-art techniques and then masks or redacts the recognized entities. The user can select a named entity recognition algorithm (currently implemented are two versions of CRF-based techniques and a BiLSTM-based neural network with pre-trained GloVe and ELMo embeddings) and a masking algorithm (e.g. shift dates, replace names/locations, totally redact entity).
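
Illustration: the abstract describes a two-stage design in which recognized entities are passed to a per-type masking algorithm (shift dates, replace names/locations, or redact). The following is a minimal Python sketch of that second stage only. It is not MASK's actual API: the names Entity, redact, replace_with_placeholder, make_date_shifter and mask are hypothetical, the entity spans are assumed to come from some upstream NER step, and dates are assumed to be in ISO format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable

# NOTE: every name in this sketch is hypothetical; none is taken from MASK.


@dataclass(frozen=True)
class Entity:
    """A character span produced by some upstream NER step (assumed)."""
    start: int
    end: int
    label: str  # e.g. "NAME", "DATE", "LOCATION"


# A masking strategy maps (entity text, entity label) to replacement text.
Strategy = Callable[[str, str], str]


def redact(text: str, label: str) -> str:
    """Totally redact the entity."""
    return "XXX"


def replace_with_placeholder(text: str, label: str) -> str:
    """Replace the entity with a placeholder naming its type."""
    return f"[{label}]"


def make_date_shifter(days: int, fmt: str = "%Y-%m-%d") -> Strategy:
    """Build a strategy that shifts dates by a fixed number of days."""
    def shift(text: str, label: str) -> str:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            # Unparseable date: fall back to redaction rather than leak it.
            return redact(text, label)
        return (parsed + timedelta(days=days)).strftime(fmt)
    return shift


def mask(text: str, entities: Iterable[Entity],
         strategies: Dict[str, Strategy],
         default: Strategy = redact) -> str:
    """Apply the strategy chosen for each entity label, left to right."""
    pieces = []
    cursor = 0
    for ent in sorted(entities, key=lambda e: (e.start, e.end)):
        if ent.end <= cursor:
            continue  # span already covered by an earlier entity
        start = max(ent.start, cursor)  # clip spans that partly overlap
        pieces.append(text[cursor:start])
        strategy = strategies.get(ent.label, default)
        pieces.append(strategy(text[start:ent.end], ent.label))
        cursor = ent.end
    pieces.append(text[cursor:])
    return "".join(pieces)


def span(text: str, needle: str, label: str) -> Entity:
    """Helper for examples: locate a substring and wrap it as an Entity."""
    start = text.index(needle)
    return Entity(start, start + len(needle), label)


if __name__ == "__main__":
    note = "John Smith was admitted on 2020-03-01 in Toronto."
    found = [span(note, "John Smith", "NAME"),
             span(note, "2020-03-01", "DATE"),
             span(note, "Toronto", "LOCATION")]
    chosen = {"NAME": replace_with_placeholder,
              "DATE": make_date_shifter(days=7)}
    print(mask(note, found, chosen))
```

Keeping each masking strategy as a plain function keyed by entity label mirrors the user-selectable behaviour the abstract mentions; labels without a configured strategy fall back to full redaction, which is the conservative choice in this sketch.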

Related papers

Medical Hallucinations in Foundation Models and Their Impact on Healthcare [53.97060824532454]
Foundation Models that are capable of processing and generating multi-modal data have transformed AI's role in medicine. We define medical hallucination as any instance in which a model generates misleading medical content. Our results reveal that inference techniques such as Chain-of-Thought (CoT) and Search Augmented Generation can effectively reduce hallucination rates. These findings underscore the ethical and practical imperative for robust detection and mitigation strategies.
arXiv Detail & Related papers (2025-02-26T02:30:44Z)
LIMIS: Towards Language-based Interactive Medical Image Segmentation [58.553786162527686]
LIMIS is the first purely language-based interactive medical image segmentation model. We adapt Grounded SAM to the medical domain and design a language-based model interaction strategy. We evaluate LIMIS on three publicly available medical datasets in terms of performance and usability.
arXiv Detail & Related papers (2024-10-22T12:13:47Z)
De-Identification of Medical Imaging Data: A Comprehensive Tool for Ensuring Patient Privacy [4.376648893167674]
An open-source tool can be used to de-identify DICOM magnetic resonance images, computer images, whole slide images and magnetic resonance twix raw data. The proposal comprises an elaborate anonymization pipeline for multiple types of inputs, reducing the need for additional tools for de-identification of imaging data.
arXiv Detail & Related papers (2024-10-16T09:31:24Z)
FEDMEKI: A Benchmark for Scaling Medical Foundation Models via Federated Knowledge Injection [83.54960238236548]
FEDMEKI not only preserves data privacy but also enhances the capability of medical foundation models. FEDMEKI allows medical foundation models to learn from a broader spectrum of medical knowledge without direct data exposure.
arXiv Detail & Related papers (2024-08-17T15:18:56Z)
Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse [53.797797404164946]
The study highlights the difficulties faced in sharing tools and resources in this domain. We annotated a corpus of clinical documents according to 12 types of identifying entities. We build a hybrid system, merging the results of a deep learning model as well as manual rules.
arXiv Detail & Related papers (2023-03-23T17:17:46Z)
DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4 [80.36535668574804]
We develop a novel GPT-4-enabled de-identification framework ("DeID-GPT"). Our developed DeID-GPT showed the highest accuracy and remarkable reliability in masking private information from unstructured medical text. This study is one of the earliest to utilize ChatGPT and GPT-4 for medical text data processing and de-identification.
arXiv Detail & Related papers (2023-03-20T11:34:37Z)
An Easy-to-use and Robust Approach for the Differentially Private De-Identification of Clinical Textual Documents [0.0]
This paper shows how an efficient and differentially private de-identification approach can be achieved by strengthening the less robust de-identification. The result is an approach for de-identifying clinical documents in French that is also generalizable to other languages.
arXiv Detail & Related papers (2022-11-02T14:25:09Z)
EBOCA: Evidences for BiOmedical Concepts Association Ontology [55.41644538483948]
This paper proposes EBOCA, an ontology that describes (i) biomedical domain concepts and associations between them, and (ii) evidences supporting these associations. Test data coming from a subset of DISNET and automatic association extractions from texts has been transformed to create a Knowledge Graph that can be used in real scenarios.
arXiv Detail & Related papers (2022-08-01T18:47:03Z)
Towards more patient friendly clinical notes through language models and ontologies [57.51898902864543]
We present a novel approach to automated medical text simplification based on word simplification and language modelling. We use a new dataset of pairs of publicly available medical sentences and versions of them simplified by clinicians. Our method, based on a language model trained on medical forum data, generates simpler sentences while preserving both grammar and the original meaning.
arXiv Detail & Related papers (2021-12-23T16:11:19Z)
Benchmarking Modern Named Entity Recognition Techniques for Free-text Health Record De-identification [6.026640792312181]
Federal law restricts the sharing of any EHR data that contains protected health information (PHI). This project explores several deep learning-based named entity recognition (NER) methods to determine which method(s) perform better on the de-identification task. We trained and tested our models on the i2b2 training dataset, and qualitatively assessed their performance using EHR data collected from a local hospital.
arXiv Detail & Related papers (2021-03-25T01:26:58Z)
Comparing Rule-based, Feature-based and Deep Neural Methods for De-identification of Dutch Medical Records [4.339510167603376]
We construct a varied dataset consisting of the medical records of 1260 patients by sampling data from 9 institutes and three domains of Dutch healthcare. We test the generalizability of three de-identification methods across languages and domains.
arXiv Detail & Related papers (2020-01-16T09:42:29Z)

This list is automatically generated from the titles and abstracts of the papers on this site.