DICOM De-Identification via Hybrid AI and Rule-Based Framework for Scalable, Uncertainty-Aware Redaction
- URL: http://arxiv.org/abs/2507.23736v1
- Date: Thu, 31 Jul 2025 17:19:38 GMT
- Title: DICOM De-Identification via Hybrid AI and Rule-Based Framework for Scalable, Uncertainty-Aware Redaction
- Authors: Kyle Naddeo, Nikolas Koutsoubis, Rahul Krish, Ghulam Rasool, Nidhal Bouaynaya, Tony OSullivan, Raj Krish,
- Abstract summary: This paper presents a hybrid de-identification framework that combines rule-based and AI-driven techniques.<n>Our solution addresses critical challenges in medical data de-identification and supports the secure, ethical, and trustworthy release of imaging data for research.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Access to medical imaging and associated text data has the potential to drive major advances in healthcare research and patient outcomes. However, the presence of Protected Health Information (PHI) and Personally Identifiable Information (PII) in Digital Imaging and Communications in Medicine (DICOM) files presents a significant barrier to the ethical and secure sharing of imaging datasets. This paper presents a hybrid de-identification framework developed by Impact Business Information Solutions (IBIS) that combines rule-based and AI-driven techniques, and rigorous uncertainty quantification for comprehensive PHI/PII removal from both metadata and pixel data. Our approach begins with a two-tiered rule-based system targeting explicit and inferred metadata elements, further augmented by a large language model (LLM) fine-tuned for Named Entity Recognition (NER), and trained on a suite of synthetic datasets simulating realistic clinical PHI/PII. For pixel data, we employ an uncertainty-aware Faster R-CNN model to localize embedded text, extract candidate PHI via Optical Character Recognition (OCR), and apply the NER pipeline for final redaction. Crucially, uncertainty quantification provides confidence measures for AI-based detections to enhance automation reliability and enable informed human-in-the-loop verification to manage residual risks. This uncertainty-aware deidentification framework achieves robust performance across benchmark datasets and regulatory standards, including DICOM, HIPAA, and TCIA compliance metrics. By combining scalable automation, uncertainty quantification, and rigorous quality assurance, our solution addresses critical challenges in medical data de-identification and supports the secure, ethical, and trustworthy release of imaging data for research.
Related papers
- Medical Image De-Identification Resources: Synthetic DICOM Data and Tools for Validation [0.10617782943195009]
Ensuring patient privacy remains a significant challenge for open-access data sharing.<n>Digital Imaging and Communications in Medicine (DICOM) encodes both essential clinical metadata and extensive protected health information (PHI) and personally identifiable information (PII)<n>To address this gap, we developed an openly accessible DICOM dataset infused with synthetic PHI/PII and an evaluation framework for benchmarking image de-identification.
arXiv Detail & Related papers (2025-08-03T18:48:28Z) - Towards Trustworthy AI: Secure Deepfake Detection using CNNs and Zero-Knowledge Proofs [0.0]
TrustDefender is a framework that detects deepfake imagery in real-time extended reality (XR) streams.<n>Our work establishes a foundation for reliable AI in immersive and privacy-sensitive applications.
arXiv Detail & Related papers (2025-07-22T20:47:46Z) - Decentralized LoRA Augmented Transformer with Context-aware Multi-scale Feature Learning for Secured Eye Diagnosis [2.1358421658740214]
This paper proposes a novel Data efficient Image Transformer (DeiT) based framework that integrates context aware multiscale patch embedding, Low-Rank Adaptation (LoRA), knowledge distillation, and federated learning to address these challenges in a unified manner.<n>The proposed model effectively captures both local and global retinal features by leveraging multi scale patch representations with local and global attention mechanisms.
arXiv Detail & Related papers (2025-05-11T13:51:56Z) - Revisiting Medical Image Retrieval via Knowledge Consolidation [46.6989555659494]
We propose a novel method to consolidate knowledge of hierarchical features and functions.<n>We introduce Depth-aware Representation Fusion (DaRF) and Structure-aware Contrastive Hashing (SCH)<n>Our method achieves a 5.6-38.9% improvement in mean Average Precision on the anatomical radiology dataset.
arXiv Detail & Related papers (2025-03-12T13:16:42Z) - Clinical Evaluation of Medical Image Synthesis: A Case Study in Wireless Capsule Endoscopy [63.39037092484374]
Synthetic Data Generation based on Artificial Intelligence (AI) can transform the way clinical medicine is delivered.<n>This study focuses on the clinical evaluation of medical SDG, with a proof-of-concept investigation on diagnosing Inflammatory Bowel Disease (IBD) using Wireless Capsule Endoscopy (WCE) images.<n>The results show that TIDE-II generates clinically plausible, very realistic WCE images, of improved quality compared to relevant state-of-the-art generative models.
arXiv Detail & Related papers (2024-10-31T19:48:50Z) - Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering [51.26412822853409]
We present a novel personalized federated learning (pFL) method for medical visual question answering (VQA) models.
Our method introduces learnable prompts into a Transformer architecture to efficiently train it on diverse medical datasets without massive computational costs.
arXiv Detail & Related papers (2024-10-23T00:31:17Z) - MedMNIST-C: Comprehensive benchmark and improved classifier robustness by simulating realistic image corruptions [0.13108652488669734]
integration of neural-network-based systems into clinical practice is limited by challenges related to domain generalization and robustness.
We create and open-source MedMNIST-C, a benchmark dataset based on the MedMNIST+ collection covering 12 datasets and 9 imaging modalities.
arXiv Detail & Related papers (2024-06-25T13:20:39Z) - Privacy-Preserving Medical Image Classification through Deep Learning
and Matrix Decomposition [0.0]
Deep learning (DL) solutions have been extensively researched in the medical domain in recent years.
The usage of health-related data is strictly regulated, processing medical records outside the hospital environment demands robust data protection measures.
In this paper, we use singular value decomposition (SVD) and principal component analysis (PCA) to obfuscate the medical images before employing them in the DL analysis.
The capability of DL algorithms to extract relevant information from secured data is assessed on a task of angiographic view classification based on obfuscated frames.
arXiv Detail & Related papers (2023-08-31T08:21:09Z) - Towards Reliable Medical Image Segmentation by utilizing Evidential Calibrated Uncertainty [52.03490691733464]
We introduce DEviS, an easily implementable foundational model that seamlessly integrates into various medical image segmentation networks.
By leveraging subjective logic theory, we explicitly model probability and uncertainty for the problem of medical image segmentation.
DeviS incorporates an uncertainty-aware filtering module, which utilizes the metric of uncertainty-calibrated error to filter reliable data.
arXiv Detail & Related papers (2023-01-01T05:02:46Z) - Privacy-preserving medical image analysis [53.4844489668116]
We present PriMIA, a software framework designed for privacy-preserving machine learning (PPML) in medical imaging.
We show significantly better classification performance of a securely aggregated federated learning model compared to human experts on unseen datasets.
We empirically evaluate the framework's security against a gradient-based model inversion attack.
arXiv Detail & Related papers (2020-12-10T13:56:00Z) - Semi-supervised Medical Image Classification with Relation-driven
Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.