Medical Image De-Identification Benchmark Challenge
- URL: http://arxiv.org/abs/2507.23608v1
- Date: Thu, 31 Jul 2025 14:47:20 GMT
- Title: Medical Image De-Identification Benchmark Challenge
- Authors: Linmin Pei, Granger Sutton, Michael Rutherford, Ulrike Wagner, Tracy Nolan, Kirk Smith, Phillip Farmer, Peter Gu, Ambar Rana, Kailing Chen, Thomas Ferleman, Brian Park, Ye Wu, Jordan Kojouharov, Gargi Singh, Jon Lemon, Tyler Willis, Milos Vukadinovic, Grant Duffy, Bryan He, David Ouyang, Marco Pereanez, Daniel Samber, Derek A. Smith, Christopher Cannistraci, Zahi Fayad, David S. Mendelson, Michele Bufano, Elmar Kotter, Hamideh Haghiri, Rajesh Baidya, Stefan Dvoretskii, Klaus H. Maier-Hein, Marco Nolden, Christopher Ablett, Silvia Siggillino, Sandeep Kaushik, Hongzhu Jiang, Sihan Xie, Zhiyu Wan, Alex Michie, Simon J Doran, Angeline Aurelia Waly, Felix A. Nathaniel Liang, Humam Arshad Mustagfirin, Michelle Grace Felicia, Kuo Po Chih, Rahul Krish, Ghulam Rasool, Nidhal Bouaynaya, Nikolas Koutsoubis, Kyle Naddeo, Kartik Pandit, Tony O'Sullivan, Raj Krish, Qinyan Pan, Scott Gustafson, Benjamin Kopchick, Laura Opsahl-Ong, Andrea Olvera-Morales, Jonathan Pinney, Kathryn Johnson, Theresa Do, Juergen Klenk, Maria Diaz, Arti Singh, Rong Chai, David A. Clunie, Fred Prior, Keyvan Farahani,
- Abstract summary: The aim of the MIDI-B Challenge was to provide a standardized platform for benchmarking of DICOM image deID tools.<n>The challenge employed a large, diverse, multi-center, and multi-modality set of real de-identified radiology images with synthetic PHI/PII inserted.<n>Ten teams successfully completed the test phase of the challenge.
- Score: 1.491270549044044
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The de-identification (deID) of protected health information (PHI) and personally identifiable information (PII) is a fundamental requirement for sharing medical images, particularly through public repositories, to ensure compliance with patient privacy laws. In addition, preservation of non-PHI metadata to inform and enable downstream development of imaging artificial intelligence (AI) is an important consideration in biomedical research. The goal of MIDI-B was to provide a standardized platform for benchmarking of DICOM image deID tools based on a set of rules conformant to the HIPAA Safe Harbor regulation, the DICOM Attribute Confidentiality Profiles, and best practices in preservation of research-critical metadata, as defined by The Cancer Imaging Archive (TCIA). The challenge employed a large, diverse, multi-center, and multi-modality set of real de-identified radiology images with synthetic PHI/PII inserted. The MIDI-B Challenge consisted of three phases: training, validation, and test. Eighty individuals registered for the challenge. In the training phase, we encouraged participants to tune their algorithms using their in-house or public data. The validation and test phases utilized the DICOM images containing synthetic identifiers (of 216 and 322 subjects, respectively). Ten teams successfully completed the test phase of the challenge. To measure success of a rule-based approach to image deID, scores were computed as the percentage of correct actions from the total number of required actions. The scores ranged from 97.91% to 99.93%. Participants employed a variety of open-source and proprietary tools with customized configurations, large language models, and optical character recognition (OCR). In this paper we provide a comprehensive report on the MIDI-B Challenge's design, implementation, results, and lessons learned.
Related papers
- Deep classification algorithm for De-identification of DICOM medical images [0.0]
De-identification of DICOM files is an essential component of medical image research.<n>The most sensible information, like names, history, personal data and institution were successfully recognized.
arXiv Detail & Related papers (2025-08-04T08:21:18Z) - Medical Image De-Identification Resources: Synthetic DICOM Data and Tools for Validation [0.10617782943195009]
Ensuring patient privacy remains a significant challenge for open-access data sharing.<n>Digital Imaging and Communications in Medicine (DICOM) encodes both essential clinical metadata and extensive protected health information (PHI) and personally identifiable information (PII)<n>To address this gap, we developed an openly accessible DICOM dataset infused with synthetic PHI/PII and an evaluation framework for benchmarking image de-identification.
arXiv Detail & Related papers (2025-08-03T18:48:28Z) - DICOM De-Identification via Hybrid AI and Rule-Based Framework for Scalable, Uncertainty-Aware Redaction [0.0]
This paper presents a hybrid de-identification framework that combines rule-based and AI-driven techniques.<n>Our solution addresses critical challenges in medical data de-identification and supports the secure, ethical, and trustworthy release of imaging data for research.
arXiv Detail & Related papers (2025-07-31T17:19:38Z) - Exploring AI-based System Design for Pixel-level Protected Health Information Detection in Medical Images [0.5825410941577593]
We present an AI-based pipeline for PHI detection, comprising three key modules: text detection, text extraction, and text analysis.<n>Our findings indicate that the optimal setup involves utilizing dedicated vision and language models for each module.
arXiv Detail & Related papers (2025-01-16T14:12:33Z) - SIDA: Social Media Image Deepfake Detection, Localization and Explanation with Large Multimodal Model [48.547599530927926]
Synthetic images, when shared on social media, can mislead extensive audiences and erode trust in digital content.<n>We introduce the Social media Image Detection dataSet (SID-Set), which offers three key advantages.<n>We propose a new image deepfake detection, localization, and explanation framework, named SIDA.
arXiv Detail & Related papers (2024-12-05T16:12:25Z) - De-Identification of Medical Imaging Data: A Comprehensive Tool for Ensuring Patient Privacy [4.376648893167674]
Open-source tool can be used to de-identify DICOM magnetic resonance images, computer images, whole slide images and magnetic resonance twix raw data.
Proposal comprises an elaborate anonymization pipeline for multiple types of inputs, reducing the need for additional tools used for de-identification of imaging data.
arXiv Detail & Related papers (2024-10-16T09:31:24Z) - FEDMEKI: A Benchmark for Scaling Medical Foundation Models via Federated Knowledge Injection [83.54960238236548]
FEDMEKI not only preserves data privacy but also enhances the capability of medical foundation models.
FEDMEKI allows medical foundation models to learn from a broader spectrum of medical knowledge without direct data exposure.
arXiv Detail & Related papers (2024-08-17T15:18:56Z) - De-LightSAM: Modality-Decoupled Lightweight SAM for Generalizable Medical Segmentation [28.16884929151585]
We propose a modality-decoupled lightweight SAM for domain-generalized medical image segmentation.<n> Specifically, we first devise a lightweight domain-controllable image encoder (DC-Encoder) that produces discriminative visual features for diverse modalities.<n>Finally, we design the query-decoupled modality decoder (QM-Decoder) that leverages a one-to-one strategy to provide an independent decoding channel.
arXiv Detail & Related papers (2024-07-19T09:32:30Z) - TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets [54.98321887435557]
This paper presents a suite of 23 meticulously curated AI-ready datasets covering multi-modal input features and 8 crucial prediction challenges in clinical trial design.<n>We provide basic validation methods for each task to ensure the datasets' usability and reliability.<n>We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design.
arXiv Detail & Related papers (2024-06-30T09:13:10Z) - QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge [93.61262892578067]
Uncertainty in medical image segmentation tasks, especially inter-rater variability, presents a significant challenge.
This variability directly impacts the development and evaluation of automated segmentation algorithms.
We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ)
arXiv Detail & Related papers (2024-03-19T17:57:24Z) - ConfounderGAN: Protecting Image Data Privacy with Causal Confounder [85.6757153033139]
We propose ConfounderGAN, a generative adversarial network (GAN) that can make personal image data unlearnable to protect the data privacy of its owners.
Experiments are conducted in six image classification datasets, consisting of three natural object datasets and three medical datasets.
arXiv Detail & Related papers (2022-12-04T08:49:14Z) - Gradient-Induced Co-Saliency Detection [81.54194063218216]
Co-saliency detection (Co-SOD) aims to segment the common salient foreground in a group of relevant images.
In this paper, inspired by human behavior, we propose a gradient-induced co-saliency detection method.
arXiv Detail & Related papers (2020-04-28T08:40:55Z) - Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions.
Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.