Exploring AI-based System Design for Pixel-level Protected Health Information Detection in Medical Images
- URL: http://arxiv.org/abs/2501.09552v4
- Date: Tue, 24 Jun 2025 19:25:40 GMT
- Title: Exploring AI-based System Design for Pixel-level Protected Health Information Detection in Medical Images
- Authors: Tuan Truong, Ivo M. Baltruschat, Mark Klemens, Grit Werner, Matthias Lenga,
- Abstract summary: We present an AI-based pipeline for PHI detection, comprising three key modules: text detection, text extraction, and text analysis.<n>Our findings indicate that the optimal setup involves utilizing dedicated vision and language models for each module.
- Score: 0.5825410941577593
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: De-identification of medical images is a critical step to ensure privacy during data sharing in research and clinical settings. The initial step in this process involves detecting Protected Health Information (PHI), which can be found in image metadata or imprinted within image pixels. Despite the importance of such systems, there has been limited evaluation of existing AI-based solutions, creating barriers to the development of reliable and robust tools. In this study, we present an AI-based pipeline for PHI detection, comprising three key modules: text detection, text extraction, and text analysis. We benchmark three models - YOLOv11, EasyOCR, and GPT-4o - across different setups corresponding to these modules, evaluating their performance on two different datasets encompassing multiple imaging modalities and PHI categories. Our findings indicate that the optimal setup involves utilizing dedicated vision and language models for each module, which achieves a commendable balance in performance, latency, and cost associated with the usage of Large Language Models (LLMs). Additionally, we show that the application of LLMs not only involves identifying PHI content but also enhances OCR tasks and facilitates an end-to-end PHI detection pipeline, showcasing promising outcomes through our analysis.
Related papers
- Purifying, Labeling, and Utilizing: A High-Quality Pipeline for Small Object Detection [83.90563802153707]
PLUSNet is a high-quality Small object detection framework.
It comprises three components: the Hierarchical Feature (HFP) framework for purifying upstream features, the Multiple Criteria Label Assignment (MCLA) for improving the quality of midstream training samples, and the Frequency Decoupled Head (FDHead) for more effectively exploiting information to accomplish downstream tasks.
arXiv Detail & Related papers (2025-04-29T10:11:03Z) - Zeus: Zero-shot LLM Instruction for Union Segmentation in Multimodal Medical Imaging [4.341503087761129]
Conducting multimodal learning involves visual and text modalities shown as a solution, but collecting paired vision-language datasets is expensive and time-consuming.<n>Inspired by the superior ability in numerous cross-modal tasks for Large Language Models (LLMs), we proposed a novel Vision-LLM union framework to address the issues.
arXiv Detail & Related papers (2025-04-09T23:33:35Z) - M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment [65.3860007085689]
M3-AGIQA is a comprehensive framework that enables more human-aligned, holistic evaluation of AI-generated images.<n>By aligning model outputs more closely with human judgment, M3-AGIQA delivers robust and interpretable quality scores.
arXiv Detail & Related papers (2025-02-21T03:05:45Z) - Fake It Till You Make It: Using Synthetic Data and Domain Knowledge for Improved Text-Based Learning for LGE Detection [11.532639713283226]
We use strategies rooted in domain knowledge to train a model for LGE detection using text from clinical reports.<n>We standardize the orientation of the images in an anatomy-informed way to enable better alignment of spatial and text features.<n> ablation studies are carried out to elucidate the contributions of each design component to the overall performance of the model.
arXiv Detail & Related papers (2025-02-18T15:30:48Z) - Privacy-Preserving in Medical Image Analysis: A Review of Methods and Applications [19.14185066631612]
Review offers a comprehensive overview of privacy-preserving techniques in medical image analysis.<n>Includes encryption, differential privacy, homomorphic encryption, federated learning, and generative adversarial networks.<n>We explore the application of these techniques across various medical image analysis tasks, such as diagnosis, pathology, and telemedicine.
arXiv Detail & Related papers (2024-12-05T06:56:06Z) - A Multimodal Approach Combining Structural and Cross-domain Textual Guidance for Weakly Supervised OCT Segmentation [12.948027961485536]
We propose a novel Weakly Supervised Semantic (WSSS) approach that integrates structural guidance with text-driven strategies to generate high-quality pseudo labels.
Our method achieves state-of-the-art performance, highlighting its potential to improve diagnostic accuracy and efficiency in medical imaging.
arXiv Detail & Related papers (2024-11-19T16:20:27Z) - Clinical Evaluation of Medical Image Synthesis: A Case Study in Wireless Capsule Endoscopy [63.39037092484374]
This study focuses on the clinical evaluation of medical Synthetic Data Generation using Artificial Intelligence (AI) models.
The paper contributes by a) presenting a protocol for the systematic evaluation of synthetic images by medical experts and b) applying it to assess TIDE-II, a novel variational autoencoder-based model for high-resolution WCE image synthesis.
The results show that TIDE-II generates clinically relevant WCE images, helping to address data scarcity and enhance diagnostic tools.
arXiv Detail & Related papers (2024-10-31T19:48:50Z) - MIST: A Simple and Scalable End-To-End 3D Medical Imaging Segmentation Framework [1.4043931310479378]
The Medical Imaging Toolkit (MIST) is designed to facilitate consistent training, testing, and evaluation of deep learning-based medical imaging segmentation methods.
MIST standardizes data analysis, preprocessing, and evaluation pipelines, accommodating multiple architectures and loss functions.
arXiv Detail & Related papers (2024-07-31T05:17:31Z) - An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging [0.3029213689620348]
We explore the potential of the Gemini (textitgemini-1.0-pro-vision-latest) and GPT-4V models for medical image analysis.
Both Gemini AI and GPT-4V are first used to classify real versus synthetic images, followed by an interpretation and analysis of the input images.
Our early investigation presented in this work provides insights into the potential of MLLMs to assist with the classification and interpretation of retinal fundoscopy and lung X-ray images.
arXiv Detail & Related papers (2024-06-02T08:29:23Z) - DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection [111.68263493302499]
We introduce DetCLIPv3, a high-performing detector that excels at both open-vocabulary object detection and hierarchical labels.
DetCLIPv3 is characterized by three core designs: 1) Versatile model architecture; 2) High information density data; and 3) Efficient training strategy.
DetCLIPv3 demonstrates superior open-vocabulary detection performance, outperforming GLIPv2, GroundingDINO, and DetCLIPv2 by 18.0/19.6/6.6 AP, respectively.
arXiv Detail & Related papers (2024-04-14T11:01:44Z) - Plaintext-Free Deep Learning for Privacy-Preserving Medical Image Analysis via Frequency Information Embedding [9.192156293063414]
This paper proposes a novel framework that uses surrogate images for analysis.
The framework is called Frequency-domain Exchange Style Fusion (FESF)
Our framework effectively preserves the privacy of medical images and maintains diagnostic accuracy of DL models at a relatively high level, proving its effectiveness across various datasets and DL-based models.
arXiv Detail & Related papers (2024-03-25T06:56:38Z) - Medical Image Data Provenance for Medical Cyber-Physical System [8.554664822046966]
This study proposes using watermarking techniques to embed a device fingerprint (DFP) into captured images.
The DFP, representing the unique attributes of the capturing device and raw image, is embedded into raw images before storage.
A robust remote validation method is introduced to authenticate images, enhancing the integrity of medical image data in interconnected healthcare systems.
arXiv Detail & Related papers (2024-03-22T13:24:44Z) - Enhancing Weakly Supervised 3D Medical Image Segmentation through
Probabilistic-aware Learning [52.249748801637196]
3D medical image segmentation is a challenging task with crucial implications for disease diagnosis and treatment planning.
Recent advances in deep learning have significantly enhanced fully supervised medical image segmentation.
We propose a novel probabilistic-aware weakly supervised learning pipeline, specifically designed for 3D medical imaging.
arXiv Detail & Related papers (2024-03-05T00:46:53Z) - RAD-DINO: Exploring Scalable Medical Image Encoders Beyond Text Supervision [43.05373341291021]
Language-supervised pre-training has proven to be a valuable method for extracting semantically meaningful features from images.<n>We introduce RAD-DINO, a biomedical image encoder pre-trained solely on unimodal biomedical imaging data.
arXiv Detail & Related papers (2024-01-19T17:02:17Z) - DeepArt: A Benchmark to Advance Fidelity Research in AI-Generated
Content [9.482738088610535]
This paper explores the image synthesis capabilities of GPT-4, a leading multi-modal large language model.
We establish a benchmark for evaluating the fidelity of texture features in images generated by GPT-4, comprising manually painted pictures and their AI-generated counterparts.
We have compiled a unique benchmark of manual drawings and corresponding GPT-4-generated images, introducing a new task to advance fidelity research in AI-generated content.
arXiv Detail & Related papers (2023-12-16T10:17:09Z) - Raising the Bar of AI-generated Image Detection with CLIP [50.345365081177555]
The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images.
We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios.
arXiv Detail & Related papers (2023-11-30T21:11:20Z) - Clairvoyance: A Pipeline Toolkit for Medical Time Series [95.22483029602921]
Time-series learning is the bread and butter of data-driven *clinical decision support*
Clairvoyance proposes a unified, end-to-end, autoML-friendly pipeline that serves as a software toolkit.
Clairvoyance is the first to demonstrate viability of a comprehensive and automatable pipeline for clinical time-series ML.
arXiv Detail & Related papers (2023-10-28T12:08:03Z) - Disruptive Autoencoders: Leveraging Low-level features for 3D Medical
Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images.
We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations.
The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z) - Transfer Learning for the Efficient Detection of COVID-19 from
Smartphone Audio Data [6.18778092044887]
Disease detection from smartphone data represents an open research challenge in mobile health (m-health) systems.
We present the experimental evaluation of 3 different deep learning models, compared also with hand-crafted features.
We evaluate the memory footprints of the different models for their possible applications on commercial mobile devices.
arXiv Detail & Related papers (2023-07-06T13:19:27Z) - On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases.
We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including oBERTa-Large/Base-Detector, GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z) - Vision-Language Modelling For Radiological Imaging and Reports In The
Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z) - PCRLv2: A Unified Visual Information Preservation Framework for
Self-supervised Pre-training in Medical Image Analysis [56.63327669853693]
We propose to incorporate the task of pixel restoration for explicitly encoding more pixel-level information into high-level semantics.
We also address the preservation of scale information, a powerful tool in aiding image understanding.
The proposed unified SSL framework surpasses its self-supervised counterparts on various tasks.
arXiv Detail & Related papers (2023-01-02T17:47:27Z) - PACMAN: a framework for pulse oximeter digit detection and reading in a
low-resource setting [0.42897826548373363]
In light of the COVID-19 pandemic, patients were required to manually input their daily oxygen saturation (SpO2) and pulse rate (PR) values into a health monitoring system.
Several studies attempted to detect the physiological value from the captured image using optical character recognition (OCR)
This study aimed to propose a novel framework called PACMAN with a low-resource deep learning-based computer vision.
arXiv Detail & Related papers (2022-12-09T16:22:28Z) - A Trustworthy Framework for Medical Image Analysis with Deep Learning [71.48204494889505]
TRUDLMIA is a trustworthy deep learning framework for medical image analysis.
It is anticipated that the framework will support researchers and clinicians in advancing the use of deep learning for dealing with public health crises including COVID-19.
arXiv Detail & Related papers (2022-12-06T05:30:22Z) - Align, Reason and Learn: Enhancing Medical Vision-and-Language
Pre-training with Knowledge [68.90835997085557]
We propose a systematic and effective approach to enhance structured medical knowledge from three perspectives.
First, we align the representations of the vision encoder and the language encoder through knowledge.
Second, we inject knowledge into the multi-modal fusion model to enable the model to perform reasoning using knowledge as the supplementation of the input image and text.
Third, we guide the model to put emphasis on the most critical information in images and texts by designing knowledge-induced pretext tasks.
arXiv Detail & Related papers (2022-09-15T08:00:01Z) - Ensemble of CNN classifiers using Sugeno Fuzzy Integral Technique for
Cervical Cytology Image Classification [1.6986898305640261]
We propose a fully automated computer-aided diagnosis tool for classifying single-cell and slide images of cervical cancer.
We use the Sugeno Fuzzy Integral to ensemble the decision scores from three popular deep learning models, namely, Inception v3, DenseNet-161 and ResNet-34.
arXiv Detail & Related papers (2021-08-21T08:41:41Z) - Privacy-preserving medical image analysis [53.4844489668116]
We present PriMIA, a software framework designed for privacy-preserving machine learning (PPML) in medical imaging.
We show significantly better classification performance of a securely aggregated federated learning model compared to human experts on unseen datasets.
We empirically evaluate the framework's security against a gradient-based model inversion attack.
arXiv Detail & Related papers (2020-12-10T13:56:00Z) - Explaining Clinical Decision Support Systems in Medical Imaging using
Cycle-Consistent Activation Maximization [112.2628296775395]
Clinical decision support using deep neural networks has become a topic of steadily growing interest.
clinicians are often hesitant to adopt the technology because its underlying decision-making process is considered to be intransparent and difficult to comprehend.
We propose a novel decision explanation scheme based on CycleGAN activation which generates high-quality visualizations of classifier decisions even in smaller data sets.
arXiv Detail & Related papers (2020-10-09T14:39:27Z) - Towards High Performance Human Keypoint Detection [87.1034745775229]
We find that context information plays an important role in reasoning human body configuration and invisible keypoints.
Inspired by this, we propose a cascaded context mixer ( CCM) which efficiently integrates spatial and channel context information.
To maximize CCM's representation capability, we develop a hard-negative person detection mining strategy and a joint-training strategy.
We present several sub-pixel refinement techniques for postprocessing keypoint predictions to improve detection accuracy.
arXiv Detail & Related papers (2020-02-03T02:24:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.