Related papers: Exploring AI-based System Design for Pixel-level Protected Health Information Detection in Medical Images

Exploring AI-based System Design for Pixel-level Protected Health Information Detection in Medical Images

URL: http://arxiv.org/abs/2501.09552v3
Date: Tue, 29 Apr 2025 12:35:25 GMT
Title: Exploring AI-based System Design for Pixel-level Protected Health Information Detection in Medical Images
Authors: Tuan Truong, Ivo M. Baltruschat, Mark Klemens, Grit Werner, Matthias Lenga,
Abstract summary: We present an AI-based pipeline for PHI detection comprising text detection, text extraction, and text analysis.<n>We benchmark three models, YOLOv11, EasyOCR, and GPT-4o, across different setups corresponding to these components.<n>The combination of YOLOv11 for text localization and GPT-4o for extraction and analysis yields the best results.
Score: 0.5825410941577593
License: http://creativecommons.org/licenses/by/4.0/
Abstract: De-identification of medical images is a critical step to ensure privacy during data sharing in research and clinical settings. The initial step in this process involves detecting Protected Health Information (PHI), which can be found in image metadata or imprinted within image pixels. Despite the importance of such systems, there has been limited evaluation of existing AI-based solutions, creating barriers to the development of reliable and robust tools. In this study, we present an AI-based pipeline for PHI detection, comprising three key components: text detection, text extraction, and text analysis. We benchmark three models, YOLOv11, EasyOCR, and GPT-4o, across different setups corresponding to these components, evaluating the performance based on precision, recall, F1 score, and accuracy. All setups demonstrate excellent PHI detection, with all metrics exceeding 0.9. The combination of YOLOv11 for text localization and GPT-4o for extraction and analysis yields the best results. However, this setup incurs higher costs due to GPT-4o's token generation. Conversely, an end-to-end pipeline that relies solely on GPT-4o shows lower performance but highlights the potential of multimodal models for complex tasks. We recommend fine-tuning a dedicated object detection model and utilizing built-in OCR tools to achieve optimal performance and cost-effectiveness. Additionally, leveraging language models such as GPT-4o can facilitate thorough and flexible analysis of text content.

Related papers

Purifying, Labeling, and Utilizing: A High-Quality Pipeline for Small Object Detection [83.90563802153707]
PLUSNet is a high-quality Small object detection framework. It comprises three components: the Hierarchical Feature (HFP) framework for purifying upstream features, the Multiple Criteria Label Assignment (MCLA) for improving the quality of midstream training samples, and the Frequency Decoupled Head (FDHead) for more effectively exploiting information to accomplish downstream tasks.
arXiv Detail & Related papers (2025-04-29T10:11:03Z)
Privacy-Preserving in Medical Image Analysis: A Review of Methods and Applications [19.14185066631612]
Review offers a comprehensive overview of privacy-preserving techniques in medical image analysis.<n>Includes encryption, differential privacy, homomorphic encryption, federated learning, and generative adversarial networks.<n>We explore the application of these techniques across various medical image analysis tasks, such as diagnosis, pathology, and telemedicine.
arXiv Detail & Related papers (2024-12-05T06:56:06Z)
Clinical Evaluation of Medical Image Synthesis: A Case Study in Wireless Capsule Endoscopy [63.39037092484374]
This study focuses on the clinical evaluation of medical Synthetic Data Generation using Artificial Intelligence (AI) models. The paper contributes by a) presenting a protocol for the systematic evaluation of synthetic images by medical experts and b) applying it to assess TIDE-II, a novel variational autoencoder-based model for high-resolution WCE image synthesis. The results show that TIDE-II generates clinically relevant WCE images, helping to address data scarcity and enhance diagnostic tools.
arXiv Detail & Related papers (2024-10-31T19:48:50Z)
MIST: A Simple and Scalable End-To-End 3D Medical Imaging Segmentation Framework [1.4043931310479378]
The Medical Imaging Toolkit (MIST) is designed to facilitate consistent training, testing, and evaluation of deep learning-based medical imaging segmentation methods. MIST standardizes data analysis, preprocessing, and evaluation pipelines, accommodating multiple architectures and loss functions.
arXiv Detail & Related papers (2024-07-31T05:17:31Z)
An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging [0.3029213689620348]
We explore the potential of the Gemini (textitgemini-1.0-pro-vision-latest) and GPT-4V models for medical image analysis. Both Gemini AI and GPT-4V are first used to classify real versus synthetic images, followed by an interpretation and analysis of the input images. Our early investigation presented in this work provides insights into the potential of MLLMs to assist with the classification and interpretation of retinal fundoscopy and lung X-ray images.
arXiv Detail & Related papers (2024-06-02T08:29:23Z)
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection [111.68263493302499]
We introduce DetCLIPv3, a high-performing detector that excels at both open-vocabulary object detection and hierarchical labels. DetCLIPv3 is characterized by three core designs: 1) Versatile model architecture; 2) High information density data; and 3) Efficient training strategy. DetCLIPv3 demonstrates superior open-vocabulary detection performance, outperforming GLIPv2, GroundingDINO, and DetCLIPv2 by 18.0/19.6/6.6 AP, respectively.
arXiv Detail & Related papers (2024-04-14T11:01:44Z)
Medical Image Data Provenance for Medical Cyber-Physical System [8.554664822046966]
This study proposes using watermarking techniques to embed a device fingerprint (DFP) into captured images. The DFP, representing the unique attributes of the capturing device and raw image, is embedded into raw images before storage. A robust remote validation method is introduced to authenticate images, enhancing the integrity of medical image data in interconnected healthcare systems.
arXiv Detail & Related papers (2024-03-22T13:24:44Z)
Enhancing Weakly Supervised 3D Medical Image Segmentation through Probabilistic-aware Learning [52.249748801637196]
3D medical image segmentation is a challenging task with crucial implications for disease diagnosis and treatment planning. Recent advances in deep learning have significantly enhanced fully supervised medical image segmentation. We propose a novel probabilistic-aware weakly supervised learning pipeline, specifically designed for 3D medical imaging.
arXiv Detail & Related papers (2024-03-05T00:46:53Z)
RAD-DINO: Exploring Scalable Medical Image Encoders Beyond Text Supervision [43.05373341291021]
Language-supervised pre-training has proven to be a valuable method for extracting semantically meaningful features from images.<n>We introduce RAD-DINO, a biomedical image encoder pre-trained solely on unimodal biomedical imaging data.
arXiv Detail & Related papers (2024-01-19T17:02:17Z)
DeepArt: A Benchmark to Advance Fidelity Research in AI-Generated Content [9.482738088610535]
This paper explores the image synthesis capabilities of GPT-4, a leading multi-modal large language model. We establish a benchmark for evaluating the fidelity of texture features in images generated by GPT-4, comprising manually painted pictures and their AI-generated counterparts. We have compiled a unique benchmark of manual drawings and corresponding GPT-4-generated images, introducing a new task to advance fidelity research in AI-generated content.
arXiv Detail & Related papers (2023-12-16T10:17:09Z)
Raising the Bar of AI-generated Image Detection with CLIP [50.345365081177555]
The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images. We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios.
arXiv Detail & Related papers (2023-11-30T21:11:20Z)
Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training [51.16994853817024]
This work focuses on designing an effective pre-training framework for 3D radiology images. We introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations. The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-07-31T17:59:42Z)
Transfer Learning for the Efficient Detection of COVID-19 from Smartphone Audio Data [6.18778092044887]
Disease detection from smartphone data represents an open research challenge in mobile health (m-health) systems. We present the experimental evaluation of 3 different deep learning models, compared also with hand-crafted features. We evaluate the memory footprints of the different models for their possible applications on commercial mobile devices.
arXiv Detail & Related papers (2023-07-06T13:19:27Z)
On the Possibilities of AI-Generated Text Detection [76.55825911221434]
We argue that as machine-generated text approximates human-like quality, the sample size needed for detection bounds increases. We test various state-of-the-art text generators, including GPT-2, GPT-3.5-Turbo, Llama, Llama-2-13B-Chat-HF, and Llama-2-70B-Chat-HF, against detectors, including oBERTa-Large/Base-Detector, GPTZero.
arXiv Detail & Related papers (2023-04-10T17:47:39Z)
PACMAN: a framework for pulse oximeter digit detection and reading in a low-resource setting [0.42897826548373363]
In light of the COVID-19 pandemic, patients were required to manually input their daily oxygen saturation (SpO2) and pulse rate (PR) values into a health monitoring system. Several studies attempted to detect the physiological value from the captured image using optical character recognition (OCR) This study aimed to propose a novel framework called PACMAN with a low-resource deep learning-based computer vision.
arXiv Detail & Related papers (2022-12-09T16:22:28Z)
A Trustworthy Framework for Medical Image Analysis with Deep Learning [71.48204494889505]
TRUDLMIA is a trustworthy deep learning framework for medical image analysis. It is anticipated that the framework will support researchers and clinicians in advancing the use of deep learning for dealing with public health crises including COVID-19.
arXiv Detail & Related papers (2022-12-06T05:30:22Z)
Align, Reason and Learn: Enhancing Medical Vision-and-Language Pre-training with Knowledge [68.90835997085557]
We propose a systematic and effective approach to enhance structured medical knowledge from three perspectives. First, we align the representations of the vision encoder and the language encoder through knowledge. Second, we inject knowledge into the multi-modal fusion model to enable the model to perform reasoning using knowledge as the supplementation of the input image and text. Third, we guide the model to put emphasis on the most critical information in images and texts by designing knowledge-induced pretext tasks.
arXiv Detail & Related papers (2022-09-15T08:00:01Z)
Ensemble of CNN classifiers using Sugeno Fuzzy Integral Technique for Cervical Cytology Image Classification [1.6986898305640261]
We propose a fully automated computer-aided diagnosis tool for classifying single-cell and slide images of cervical cancer. We use the Sugeno Fuzzy Integral to ensemble the decision scores from three popular deep learning models, namely, Inception v3, DenseNet-161 and ResNet-34.
arXiv Detail & Related papers (2021-08-21T08:41:41Z)
Privacy-preserving medical image analysis [53.4844489668116]
We present PriMIA, a software framework designed for privacy-preserving machine learning (PPML) in medical imaging. We show significantly better classification performance of a securely aggregated federated learning model compared to human experts on unseen datasets. We empirically evaluate the framework's security against a gradient-based model inversion attack.
arXiv Detail & Related papers (2020-12-10T13:56:00Z)
Explaining Clinical Decision Support Systems in Medical Imaging using Cycle-Consistent Activation Maximization [112.2628296775395]
Clinical decision support using deep neural networks has become a topic of steadily growing interest. clinicians are often hesitant to adopt the technology because its underlying decision-making process is considered to be intransparent and difficult to comprehend. We propose a novel decision explanation scheme based on CycleGAN activation which generates high-quality visualizations of classifier decisions even in smaller data sets.
arXiv Detail & Related papers (2020-10-09T14:39:27Z)
Towards High Performance Human Keypoint Detection [87.1034745775229]
We find that context information plays an important role in reasoning human body configuration and invisible keypoints. Inspired by this, we propose a cascaded context mixer ( CCM) which efficiently integrates spatial and channel context information. To maximize CCM's representation capability, we develop a hard-negative person detection mining strategy and a joint-training strategy. We present several sub-pixel refinement techniques for postprocessing keypoint predictions to improve detection accuracy.
arXiv Detail & Related papers (2020-02-03T02:24:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.