Unlocking the Potential of Large Language Models for Clinical Text Anonymization: A Comparative Study
- URL: http://arxiv.org/abs/2406.00062v1
- Date: Wed, 29 May 2024 23:07:58 GMT
- Title: Unlocking the Potential of Large Language Models for Clinical Text Anonymization: A Comparative Study
- Authors: David Pissarra, Isabel Curioso, João Alveira, Duarte Pereira, Bruno Ribeiro, Tomás Souper, Vasco Gomes, André V. Carreiro, Vitor Rolla,
- Abstract summary: Automated clinical text anonymization has the potential to unlock the widespread sharing of textual health data for secondary usage.
Despite the proposal of many complex and theoretically successful anonymization solutions in literature, these techniques remain flawed.
Recent advances in developing Large Language Models (LLMs) pose a promising opportunity to further the field.
- Score: 4.1692340552627405
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated clinical text anonymization has the potential to unlock the widespread sharing of textual health data for secondary usage while assuring patient privacy and safety. Despite the proposal of many complex and theoretically successful anonymization solutions in literature, these techniques remain flawed. As such, clinical institutions are still reluctant to apply them for open access to their data. Recent advances in developing Large Language Models (LLMs) pose a promising opportunity to further the field, given their capability to perform various tasks. This paper proposes six new evaluation metrics tailored to the challenges of generative anonymization with LLMs. Moreover, we present a comparative study of LLM-based methods, testing them against two baseline techniques. Our results establish LLM-based models as a reliable alternative to common approaches, paving the way toward trustworthy anonymization of clinical text.
Related papers
- DIRI: Adversarial Patient Reidentification with Large Language Models for Evaluating Clinical Text Anonymization [13.038800602897354]
We develop an adversarial approach using a large language model to re-identify the patient corresponding to a redacted clinical note.
Our method uses a large language model to reidentify the patient corresponding to a redacted clinical note.
Although ClinicalBERT was the most effective, masking all identified PII, our tool still reidentified 9% of clinical notes.
arXiv Detail & Related papers (2024-10-22T14:06:31Z) - Clinical Insights: A Comprehensive Review of Language Models in Medicine [1.5020330976600738]
The study traces the evolution of LLMs from their foundational technologies to the latest developments in domain-specific models and multimodal integration.
The paper discusses both the opportunities these technologies present for enhancing clinical efficiency and the challenges they pose in terms of ethics, data privacy, and implementation.
arXiv Detail & Related papers (2024-08-21T15:59:33Z) - Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenges of re-identification attack ability of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - Large Language Models are Advanced Anonymizers [13.900633576526863]
We show how adversarial anonymization outperforms current industry-grade anonymizers in terms of the resulting utility and privacy.
We first present a new setting for evaluating anonymizations in the face of adversarial LLMs inferences.
arXiv Detail & Related papers (2024-02-21T14:44:00Z) - Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs)
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z) - Distilling Large Language Models for Matching Patients to Clinical
Trials [3.4068841624198942]
The recent success of large language models (LLMs) has paved the way for their adoption in the high-stakes domain of healthcare.
This study presents the first systematic examination of the efficacy of both proprietary (GPT-3.5, and GPT-4) and open-source LLMs (LLAMA 7B,13B, and 70B) for the task of patient-trial matching.
Our findings reveal that open-source LLMs, when fine-tuned on this limited and synthetic dataset, demonstrate performance parity with their proprietary counterparts.
arXiv Detail & Related papers (2023-12-15T17:11:07Z) - Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data
Generation with Large Language Models [48.07083163501746]
Clinical natural language processing requires methods that can address domain-specific challenges.
We propose an innovative, resource-efficient approach, ClinGen, which infuses knowledge into the process.
Our empirical study across 7 clinical NLP tasks and 16 datasets reveals that ClinGen consistently enhances performance across various tasks.
arXiv Detail & Related papers (2023-11-01T04:37:28Z) - Redefining Digital Health Interfaces with Large Language Models [69.02059202720073]
Large Language Models (LLMs) have emerged as general-purpose models with the ability to process complex information.
We show how LLMs can provide a novel interface between clinicians and digital technologies.
We develop a new prognostic tool using automated machine learning.
arXiv Detail & Related papers (2023-10-05T14:18:40Z) - Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
arXiv Detail & Related papers (2023-05-30T22:05:11Z) - Large Language Models for Healthcare Data Augmentation: An Example on
Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM)
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.