Large Language Model Empowered Privacy-Protected Framework for PHI Annotation in Clinical Notes
- URL: http://arxiv.org/abs/2504.18569v1
- Date: Tue, 22 Apr 2025 03:18:36 GMT
- Title: Large Language Model Empowered Privacy-Protected Framework for PHI Annotation in Clinical Notes
- Authors: Guanchen Wu, Linzhi Zheng, Han Xie, Zhen Xiang, Jiaying Lu, Darren Liu, Delgersuren Bold, Bo Li, Xiao Hu, Carl Yang,
- Abstract summary: We introduce LPPA, an LLM-powered Privacy-Protected PHI framework for clinical notes.<n>This framework ensures privacy protection and high PHI annotation accuracy.<n>Extensive experiments demonstrate LPPA's effectiveness in accurately de-identifying private information.
- Score: 28.117944094827596
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The de-identification of private information in medical data is a crucial process to mitigate the risk of confidentiality breaches, particularly when patient personal details are not adequately removed before the release of medical records. Although rule-based and learning-based methods have been proposed, they often struggle with limited generalizability and require substantial amounts of annotated data for effective performance. Recent advancements in large language models (LLMs) have shown significant promise in addressing these issues due to their superior language comprehension capabilities. However, LLMs present challenges, including potential privacy risks when using commercial LLM APIs and high computational costs for deploying open-source LLMs locally. In this work, we introduce LPPA, an LLM-empowered Privacy-Protected PHI Annotation framework for clinical notes, targeting the English language. By fine-tuning LLMs locally with synthetic notes, LPPA ensures strong privacy protection and high PHI annotation accuracy. Extensive experiments demonstrate LPPA's effectiveness in accurately de-identifying private information, offering a scalable and efficient solution for enhancing patient privacy protection.
Related papers
- Differentially Private Steering for Large Language Model Alignment [55.30573701583768]
We present the first study of aligning Large Language Models with private datasets.<n>Our work proposes the Private Steering for LLM Alignment (PSA) algorithm to edit activations with differential privacy guarantees.<n>Our results show that PSA achieves DP guarantees for LLM alignment with minimal loss in performance.
arXiv Detail & Related papers (2025-01-30T17:58:36Z) - PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories.
We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds.
State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z) - LLM-PBE: Assessing Data Privacy in Large Language Models [111.58198436835036]
Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis.
Despite the critical nature of this issue, there has been no existing literature to offer a comprehensive assessment of data privacy risks in LLMs.
Our paper introduces LLM-PBE, a toolkit crafted specifically for the systematic evaluation of data privacy risks in LLMs.
arXiv Detail & Related papers (2024-08-23T01:37:29Z) - Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenges of re-identification attack ability of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - Enhancing Data Privacy in Large Language Models through Private Association Editing [1.078439500019266]
Large language models (LLMs) require a significant redesign in solutions to preserve privacy in data-intensive applications.
This paper introduces Private Association Editing (PAE) as a novel defense approach for private data leakage.
arXiv Detail & Related papers (2024-06-26T10:08:47Z) - No Free Lunch Theorem for Privacy-Preserving LLM Inference [30.554456047738295]
This study develops a framework for inferring privacy-protected Large Language Models (LLMs)<n>It lays down a solid theoretical basis for examining the interplay between privacy preservation and utility.
arXiv Detail & Related papers (2024-05-31T08:22:53Z) - Locally Differentially Private In-Context Learning [8.659575019965152]
Large pretrained language models (LLMs) have shown surprising In-Context Learning (ICL) ability.
This paper proposes a locally differentially private framework of in-context learning (LDP-ICL)
Considering the mechanisms of in-context learning in Transformers by gradient descent, we provide an analysis of the trade-off between privacy and utility in such LDP-ICL.
arXiv Detail & Related papers (2024-05-07T06:05:43Z) - On Protecting the Data Privacy of Large Language Models (LLMs): A Survey [35.48984524483533]
Large language models (LLMs) are complex artificial intelligence systems capable of understanding, generating and translating human language.
LLMs process and generate large amounts of data, which may threaten data privacy.
arXiv Detail & Related papers (2024-03-08T08:47:48Z) - Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models reveal private information in contexts that humans would not, 39% and 57% of the time, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind)
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - Enhancing Small Medical Learners with Privacy-preserving Contextual Prompting [24.201549275369487]
We present a method that harnesses large language models' medical expertise to boost SLM performance in medical tasks under privacy-restricted scenarios.
Specifically, we mitigate patient privacy issues by extracting keywords from medical data and prompting the LLM to generate a medical knowledge-intensive context.
Our method significantly enhances performance in both few-shot and full training settings across three medical knowledge-intensive tasks.
arXiv Detail & Related papers (2023-05-22T05:14:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.