Related papers: GoldCoin: Grounding Large Language Models in Privacy Laws via Contextual Integrity Theory

GoldCoin: Grounding Large Language Models in Privacy Laws via Contextual Integrity Theory

URL: http://arxiv.org/abs/2406.11149v2
Date: Fri, 04 Oct 2024 03:00:43 GMT
Title: GoldCoin: Grounding Large Language Models in Privacy Laws via Contextual Integrity Theory
Authors: Wei Fan, Haoran Li, Zheye Deng, Weiqi Wang, Yangqiu Song,
Abstract summary: Existing research studies privacy by exploring various privacy attacks, defenses, and evaluations within narrowly predefined patterns. We introduce a novel framework, GoldCoin, designed to efficiently ground LLMs in privacy laws for judicial assessing privacy violations. Our framework leverages the theory of contextual integrity as a bridge, creating numerous synthetic scenarios grounded in relevant privacy statutes.
Score: 44.297102658873726
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Privacy issues arise prominently during the inappropriate transmission of information between entities. Existing research primarily studies privacy by exploring various privacy attacks, defenses, and evaluations within narrowly predefined patterns, while neglecting that privacy is not an isolated, context-free concept limited to traditionally sensitive data (e.g., social security numbers), but intertwined with intricate social contexts that complicate the identification and analysis of potential privacy violations. The advent of Large Language Models (LLMs) offers unprecedented opportunities for incorporating the nuanced scenarios outlined in privacy laws to tackle these complex privacy issues. However, the scarcity of open-source relevant case studies restricts the efficiency of LLMs in aligning with specific legal statutes. To address this challenge, we introduce a novel framework, GoldCoin, designed to efficiently ground LLMs in privacy laws for judicial assessing privacy violations. Our framework leverages the theory of contextual integrity as a bridge, creating numerous synthetic scenarios grounded in relevant privacy statutes (e.g., HIPAA), to assist LLMs in comprehending the complex contexts for identifying privacy risks in the real world. Extensive experimental results demonstrate that GoldCoin markedly enhances LLMs' capabilities in recognizing privacy risks across real court cases, surpassing the baselines on different judicial tasks.

Related papers

LLM-as-a-Judge for Privacy Evaluation? Exploring the Alignment of Human and LLM Perceptions of Privacy in Textual Data [47.76073133338117]
Despite advances in the field of privacy- Natural Language Processing (NLP), the accurate evaluation of privacy remains a challenge.<n>We present a global approach inspired by sox2013$ a model for privacy evaluation in textual data.<n>Our findings pave the way for exploring the feasibility of evaluators as privacy evaluators.
arXiv Detail & Related papers (2025-08-16T20:49:41Z)
MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation [54.410825977390274]
Existing benchmarks to evaluate contextual privacy in LLM-agents primarily assess single-turn, low-complexity tasks.<n>We first present a benchmark - MAGPIE comprising 158 real-life high-stakes scenarios across 15 domains.<n>We then evaluate the current state-of-the-art LLMs on their understanding of contextually private data and their ability to collaborate without violating user privacy.
arXiv Detail & Related papers (2025-06-25T18:04:25Z)
PrivaCI-Bench: Evaluating Privacy with Contextual Integrity and Legal Compliance [44.287734754038254]
We present PrivaCI-Bench, a contextual privacy evaluation benchmark for generative large language models (LLMs) We evaluate the latest LLMs, including the recent reasoner models QwQ-32B and Deepseek R1. Our experimental results suggest that though LLMs can effectively capture key CI parameters inside a given context, they still require further advancements for privacy compliance.
arXiv Detail & Related papers (2025-02-24T10:49:34Z)
Multi-P$^2$A: A Multi-perspective Benchmark on Privacy Assessment for Large Vision-Language Models [65.2761254581209]
We evaluate the privacy preservation capabilities of 21 open-source and 2 closed-source Large Vision-Language Models (LVLMs) Based on Multi-P$2$A, we evaluate the privacy preservation capabilities of 21 open-source and 2 closed-source LVLMs. Our results reveal that current LVLMs generally pose a high risk of facilitating privacy breaches.
arXiv Detail & Related papers (2024-12-27T07:33:39Z)
Privacy-Preserving Large Language Models: Mechanisms, Applications, and Future Directions [0.0]
This survey explores the landscape of privacy-preserving mechanisms tailored for large language models. We examine their efficacy in addressing key privacy challenges, such as membership inference and model inversion attacks. By synthesizing state-of-the-art approaches and future trends, this paper provides a foundation for developing robust, privacy-preserving large language models.
arXiv Detail & Related papers (2024-12-09T00:24:09Z)
How Privacy-Savvy Are Large Language Models? A Case Study on Compliance and Privacy Technical Review [15.15468770348023]
We evaluate large language models' performance in privacy-related tasks such as privacy information extraction (PIE), legal and regulatory key point detection (KPD), and question answering (QA) Through an empirical assessment, we investigate the capacity of several prominent LLMs, including BERT, GPT-3.5, GPT-4, and custom models, in executing privacy compliance checks and technical privacy reviews. While LLMs show promise in automating privacy reviews and identifying regulatory discrepancies, significant gaps persist in their ability to fully comply with evolving legal standards.
arXiv Detail & Related papers (2024-09-04T01:51:37Z)
PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories. We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds. State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z)
LLM-PBE: Assessing Data Privacy in Large Language Models [111.58198436835036]
Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis. Despite the critical nature of this issue, there has been no existing literature to offer a comprehensive assessment of data privacy risks in LLMs. Our paper introduces LLM-PBE, a toolkit crafted specifically for the systematic evaluation of data privacy risks in LLMs.
arXiv Detail & Related papers (2024-08-23T01:37:29Z)
Privacy Checklist: Privacy Violation Detection Grounding on Contextual Integrity Theory [43.12744258781724]
We formulate the privacy issue as a reasoning problem rather than simple pattern matching. We develop the first comprehensive checklist that covers social identities, private attributes, and existing privacy regulations.
arXiv Detail & Related papers (2024-08-19T14:48:04Z)
Centering Policy and Practice: Research Gaps around Usable Differential Privacy [12.340264479496375]
We argue that while differential privacy is a clean formulation in theory, it poses significant challenges in practice. To bridge the gaps between differential privacy's promises and its real-world usability, researchers and practitioners must work together.
arXiv Detail & Related papers (2024-06-17T21:32:30Z)
No Free Lunch Theorem for Privacy-Preserving LLM Inference [30.554456047738295]
This study develops a framework for inferring privacy-protected Large Language Models (LLMs) It lays down a solid theoretical basis for examining the interplay between privacy preservation and utility.
arXiv Detail & Related papers (2024-05-31T08:22:53Z)
Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models reveal private information in contexts that humans would not, 39% and 57% of the time, respectively. Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z)
PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind) Our work offers a theoretical analysis for model design and benchmarks various techniques. In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibited data access and data sharing. Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data. Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released.
arXiv Detail & Related papers (2023-09-27T14:38:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.