No Intruder, no Validity: Evaluation Criteria for Privacy-Preserving
Text Anonymization
- URL: http://arxiv.org/abs/2103.09263v1
- Date: Tue, 16 Mar 2021 18:18:29 GMT
- Title: No Intruder, no Validity: Evaluation Criteria for Privacy-Preserving
Text Anonymization
- Authors: Maximilian Mozes, Bennett Kleinberg
- Abstract summary: We argue that researchers and practitioners developing automated text anonymization systems should carefully assess whether their evaluation methods truly reflect the system's ability to protect individuals from being re-identified.
We propose TILD, a set of evaluation criteria that comprises an anonymization method's technical performance, the information loss resulting from its anonymization, and the human ability to de-anonymize redacted documents.
- Score: 0.48733623015338234
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For sensitive text data to be shared among NLP researchers and practitioners,
shared documents need to comply with data protection and privacy laws. There is
hence a growing interest in automated approaches for text anonymization.
However, measuring such methods' performance is challenging: missing a single
identifying attribute can reveal an individual's identity. In this paper, we
draw attention to this problem and argue that researchers and practitioners
developing automated text anonymization systems should carefully assess whether
their evaluation methods truly reflect the system's ability to protect
individuals from being re-identified. We then propose TILD, a set of evaluation
criteria that comprises an anonymization method's technical performance, the
information loss resulting from its anonymization, and the human ability to
de-anonymize redacted documents. These criteria may facilitate progress towards
a standardized way for measuring anonymization performance.
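The recall concern above can be made concrete with a small sketch (hypothetical helper names, not from the paper): averaging masking recall over a corpus can look strong even when individual documents still leak an identifier, so a stricter document-level "fully protected" rate is worth reporting alongside it.

```python
# Sketch: why aggregate recall can hide re-identification risk.
# Hypothetical helpers for illustration; not the paper's evaluation code.

def doc_recall(gold_spans, predicted_spans):
    """Fraction of gold identifying spans the system masked in one document."""
    if not gold_spans:
        return 1.0
    return len(set(gold_spans) & set(predicted_spans)) / len(gold_spans)

def evaluate(corpus):
    """corpus: list of (gold_spans, predicted_spans) pairs, one per document."""
    recalls = [doc_recall(g, p) for g, p in corpus]
    mean_recall = sum(recalls) / len(recalls)
    # A document is only protected if *every* identifier was masked.
    fully_protected = sum(r == 1.0 for r in recalls) / len(recalls)
    return mean_recall, fully_protected

corpus = [
    ({"Alice", "London"}, {"Alice", "London"}),  # all identifiers masked
    ({"Bob", "NHS-1234"}, {"Bob"}),              # one identifier leaked
]
mean_recall, fully_protected = evaluate(corpus)
# Mean recall is 0.75, yet only half the documents are actually protected.
```

The gap between the two numbers is exactly the paper's point: a single missed attribute can re-identify an individual, so per-document protection matters more than averaged token-level scores.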
Related papers
- Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding [118.75567341513897]
Existing methods typically analyze target text in isolation or solely with non-member contexts.
We propose Con-ReCall, a novel approach that leverages the asymmetric distributional shifts induced by member and non-member contexts.
arXiv Detail & Related papers (2024-09-05T09:10:38Z)
- Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face an emerging challenge: the re-identification capabilities of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z)
- Anonymization Prompt Learning for Facial Privacy-Preserving Text-to-Image Generation [56.46932751058042]
We train a learnable prompt prefix for text-to-image diffusion models, which forces the model to generate anonymized facial identities.
Experiments demonstrate the successful anonymization performance of APL, which anonymizes any specific individuals without compromising the quality of non-identity-specific image generation.
arXiv Detail & Related papers (2024-05-27T07:38:26Z)
- Large Language Models are Advanced Anonymizers [13.900633576526863]
We show how adversarial anonymization outperforms current industry-grade anonymizers in terms of the resulting utility and privacy.
We first present a new setting for evaluating anonymization in the face of adversarial LLM inference.
arXiv Detail & Related papers (2024-02-21T14:44:00Z)
- Diff-Privacy: Diffusion-based Face Privacy Protection [58.1021066224765]
In this paper, we propose a novel face privacy protection method based on diffusion models, dubbed Diff-Privacy.
Specifically, we train our proposed multi-scale image inversion module (MSI) to obtain a set of SDM format conditional embeddings of the original image.
Based on the conditional embeddings, we design corresponding embedding scheduling strategies and construct different energy functions during the denoising process to achieve anonymization and visual identity information hiding.
arXiv Detail & Related papers (2023-09-11T09:26:07Z)
- A False Sense of Privacy: Towards a Reliable Evaluation Methodology for the Anonymization of Biometric Data [8.799600976940678]
Biometric data contains distinctive human traits such as facial features or gait patterns.
Anonymization is the technique most widely used to protect the privacy of such data.
We assess the state-of-the-art methods used to evaluate the performance of anonymization.
arXiv Detail & Related papers (2023-04-04T08:46:14Z)
- Unsupervised Text Deidentification [101.2219634341714]
We propose an unsupervised deidentification method that masks words that leak personally-identifying information.
Motivated by K-anonymity based privacy, we generate redactions that ensure a minimum reidentification rank.
arXiv Detail & Related papers (2022-10-20T18:54:39Z)
- A Dataset on Malicious Paper Bidding in Peer Review [84.68308372858755]
Malicious reviewers strategically bid in order to unethically manipulate the paper assignment.
A critical impediment towards creating and evaluating methods to mitigate this issue is the lack of publicly-available data on malicious paper bidding.
We release a novel dataset, collected from a mock conference activity where participants were instructed to bid either honestly or maliciously.
arXiv Detail & Related papers (2022-06-24T20:23:33Z)
- Statistical anonymity: Quantifying reidentification risks without reidentifying users [4.103598036312231]
Data anonymization is an approach to privacy-preserving data release aimed at preventing the reidentification of participants.
Existing algorithms for enforcing $k$-anonymity in the released data assume that the curator performing the anonymization has complete access to the original data.
This paper explores ideas for reducing the trust that must be placed in the curator, while still maintaining a statistical notion of $k$-anonymity.
arXiv Detail & Related papers (2022-01-28T18:12:44Z)
- The Text Anonymization Benchmark (TAB): A Dedicated Corpus and Evaluation Framework for Text Anonymization [2.9849405664643585]
We present a novel benchmark and associated evaluation metrics for assessing the performance of text anonymization methods.
Text anonymization, defined as the task of editing a text document to prevent the disclosure of personal information, currently suffers from a shortage of privacy-oriented annotated text resources.
This paper presents TAB (Text Anonymization Benchmark), a new, open-source annotated corpus developed to address this shortage.
arXiv Detail & Related papers (2022-01-25T14:34:42Z)
- Protecting Anonymous Speech: A Generative Adversarial Network Methodology for Removing Stylistic Indicators in Text [2.9005223064604078]
We develop a new approach to authorship anonymization by constructing a generative adversarial network.
Our fully automatic method achieves comparable results to other methods in terms of content preservation and fluency.
Our approach is able to generalize well to an open-set context and anonymize sentences from authors it has not encountered before.
arXiv Detail & Related papers (2021-10-18T17:45:56Z)
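Two of the entries above rest on the $k$-anonymity property: every record must share its quasi-identifier values with at least $k-1$ other records. A minimal check of that property can be sketched as follows (illustrative only; real enforcement via generalization and suppression, as in the curator-trust paper above, is considerably more involved):

```python
# Sketch of a k-anonymity check: group records by their quasi-identifier
# tuple and verify every group has at least k members.
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """records: list of dicts; quasi_identifiers: keys an attacker may know."""
    groups = Counter(
        tuple(r[q] for q in quasi_identifiers) for r in records
    )
    return all(count >= k for count in groups.values())

records = [
    {"age": "30-39", "zip": "120**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "120**", "diagnosis": "cold"},
    {"age": "40-49", "zip": "121**", "diagnosis": "flu"},
]
print(is_k_anonymous(records, ["age", "zip"], 2))  # False: the 40-49 group has only one record
```

The free-text setting discussed in this paper is harder than the tabular one sketched here, since identifying attributes are not pre-declared columns but must first be found in running text.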
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.