Unsupervised Text Deidentification
- URL: http://arxiv.org/abs/2210.11528v1
- Date: Thu, 20 Oct 2022 18:54:39 GMT
- Title: Unsupervised Text Deidentification
- Authors: John X. Morris, Justin T. Chiu, Ramin Zabih, Alexander M. Rush
- Abstract summary: We propose an unsupervised deidentification method that masks words that leak personally-identifying information.
Motivated by K-anonymity based privacy, we generate redactions that ensure a minimum reidentification rank.
- Score: 101.2219634341714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deidentification seeks to anonymize textual data prior to distribution.
Automatic deidentification primarily uses supervised named entity recognition
from human-labeled data points. We propose an unsupervised deidentification
method that masks words that leak personally-identifying information. The
approach utilizes a specially trained reidentification model to identify
individuals from redacted personal documents. Motivated by K-anonymity based
privacy, we generate redactions that ensure a minimum reidentification rank for
the correct profile of the document. To evaluate this approach, we consider the
task of deidentifying Wikipedia Biographies, and evaluate using an adversarial
reidentification metric. Compared to a set of unsupervised baselines, our
approach deidentifies documents more completely while removing fewer words.
Qualitatively, we see that the approach eliminates many identifying aspects
that would fall outside of the common named entity based approach.
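The masking idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses a specially trained reidentification model, while the toy below swaps in a simple word-overlap scorer, and the greedy search order is an assumption made for illustration. All names (`overlap_score`, `rank_of`, `deidentify`, `MASK`) and the example profiles are hypothetical. Ties are counted against the attacker, which is also a simplifying assumption.

```python
MASK = "<mask>"


def overlap_score(words, profile):
    """Toy reidentifier (assumption): count unmasked words found in the profile."""
    profile_words = set(profile.lower().split())
    return sum(1 for w in words if w != MASK and w.lower() in profile_words)


def rank_of(words, profiles, true_id):
    """1-based rank of the true profile; tied profiles are ranked above it."""
    scores = {pid: overlap_score(words, text) for pid, text in profiles.items()}
    return 1 + sum(
        1 for pid, s in scores.items() if pid != true_id and s >= scores[true_id]
    )


def deidentify(document, profiles, true_id, k):
    """Greedily mask words until the true profile's rank is at least k."""
    words = document.split()
    while rank_of(words, profiles, true_id) < k:
        candidates = [i for i, w in enumerate(words) if w != MASK]
        if not candidates:
            break  # everything is masked; nothing more can be done

        def gain(i):
            # Prefer the mask that raises the rank most; break ties by
            # preferring the mask that lowers the true profile's score.
            trial = words[:i] + [MASK] + words[i + 1:]
            return (
                rank_of(trial, profiles, true_id),
                -overlap_score(trial, profiles[true_id]),
            )

        words[max(candidates, key=gain)] = MASK
    return " ".join(words)


profiles = {
    "ada": "Ada Lovelace English mathematician",
    "alan": "Alan Turing English mathematician",
    "grace": "Grace Hopper American computer scientist",
}
document = "Ada Lovelace was an English mathematician"
print(deidentify(document, profiles, "ada", k=2))
```

Raising `k` forces more words to be masked, which mirrors the trade-off the abstract describes between stronger deidentification and removing fewer words.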
Related papers
- Keypoint Promptable Re-Identification [76.31113049256375]
Occluded Person Re-Identification (ReID) is a metric learning task that involves matching occluded individuals based on their appearance.
We introduce Keypoint Promptable ReID (KPR), a novel formulation of the ReID problem that explicitly complements the input bounding box with a set of semantic keypoints.
We release custom keypoint labels for four popular ReID benchmarks. Experiments on person retrieval, but also on pose tracking, demonstrate that our method systematically surpasses previous state-of-the-art approaches.
(2024-07-25T15:20:58Z)
- Multiview Identifiers Enhanced Generative Retrieval [78.38443356800848]
Generative retrieval generates identifier strings of passages as the retrieval target.
We propose a new type of identifier, synthetic identifiers, that are generated based on the content of a passage.
Our proposed approach performs the best in generative retrieval, demonstrating its effectiveness and robustness.
(2023-05-26T06:50:21Z)
- Towards Privacy-Preserving Person Re-identification via Person Identify Shift [19.212691296927165]
Person re-identification (ReID) requires preserving the privacy of pedestrian images used by ReID methods.
We propose a novel de-identification method designed explicitly for person ReID, named Person Identify Shift (PIS).
PIS shifts each pedestrian image from the current identity to another with a new identity, resulting in images still preserving the relative identities.
(2022-07-15T06:58:41Z)
- The Text Anonymization Benchmark (TAB): A Dedicated Corpus and Evaluation Framework for Text Anonymization [2.9849405664643585]
We present a novel benchmark and associated evaluation metrics for assessing the performance of text anonymization methods.
Text anonymization, defined as the task of editing a text document to prevent the disclosure of personal information, currently suffers from a shortage of privacy-oriented annotated text resources.
This paper presents TAB (Text Anonymization Benchmark), a new, open-source annotated corpus developed to address this shortage.
(2022-01-25T14:34:42Z)
- RealGait: Gait Recognition for Person Re-Identification [79.67088297584762]
We construct a new gait dataset by extracting silhouettes from an existing video person re-identification challenge which consists of 1,404 persons walking in an unconstrained manner.
Our results suggest that recognizing people by their gait in real surveillance scenarios is feasible, and that the underlying gait pattern is probably the true reason why video person re-identification works in practice.
(2022-01-13T06:30:56Z)
- Context-Aware Unsupervised Clustering for Person Search [13.99348653165494]
We introduce a novel framework of person search that is able to train the network in the absence of the person identity labels.
We propose efficient unsupervised clustering methods to substitute the supervision process using annotated person identity labels.
The experimental results show that the proposed method achieves comparable performance to that of the state-of-the-art supervised person search methods.
(2021-10-04T11:39:18Z)
- No Intruder, no Validity: Evaluation Criteria for Privacy-Preserving Text Anonymization [0.48733623015338234]
We argue that researchers and practitioners developing automated text anonymization systems should carefully assess whether their evaluation methods truly reflect the system's ability to protect individuals from being re-identified.
We propose TILD, a set of evaluation criteria that comprises an anonymization method's technical performance, the information loss resulting from its anonymization, and the human ability to de-anonymize redacted documents.
(2021-03-16T18:18:29Z)
- Identity-Driven DeepFake Detection [91.0504621868628]
Identity-Driven DeepFake Detection takes as input the suspect image/video as well as the target identity information.
We output a decision on whether the identity in the suspect image/video is the same as the target identity.
We present a simple identity-based detection algorithm called the OuterFace, which may serve as a baseline for further research.
(2020-12-07T18:59:08Z)
- How important are faces for person re-identification? [14.718372669984364]
We apply a face detection and blurring algorithm to create anonymized versions of several popular person re-identification datasets.
We evaluate the effect of this anonymization on re-identification performance using standard metrics.
(2020-10-13T11:47:16Z)
- Learning Person Re-identification Models from Videos with Weak Supervision [53.53606308822736]
We introduce the problem of learning person re-identification models from videos with weak supervision.
We propose a multiple instance attention learning framework for person re-identification using such video-level labels.
The attention weights are obtained based on all person images instead of person tracklets in a video, making our learned model less affected by noisy annotations.
(2020-07-21T07:23:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.