Defending Against Authorship Identification Attacks
- URL: http://arxiv.org/abs/2310.01568v1
- Date: Mon, 2 Oct 2023 19:03:11 GMT
- Title: Defending Against Authorship Identification Attacks
- Authors: Haining Wang
- Abstract summary: Authorship identification has proven unsettlingly effective in inferring the identity of the author of an unsigned document.
The presented work offers a comprehensive review of the advancements in this research area spanning over the past two decades and beyond.
- Score: 9.148691357200216
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Authorship identification has proven unsettlingly effective in inferring the
identity of the author of an unsigned document, even when sensitive personal
information has been carefully omitted. In the digital era, individuals leave a
lasting digital footprint through their written content, whether it is posted
on social media, stored on their employer's computers, or located elsewhere.
When individuals need to communicate publicly yet wish to remain anonymous,
there is little available to protect them from unwanted authorship
identification. This unprecedented threat to privacy is evident in scenarios
such as whistle-blowing. Proposed defenses against authorship identification
attacks primarily aim to obfuscate one's writing style, thereby making it
unlinkable to their pre-existing writing, while concurrently preserving the
original meaning and grammatical integrity. The presented work offers a
comprehensive review of the advancements in this research area spanning over
the past two decades and beyond. It emphasizes the methodological frameworks of
modification and generation-based strategies devised to evade authorship
identification attacks, highlighting joint efforts from the differential
privacy community. Limitations of current research are discussed, with a
spotlight on open challenges and potential research avenues.
Related papers
- RedactBuster: Entity Type Recognition from Redacted Documents [13.172863061928899]
We propose RedactBuster, the first deanonymization model using sentence context to perform Named Entity Recognition on redacted text.
We test RedactBuster against the most effective redaction technique and evaluate it using the publicly available Text Anonymization Benchmark (TAB).
Our results show accuracy values up to 0.985 regardless of the document nature or entity type.
arXiv Detail & Related papers (2024-04-19T16:42:44Z) - Privacy-preserving Optics for Enhancing Protection in Face De-identification [60.110274007388135]
We propose a hardware-level face de-identification method to solve this vulnerability.
We also propose an anonymization framework that generates a new face using the privacy-preserving image, face heatmap, and a reference face image from a public dataset as input.
arXiv Detail & Related papers (2024-03-31T19:28:04Z) - JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models [53.83273575102087]
We propose an unsupervised inference-time approach to authorship obfuscation.
We introduce JAMDEC, a user-controlled, inference-time algorithm for authorship obfuscation.
Our approach builds on small language models such as GPT2-XL in order to help avoid disclosing the original content to proprietary LLMs' APIs.
arXiv Detail & Related papers (2024-02-13T19:54:29Z) - Diff-Privacy: Diffusion-based Face Privacy Protection [58.1021066224765]
In this paper, we propose a novel face privacy protection method based on diffusion models, dubbed Diff-Privacy.
Specifically, we train our proposed multi-scale image inversion module (MSI) to obtain a set of SDM format conditional embeddings of the original image.
Based on the conditional embeddings, we design corresponding embedding scheduling strategies and construct different energy functions during the denoising process to achieve anonymization and visual identity information hiding.
arXiv Detail & Related papers (2023-09-11T09:26:07Z) - User-Centered Security in Natural Language Processing [0.7106986689736825]
This dissertation proposes a framework of user-centered security in Natural Language Processing (NLP).
It focuses on two security domains within NLP with great public interest.
arXiv Detail & Related papers (2023-01-10T22:34:19Z) - Unsupervised Text Deidentification [101.2219634341714]
We propose an unsupervised deidentification method that masks words that leak personally-identifying information.
Motivated by K-anonymity based privacy, we generate redactions that ensure a minimum reidentification rank.
arXiv Detail & Related papers (2022-10-20T18:54:39Z) - Statistical anonymity: Quantifying reidentification risks without reidentifying users [4.103598036312231]
Data anonymization is an approach to privacy-preserving data release aimed at preventing participant reidentification.
Existing algorithms for enforcing $k$-anonymity in the released data assume that the curator performing the anonymization has complete access to the original data.
This paper explores ideas for reducing the trust that must be placed in the curator, while still maintaining a statistical notion of $k$-anonymity.
arXiv Detail & Related papers (2022-01-28T18:12:44Z) - Protecting Anonymous Speech: A Generative Adversarial Network Methodology for Removing Stylistic Indicators in Text [2.9005223064604078]
We develop a new approach to authorship anonymization by constructing a generative adversarial network.
Our fully automatic method achieves comparable results to other methods in terms of content preservation and fluency.
Our approach is able to generalize well to an open-set context and anonymize sentences from authors it has not encountered before.
arXiv Detail & Related papers (2021-10-18T17:45:56Z) - No Intruder, no Validity: Evaluation Criteria for Privacy-Preserving Text Anonymization [0.48733623015338234]
We argue that researchers and practitioners developing automated text anonymization systems should carefully assess whether their evaluation methods truly reflect the system's ability to protect individuals from being re-identified.
We propose TILD, a set of evaluation criteria that comprises an anonymization method's technical performance, the information loss resulting from its anonymization, and the human ability to de-anonymize redacted documents.
arXiv Detail & Related papers (2021-03-16T18:18:29Z) - Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News [57.9843300852526]
We introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions.
To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles.
In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies.
arXiv Detail & Related papers (2020-09-16T14:13:15Z) - Towards Face Encryption by Generating Adversarial Identity Masks [53.82211571716117]
We propose a targeted identity-protection iterative method (TIP-IM) to generate adversarial identity masks.
TIP-IM provides 95%+ protection success rate against various state-of-the-art face recognition models.
arXiv Detail & Related papers (2020-03-15T12:45:10Z)