ALISON: Fast and Effective Stylometric Authorship Obfuscation
- URL: http://arxiv.org/abs/2402.00835v1
- Date: Thu, 1 Feb 2024 18:22:32 GMT
- Title: ALISON: Fast and Effective Stylometric Authorship Obfuscation
- Authors: Eric Xing, Saranya Venkatraman, Thai Le, Dongwon Lee
- Abstract summary: Authorship Attribution (AA) and Authorship Obfuscation (AO) are two competing tasks of increasing importance in privacy research.
We propose a practical AO method, ALISON, that dramatically reduces training/obfuscation time.
We also demonstrate that ALISON can effectively prevent four SOTA AA methods from accurately determining the authorship of ChatGPT-generated texts.
- Score: 14.297046770461264
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Authorship Attribution (AA) and Authorship Obfuscation (AO) are two competing
tasks of increasing importance in privacy research. Modern AA leverages an
author's consistent writing style to match a text to its author using an AA
classifier. AO is the corresponding adversarial task, aiming to modify a text
in such a way that its semantics are preserved, yet an AA model cannot
correctly infer its authorship. To address privacy concerns raised by
state-of-the-art (SOTA) AA methods, new AO methods have been proposed but
remain largely impractical to use due to their prohibitively slow training and
obfuscation speed, often taking hours. To address this challenge, we propose a
practical AO method, ALISON, that (1) dramatically reduces training/obfuscation
time, demonstrating more than 10x faster obfuscation than SOTA AO methods, (2)
achieves better obfuscation success through attacking three transformer-based
AA methods on two benchmark datasets, typically performing 15% better than
competing methods, (3) does not require direct signals from a target AA
classifier during obfuscation, and (4) utilizes unique stylometric features,
allowing sound model interpretation for explainable obfuscation. We also
demonstrate that ALISON can effectively prevent four SOTA AA methods from
accurately determining the authorship of ChatGPT-generated texts, all while
minimally changing the original text semantics. To ensure the reproducibility
of our findings, our code and data are available at:
https://github.com/EricX003/ALISON.
Related papers
- TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization Methods [5.239989658197324]
Authorship obfuscation aims to disguise the identity of an author within a text.
This alteration needs to balance privacy and utility.
We propose TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization.
arXiv Detail & Related papers (2024-07-31T14:24:01Z)
- Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation [52.72682366640554]
Authorship Verification (AV) is a text classification task concerned with inferring whether a candidate text has been written by one specific author or by someone else.
It has been shown that many AV systems are vulnerable to adversarial attacks, in which a malicious author actively tries to fool the classifier either by concealing their own writing style or by imitating the style of another author.
arXiv Detail & Related papers (2024-03-17T16:36:26Z)
- JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models [53.83273575102087]
We propose an unsupervised inference-time approach to authorship obfuscation.
We introduce JAMDEC, a user-controlled, inference-time algorithm for authorship obfuscation.
Our approach builds on small language models such as GPT2-XL to help avoid disclosing the original content to proprietary LLM APIs.
arXiv Detail & Related papers (2024-02-13T19:54:29Z)
- UPTON: Preventing Authorship Leakage from Public Text Release via Data Poisoning [17.956089294338984]
We present a novel solution, UPTON, that exploits black-box data poisoning methods to weaken the authorship features in training samples.
We present empirical validation in which UPTON successfully degrades the accuracy of AA models to an impractical level.
UPTON remains effective against AA models that have already been trained on available clean writings of authors.
arXiv Detail & Related papers (2022-11-17T17:49:57Z)
- Avengers Ensemble! Improving Transferability of Authorship Obfuscation [7.962140902232626]
Stylometric approaches have been shown to be quite effective for real-world authorship attribution.
We propose an ensemble-based approach for transferable authorship obfuscation.
arXiv Detail & Related papers (2021-09-15T00:11:40Z)
- Semantic-Preserving Adversarial Text Attacks [85.32186121859321]
We propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.
Our method achieves the highest attack success rates and semantics rates by changing the smallest number of words compared with existing methods.
arXiv Detail & Related papers (2021-08-23T09:05:18Z)
- Transferable Sparse Adversarial Attack [62.134905824604104]
We introduce a generator architecture to alleviate the overfitting issue and thus efficiently craft transferable sparse adversarial examples.
Our method achieves superior inference speed, 700$\times$ faster than other optimization-based methods.
arXiv Detail & Related papers (2021-05-31T06:44:58Z)
- Towards Variable-Length Textual Adversarial Attacks [68.27995111870712]
It is non-trivial to conduct textual adversarial attacks on natural language processing tasks due to the discreteness of data.
In this paper, we propose variable-length textual adversarial attacks (VL-Attack).
Our method achieves a BLEU score of $33.18$ on IWSLT14 German-English translation, an improvement of $1.47$ over the baseline model.
arXiv Detail & Related papers (2021-04-16T14:37:27Z)
- DeepStyle: User Style Embedding for Authorship Attribution of Short Texts [57.503904346336384]
Authorship attribution (AA) is an important and widely studied research topic with many applications.
Recent works have shown that deep learning methods could achieve significant accuracy improvement for the AA task.
We propose DeepStyle, a novel embedding-based framework that learns the representations of users' salient writing styles.
arXiv Detail & Related papers (2021-03-14T15:56:37Z)
- A Girl Has A Name: Detecting Authorship Obfuscation [12.461503242570643]
Authorship attribution aims to identify the author of a text based on stylometric analysis.
Authorship obfuscation aims to protect against authorship attribution by modifying a text's style.
We evaluate the stealthiness of state-of-the-art authorship obfuscation methods under an adversarial threat model.
arXiv Detail & Related papers (2020-05-02T04:52:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.