Avengers Ensemble! Improving Transferability of Authorship Obfuscation
- URL: http://arxiv.org/abs/2109.07028v1
- Date: Wed, 15 Sep 2021 00:11:40 GMT
- Title: Avengers Ensemble! Improving Transferability of Authorship Obfuscation
- Authors: Muhammad Haroon, Muhammad Fareed Zaffar, Padmini Srinivasan, Zubair
Shafiq
- Abstract summary: Stylometric approaches have been shown to be quite effective for real-world authorship attribution.
We propose an ensemble-based approach for transferable authorship obfuscation.
- Score: 7.962140902232626
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Stylometric approaches have been shown to be quite effective for real-world
authorship attribution. To mitigate the privacy threat posed by authorship
attribution, researchers have proposed automated authorship obfuscation
approaches that aim to conceal the stylometric artefacts that give away the
identity of an anonymous document's author. Recent work has focused on
authorship obfuscation approaches that rely on black-box access to an
attribution classifier to evade attribution while preserving semantics.
However, to be useful under a realistic threat model, it is important that
these obfuscation approaches work well even when the adversary's attribution
classifier is different from the one used internally by the obfuscator.
Unfortunately, existing authorship obfuscation approaches do not transfer well
to unseen attribution classifiers. In this paper, we propose an ensemble-based
approach for transferable authorship obfuscation. Our experiments show that if
an obfuscator can evade an ensemble attribution classifier, which is based on
multiple base attribution classifiers, it is more likely to transfer to
different attribution classifiers. Our analysis shows that ensemble-based
authorship obfuscation achieves better transferability because it combines the
knowledge from each of the base attribution classifiers by essentially
averaging their decision boundaries.
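To make the mechanism concrete, here is a minimal sketch of the kind of ensemble attribution classifier an obfuscator could target, assuming scikit-learn-style base classifiers over character n-gram features; the feature choice, base models, and helper names (build_ensemble_attributor, evades_attribution) are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: a soft-voting ensemble attribution classifier as the
# obfuscator's internal black-box target. Character 3-gram features and
# these three base models are illustrative assumptions, not the paper's
# exact configuration.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def char_ngrams():
    # Character 3-grams are a common stylometric feature representation.
    return TfidfVectorizer(analyzer="char", ngram_range=(3, 3))

def build_ensemble_attributor():
    base = [
        ("lr", make_pipeline(char_ngrams(), LogisticRegression(max_iter=1000))),
        ("svm", make_pipeline(char_ngrams(), SVC(probability=True))),
        ("rf", make_pipeline(char_ngrams(), RandomForestClassifier())),
    ]
    # voting="soft" averages the base classifiers' predicted author
    # probabilities, which in effect averages their decision boundaries.
    return VotingClassifier(estimators=base, voting="soft")

def evades_attribution(ensemble, text, true_author):
    # Black-box check the obfuscator runs on each candidate rewrite.
    return ensemble.predict([text])[0] != true_author
```

In use, the ensemble would be fit on writing samples from the candidate authors (ensemble.fit(texts, authors)), and the obfuscator would iteratively rewrite the target document until evades_attribution returns True; because the soft vote averages the base classifiers' decision boundaries, a rewrite that crosses the averaged boundary is more likely to also cross an unseen attributor's boundary, which is the transferability effect the abstract describes.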
Related papers
- Imperceptible Face Forgery Attack via Adversarial Semantic Mask [59.23247545399068]
We propose an Adversarial Semantic Mask Attack framework (ASMA) which can generate adversarial examples with good transferability and invisibility.
Specifically, we propose a novel adversarial semantic mask generative model that constrains generated perturbations to local semantic regions for stealthiness.
arXiv Detail & Related papers (2024-06-16T10:38:11Z)
- ZeroPur: Succinct Training-Free Adversarial Purification [52.963392510839284]
Adversarial purification is a defense technique that can defend against various unseen adversarial attacks.
We present ZeroPur, a simple adversarial purification method that purifies adversarial images without further training.
arXiv Detail & Related papers (2024-06-05T10:58:15Z)
- Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation [52.72682366640554]
Authorship Verification (AV) is a text classification task concerned with inferring whether a candidate text has been written by one specific author or by someone else.
It has been shown that many AV systems are vulnerable to adversarial attacks, where a malicious author actively tries to fool the classifier by either concealing their writing style or imitating the style of another author.
arXiv Detail & Related papers (2024-03-17T16:36:26Z)
- ALISON: Fast and Effective Stylometric Authorship Obfuscation [14.297046770461264]
Authorship Attribution (AA) and Authorship Obfuscation (AO) are two competing tasks of increasing importance in privacy research.
We propose a practical AO method, ALISON, that dramatically reduces training/obfuscation time.
We also demonstrate that ALISON can effectively prevent four SOTA AA methods from accurately determining the authorship of ChatGPT-generated texts.
arXiv Detail & Related papers (2024-02-01T18:22:32Z)
- UID as a Guiding Metric for Automated Authorship Obfuscation [0.0]
Automated authorship attributors are capable of attributing the author of a text from a pool of candidate authors with high accuracy.
To counter the rise of these automated attributors, there has been a corresponding rise in automated obfuscators.
We devise three novel authorship obfuscation methods that draw on a psycholinguistic theory known as Uniform Information Density (UID) theory; a minimal sketch of a UID-style score appears after this list.
arXiv Detail & Related papers (2023-11-05T22:16:37Z)
- Improving Adversarial Robustness via Joint Classification and Multiple Explicit Detection Classes [11.584771636861877]
We show that a provable defense framework benefits from extension to networks with multiple explicit abstain classes.
We propose a regularization approach and a training method that counter the resulting degeneracy, in which the multiple abstain classes collapse into effectively one, by promoting full use of all abstain classes.
arXiv Detail & Related papers (2022-10-26T01:23:33Z)
- A Girl Has A Name, And It's ... Adversarial Authorship Attribution for Deobfuscation [9.558392439655014]
We show that adversarially trained authorship attributors are able to degrade the effectiveness of existing obfuscators.
Our results underline the need for stronger obfuscation approaches that are resistant to deobfuscation.
arXiv Detail & Related papers (2022-03-22T16:26:09Z)
- Detection of Adversarial Supports in Few-shot Classifiers Using Feature Preserving Autoencoders and Self-Similarity [89.26308254637702]
We propose a detection strategy to highlight adversarial support sets.
We use feature-preserving autoencoder filtering and the self-similarity of a support set to perform this detection.
Our method is attack-agnostic and, to the best of our knowledge, the first to explore detection for few-shot classifiers.
arXiv Detail & Related papers (2020-12-09T14:13:41Z)
- Poisoned classifiers are not only backdoored, they are fundamentally broken [84.67778403778442]
Under a commonly-studied backdoor poisoning attack against classification models, an attacker adds a small trigger to a subset of the training data.
It is often assumed that the poisoned classifier is vulnerable exclusively to the adversary who possesses the trigger.
In this paper, we show empirically that this view of backdoored classifiers is incorrect.
arXiv Detail & Related papers (2020-10-18T19:42:44Z)
- A Girl Has A Name: Detecting Authorship Obfuscation [12.461503242570643]
Authorship attribution aims to identify the author of a text based on stylometric analysis.
Authorship obfuscation aims to protect against authorship attribution by modifying a text's style.
We evaluate the stealthiness of state-of-the-art authorship obfuscation methods under an adversarial threat model.
arXiv Detail & Related papers (2020-05-02T04:52:55Z)
- Breaking certified defenses: Semantic adversarial examples with spoofed robustness certificates [57.52763961195292]
We present a new attack that exploits not only the labelling function of a classifier, but also the certificate generator.
The proposed method applies large perturbations that place images far from a class boundary while maintaining the imperceptibility property of adversarial examples.
arXiv Detail & Related papers (2020-03-19T17:59:44Z)
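For the UID entry above, here is a minimal sketch of the kind of score such methods could steer by, under the assumption that uniformity is measured as the variance of per-token surprisal; the toy unigram language model and the helper names (unigram_surprisals, uid_score) are stand-in assumptions, and the paper's actual three methods are not reproduced here.

```python
# Minimal sketch of a UID-style score: the variance of per-token surprisal.
# A real system would use a stronger language model; the Laplace-smoothed
# unigram model below is a stand-in assumption to keep the example
# self-contained. Helper names are hypothetical, not from the paper.
import math
from collections import Counter

def unigram_surprisals(tokens, corpus_tokens):
    """Surprisal -log2 p(t) of each token under a smoothed unigram model."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 pseudo-slot for unseen tokens
    return [-math.log2((counts[t] + 1) / (total + vocab)) for t in tokens]

def uid_score(tokens, corpus_tokens):
    """Variance of surprisal; lower = information spread more uniformly."""
    s = unigram_surprisals(tokens, corpus_tokens)
    mean = sum(s) / len(s)
    return sum((x - mean) ** 2 for x in s) / len(s)

# Example: compare a candidate rewrite against the original document.
original = "the quick brown fox jumps over the lazy dog".split()
rewrite = "the fast brown fox leaps over the idle dog".split()
background = original * 50 + rewrite * 50  # toy background corpus
print(uid_score(original, background), uid_score(rewrite, background))
```

A UID-guided obfuscator could, for example, rank candidate rewrites by how this score changes relative to the original text.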
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.