A Girl Has A Name, And It's ... Adversarial Authorship Attribution for
Deobfuscation
- URL: http://arxiv.org/abs/2203.11849v1
- Date: Tue, 22 Mar 2022 16:26:09 GMT
- Title: A Girl Has A Name, And It's ... Adversarial Authorship Attribution for
Deobfuscation
- Authors: Wanyue Zhai, Jonathan Rusert, Zubair Shafiq, Padmini Srinivasan
- Abstract summary: We show that adversarially trained authorship attributors are able to degrade the effectiveness of existing obfuscators.
Our results underline the need for stronger obfuscation approaches that are resistant to deobfuscation.
- Score: 9.558392439655014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in natural language processing have enabled powerful
privacy-invasive authorship attribution. To counter authorship attribution,
researchers have proposed a variety of rule-based and learning-based text
obfuscation approaches. However, existing authorship obfuscation approaches do
not consider the adversarial threat model. Specifically, they are not evaluated
against adversarially trained authorship attributors that are aware of
potential obfuscation. To fill this gap, we investigate the problem of
adversarial authorship attribution for deobfuscation. We show that
adversarially trained authorship attributors are able to degrade the
effectiveness of existing obfuscators from 20-30% to 5-10%. We also evaluate
the effectiveness of adversarial training when the attributor makes incorrect
assumptions about whether and which obfuscator was used. While there is a
clear degradation in attribution accuracy, it is noteworthy that this
degradation still remains at or above the attribution accuracy of an
attributor that is not adversarially trained at all. Our results underline the
need for stronger obfuscation approaches that are resistant to deobfuscation.
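To make the threat model concrete, here is a minimal sketch of adversarial training for an attributor: the training set is augmented with obfuscated variants of each text so the classifier learns obfuscation-invariant stylometric features. The obfuscate() stand-in (random word dropout), the toy corpus, and the TF-IDF/logistic-regression attributor are illustrative assumptions, not the paper's actual obfuscators or models.

```python
# Hedged sketch of adversarial training for authorship attribution:
# augment training data with obfuscated variants of each text so the
# attributor learns features that survive obfuscation.
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def obfuscate(text, drop_prob=0.1, seed=0):
    """Toy stand-in for a real text obfuscator: randomly drops words."""
    rng = random.Random(seed)
    return " ".join(w for w in text.split() if rng.random() > drop_prob)

# Toy corpus of (text, author) pairs; a real attributor would train on
# full documents from a closed set of candidate authors.
corpus = [
    ("the quick brown fox jumps over the lazy dog", "author_a"),
    ("a stitch in time saves nine every single day", "author_b"),
    ("the quick fox runs over the sleepy brown dog", "author_a"),
    ("time and tide wait for no one in this life", "author_b"),
]

# Adversarial training: include an obfuscated copy of every sample.
texts = [t for t, _ in corpus] + [obfuscate(t) for t, _ in corpus]
labels = [a for _, a in corpus] * 2

attributor = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
attributor.fit(texts, labels)

# At test time the attributor sees obfuscated text and still attributes it.
print(attributor.predict([obfuscate("the lazy brown fox jumps", seed=7)]))
```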
Related papers
- TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization Methods [5.239989658197324]
Authorship obfuscation aims to disguise the identity of an author within a text.
This alteration needs to balance privacy and utility.
We propose TAROT: Task-Oriented Authorship Obfuscation Using Policy Optimization.
arXiv Detail & Related papers (2024-07-31T14:24:01Z)
- Improving Adversarial Robustness via Decoupled Visual Representation Masking [65.73203518658224]
In this paper, we highlight two novel properties of robust features from the feature distribution perspective.
We find that state-of-the-art defense methods aim to address both of these issues well.
Specifically, we propose a simple but effective defense based on decoupled visual representation masking.
arXiv Detail & Related papers (2024-06-16T13:29:41Z)
- Few-Shot Adversarial Prompt Learning on Vision-Language Models [62.50622628004134]
The vulnerability of deep neural networks to imperceptible adversarial perturbations has attracted widespread attention.
Previous efforts achieved zero-shot adversarial robustness by aligning adversarial visual features with text supervision.
We propose a few-shot adversarial prompt framework in which adapting input sequences with limited data yields significant adversarial robustness improvements.
arXiv Detail & Related papers (2024-03-21T18:28:43Z)
- JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models [53.83273575102087]
We propose an unsupervised inference-time approach to authorship obfuscation.
We introduce JAMDEC, a user-controlled, inference-time algorithm for authorship obfuscation.
Our approach builds on small language models such as GPT2-XL to avoid disclosing the original content to proprietary LLMs' APIs.
arXiv Detail & Related papers (2024-02-13T19:54:29Z)
- UID as a Guiding Metric for Automated Authorship Obfuscation [0.0]
Automated authorship attributors can identify the author of a text from a pool of candidate authors with great accuracy.
In order to counter the rise of these automated attributors, there has also been a rise of automated obfuscators.
We devised three novel authorship obfuscation methods that utilize the psycholinguistic theory of Uniform Information Density (UID).
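As a point of reference, below is a minimal sketch of one common operationalization of UID: the variance of per-token surprisal, where lower variance means information is spread more evenly across the text. The add-one-smoothed unigram model and toy corpus are illustrative assumptions; the paper's actual obfuscation methods are not reproduced here.

```python
# Hedged sketch: scoring a text's Uniform Information Density (UID)
# as the variance of per-token surprisal under a simple unigram model.
import math
from collections import Counter

def uid_score(text, background_corpus):
    counts = Counter(background_corpus.split())
    total = sum(counts.values())
    vocab = len(counts)
    # Add-one-smoothed unigram surprisal, in bits, for each token
    # (unseen words get a small nonzero probability).
    surprisals = [
        -math.log2((counts[w] + 1) / (total + vocab + 1))
        for w in text.split()
    ]
    mean = sum(surprisals) / len(surprisals)
    # Variance of surprisal: the lower, the more uniform the density.
    return sum((s - mean) ** 2 for s in surprisals) / len(surprisals)

background = "the cat sat on the mat and the dog sat on the rug"
print(uid_score("the cat sat on the mat", background))    # more uniform
print(uid_score("the zyxgarble sat on mat", background))  # spikier
```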
arXiv Detail & Related papers (2023-11-05T22:16:37Z)
- Gradient Obfuscation Checklist Test Gives a False Sense of Security [85.8719866710494]
The main source of robustness in such defenses is often gradient obfuscation, which offers a false sense of security.
Five characteristics have been identified, which are commonly observed when the improvement in robustness is mainly caused by gradient obfuscation.
It has since become a trend to use these five characteristics as a sufficient test, to determine whether or not gradient obfuscation is the main source of robustness.
arXiv Detail & Related papers (2022-06-03T17:27:10Z)
- Avengers Ensemble! Improving Transferability of Authorship Obfuscation [7.962140902232626]
Stylometric approaches have been shown to be quite effective for real-world authorship attribution.
We propose an ensemble-based approach for transferable authorship obfuscation.
arXiv Detail & Related papers (2021-09-15T00:11:40Z)
- Adversarial Visual Robustness by Causal Intervention [56.766342028800445]
Adversarial training is the de facto most promising defense against adversarial examples.
Yet, its passive nature inevitably prevents it from being immune to unknown attackers.
We provide a causal viewpoint of adversarial vulnerability: the cause is the confounder that ubiquitously exists in learning.
arXiv Detail & Related papers (2021-06-17T14:23:54Z)
- Robust and Accurate Authorship Attribution via Program Normalization [24.381734600088453]
Source code attribution approaches have achieved remarkable accuracy thanks to the rapid advances in deep learning.
However, they can be easily deceived by adversaries who attempt either to create a forgery of another author or to mask the original author.
We present a novel learning framework, normalize-and-predict (N&P), that in theory guarantees the robustness of any authorship-attribution approach.
arXiv Detail & Related papers (2020-07-01T21:27:38Z)
- Proper Network Interpretability Helps Adversarial Robustness in Classification [91.39031895064223]
We show that with a proper measurement of interpretation, it is difficult to prevent prediction-evasion adversarial attacks from causing interpretation discrepancy.
We develop an interpretability-aware defensive scheme built only on promoting robust interpretation.
We show that our defense achieves both robust classification and robust interpretation, outperforming state-of-the-art adversarial training methods against attacks of large perturbation.
arXiv Detail & Related papers (2020-06-26T01:31:31Z)
- A Girl Has A Name: Detecting Authorship Obfuscation [12.461503242570643]
Authorship attribution aims to identify the author of a text based on stylometric analysis.
Authorship obfuscation aims to protect against authorship attribution by modifying a text's style.
We evaluate the stealthiness of state-of-the-art authorship obfuscation methods under an adversarial threat model.
arXiv Detail & Related papers (2020-05-02T04:52:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences arising from its use.